Antmicro's Fast Versatile DMA: an open, portable & vendor-neutral DMA controller
FastVDMA is an open source DMA (Direct Memory Access) controller developed at Antmicro.
One of the main motivations for designing an open source DMA controller was the lack of portable open source alternatives to the proprietary controllers provided by FPGA vendors. Because vendor DMA solutions tend to be tightly integrated with vendor-specific toolchains and IP, DMA-based designs are difficult to reuse across different platforms. As a result, a non-negligible part of the work that goes into designs built around proprietary DMA controllers ends up being highly platform-dependent and of little use to developers targeting other platforms.
At Antmicro, we strongly advocate cross-platform and reproducible solutions to our customers, and we are often the first to identify both immediate and long-term vendor lock-in constraints. Integrating FastVDMA with portable SoC frameworks such as LiteX removes the platform dependence of DMA-based designs and allows for more engineering freedom in our FPGA projects. With the rise of open source ISAs (like RISC-V and POWER) and the proliferation of openly available FPGA softcores implementing them (just take a look at the RISC-V cores list Antmicro helps maintain), vendor neutrality and portability, in all aspects, play an ever-increasing role.
Versatile support for standard features
The purpose of a DMA controller is to arbitrate and handle the transfer of blocks of data directly between I/O devices and a system's main memory, with minimal CPU intervention.
The main feature of FastVDMA is its support for multiple bus types, namely AXI4, AXI-Stream and Wishbone. Each of these buses can be used as a read or a write frontend, i.e. the module responsible for converting the internal data and control interfaces to the external one (AXI4, AXI-Stream, etc.).
As a result, standard use cases such as Memory to Stream, Stream to Memory and Memory to Memory transfers are easily supported. Internally, control and data signals are handled in a generic way and are only converted to the desired interfaces in the frontends, so the DMA core itself is agnostic to the chosen interface.
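To illustrate the frontend idea, here is a minimal behavioral sketch in Python (not the Chisel implementation). It assumes a word-oriented core that only issues generic read and write calls; the class and function names (ReaderFrontend, WriterFrontend, MemoryReader, StreamWriter, MemoryWriter, dma_core) are hypothetical and do not correspond to FastVDMA's actual modules or ports.

```python
from abc import ABC, abstractmethod
from typing import List


class ReaderFrontend(ABC):
    """Hypothetical read frontend: turns an external bus into generic words."""

    @abstractmethod
    def read_word(self, addr: int) -> int: ...


class WriterFrontend(ABC):
    """Hypothetical write frontend: turns generic words into an external bus."""

    @abstractmethod
    def write_word(self, addr: int, word: int) -> None: ...


class MemoryReader(ReaderFrontend):
    """Stand-in for a memory-mapped (e.g. AXI4 or Wishbone) read frontend."""

    def __init__(self, mem: List[int]):
        self.mem = mem

    def read_word(self, addr: int) -> int:
        return self.mem[addr]


class MemoryWriter(WriterFrontend):
    """Stand-in for a memory-mapped write frontend."""

    def __init__(self, mem: List[int]):
        self.mem = mem

    def write_word(self, addr: int, word: int) -> None:
        self.mem[addr] = word


class StreamWriter(WriterFrontend):
    """Stand-in for an AXI-Stream master: addresses are ignored, beats go out in order."""

    def __init__(self):
        self.beats: List[int] = []

    def write_word(self, addr: int, word: int) -> None:
        self.beats.append(word)


def dma_core(reader: ReaderFrontend, writer: WriterFrontend,
             src: int, dst: int, length: int) -> None:
    """Bus-agnostic core: it never sees bus signals, only generic read/write calls."""
    for i in range(length):
        writer.write_word(dst + i, reader.read_word(src + i))
```

Swapping the writer from StreamWriter to MemoryWriter turns a Memory to Stream transfer into a Memory to Memory one without touching dma_core, which is the property the paragraph above describes.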
Video-oriented use cases served as a reference throughout the design of the DMA controller, which led to support for the features such applications require. Typically, a transfer is described by the number of rows, the length of a row and the number of bytes to skip between consecutive rows, which makes it possible to transfer only a selected area of a frame. In addition, the controller can use external synchronization signals to start transfers.
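The row-based transfer description can be sketched as follows. This is an illustrative model, assuming that all three parameters are given in bytes and that each row starts row_bytes + skip_bytes after the previous one; FastVDMA's actual register layout and units may differ, and the crop_region helper is hypothetical.

```python
from typing import Iterator, List, Tuple


def row_bursts(base: int, rows: int, row_bytes: int,
               skip_bytes: int) -> Iterator[Tuple[int, int]]:
    """Yield (start_address, length_in_bytes) for each row of a 2D transfer.

    Consecutive rows start row_bytes + skip_bytes apart, so the skipped
    region (e.g. the rest of a wider frame line) is never touched.
    """
    stride = row_bytes + skip_bytes
    for r in range(rows):
        yield base + r * stride, row_bytes


def crop_region(frame_base: int, frame_width_px: int, bytes_per_px: int,
                x: int, y: int, w: int, h: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: row bursts for a w x h window at (x, y) in a frame."""
    line_bytes = frame_width_px * bytes_per_px
    start = frame_base + y * line_bytes + x * bytes_per_px
    row_bytes = w * bytes_per_px
    return list(row_bursts(start, h, row_bytes, line_bytes - row_bytes))
```

For example, a 100x2 pixel window at (10, 5) in a 640-pixel-wide frame with 4 bytes per pixel gives two 400-byte rows separated by a 2160-byte skip, so only the selected area of the frame is read or written.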
Design and resource utilization
FastVDMA was designed using the Chisel HDL. Thanks to the high flexibility of Chisel, the DMA controller design process proved to be significantly easier compared to the use of standard HDLs such as Verilog or VHDL, and the resulting code is easier to maintain and parametrize.
Besides the abstraction required to support many platforms, resource utilization was also a key factor driving the design of FastVDMA, as it directly affects portability to resource-limited platforms.
The current implementation uses 455 slices on a Zynq 7030 FPGA. The configuration chosen for this implementation consists of an AXI4-Lite configuration port, an AXI-Stream slave input and an AXI4 port responsible for writing data back to memory. All buses are 32 bits wide, and the controller includes a FIFO of 512 32-bit words. The FIFO is implemented in the device's logic resources configured as memory and is placed between the AXI-Stream and AXI4 interfaces, where it serves as a buffer.
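The role of the FIFO can be illustrated with a small behavioral model. It assumes that the stream side accepts data only while the FIFO is not full (analogous to a ready signal providing back-pressure) and that the memory side drains it in bursts of some maximum length; the class name WordFifo and the burst-draining policy are hypothetical, not FastVDMA's actual logic.

```python
from collections import deque
from typing import List

FIFO_DEPTH = 512  # words, as in the configuration described above
WORD_BITS = 32
FIFO_BYTES = FIFO_DEPTH * WORD_BITS // 8  # 2048 bytes of buffering


class WordFifo:
    """Behavioral model of a bounded FIFO between an AXI-Stream input and an AXI4 writer."""

    def __init__(self, depth: int = FIFO_DEPTH):
        self.depth = depth
        self.q = deque()

    def ready(self) -> bool:
        # Stream side may push only while there is free space.
        return len(self.q) < self.depth

    def push(self, word: int) -> bool:
        if not self.ready():
            return False  # back-pressure: the producer must hold the word and retry
        self.q.append(word & ((1 << WORD_BITS) - 1))
        return True

    def pop_burst(self, max_len: int) -> List[int]:
        # Memory side drains up to max_len words at once, e.g. one write burst.
        n = min(max_len, len(self.q))
        return [self.q.popleft() for _ in range(n)]
```

Decoupling the two sides this way lets the stream input keep delivering words while the memory side waits for a write burst to complete, as long as the FIFO does not fill up.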
Results and conclusions
The FastVDMA implementation described above was verified in hardware, achieving an average throughput of 750 MB/s with a 250 MHz clock and 330 MB/s at 100 MHz under the same workload. Both tests were performed in a Memory-Stream-Memory configuration using two controllers configured with AXI4 and AXI-Stream buses: the first controller reads data from memory and sends it out over an AXI-Stream interface, while the second receives the stream and writes the data to a second buffer in memory.
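To put these figures in context, they can be compared with a simple upper bound. The sketch below assumes the 32-bit data bus can move at most one word per clock cycle and that MB means 10^6 bytes; it ignores protocol overheads and memory latency, so the result is only a rough efficiency estimate, not a measured quantity.

```python
BUS_BYTES = 4  # 32-bit data bus


def peak_mb_per_s(clock_mhz: float, bus_bytes: int = BUS_BYTES) -> float:
    """Upper bound: one full bus word per clock cycle, with MB = 10**6 bytes."""
    return clock_mhz * bus_bytes


for clock_mhz, measured in [(250, 750), (100, 330)]:
    peak = peak_mb_per_s(clock_mhz)
    print(f"{clock_mhz} MHz: peak {peak:.0f} MB/s, measured {measured} MB/s, "
          f"efficiency {measured / peak:.1%}")
```

Under these assumptions, the reported results correspond to roughly 75% of the bus bound at 250 MHz and 82.5% at 100 MHz.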
In both cases, the transferred data was a 4 MB block of randomly generated data, which was checked for transmission errors after each transfer.
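The test procedure can be modeled in software as a loopback: one function plays the role of the memory-to-stream controller, another the stream-to-memory controller, and the destination buffer is compared with the random source afterwards. This is a sketch of the verification idea only; it assumes 32-bit little-endian stream beats and a 4 MiB block, and the function names mm2s, s2mm and run_loopback are hypothetical.

```python
import os
from typing import Iterator

BLOCK_BYTES = 4 * 1024 * 1024  # 4 MiB test block (binary megabytes assumed)
WORD_BYTES = 4                 # 32-bit stream beats


def mm2s(src: bytes) -> Iterator[int]:
    """Model of the first controller: read memory, emit 32-bit stream beats."""
    for off in range(0, len(src), WORD_BYTES):
        yield int.from_bytes(src[off:off + WORD_BYTES], "little")


def s2mm(beats: Iterator[int], dst: bytearray) -> None:
    """Model of the second controller: write received beats to a second buffer."""
    for i, word in enumerate(beats):
        dst[i * WORD_BYTES:(i + 1) * WORD_BYTES] = word.to_bytes(WORD_BYTES, "little")


def run_loopback(block_bytes: int) -> bool:
    """Transfer a random block memory -> stream -> memory and verify it."""
    assert block_bytes % WORD_BYTES == 0, "block must be a whole number of words"
    src = os.urandom(block_bytes)   # randomly generated source data
    dst = bytearray(block_bytes)    # second buffer in memory
    s2mm(mm2s(src), dst)
    return bytes(dst) == src        # any transmission error shows up as a mismatch


print("transfer verified:", run_loopback(BLOCK_BYTES))
```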
Detailed average throughput results are shown below. In these tests, each controller configuration was tested separately, and in every case the AXI-Stream interface was attached to an ideal data source or sink. Each test case is labeled NxM, where N is the number of 32-bit words per row and M is the number of rows to transfer.
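Reading an NxM label as a transfer size is straightforward arithmetic: the case moves N x M x 4 bytes. The helpers below are a hypothetical illustration of that conversion and of how an average throughput figure relates to a measured transfer time; they assume MB = 10^6 bytes and contain no measured data.

```python
def case_bytes(label: str, word_bytes: int = 4) -> int:
    """Total bytes moved by an 'NxM' test case: M rows of N 32-bit words each."""
    n, m = (int(v) for v in label.lower().split("x"))
    return n * m * word_bytes


def avg_throughput_mb_s(label: str, elapsed_s: float) -> float:
    """Average throughput in MB/s (10**6 bytes) for a given transfer time."""
    return case_bytes(label) / elapsed_s / 1e6
```

For instance, case_bytes("256x16") is 256 x 16 x 4 = 16384 bytes.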
FastVDMA is a fully open source, easily adaptable DMA controller, ready to be deployed on a wide range of platforms thanks to its high configurability and limited resource usage.
As for future development, one of the most important planned improvements is support for a separate clock domain for each interface. Support for different read and write bus widths would also increase flexibility and will be added to the design at a later stage.
Further practical information on FastVDMA is provided in Antmicro’s GitHub repository.
If you'd like to build future-oriented, vendor-neutral FPGA systems that integrate many different software and hardware elements under your full control, make sure to reach out to us at firstname.lastname@example.org.