Advanced co-simulation with Renode and Verilator: Zynq and FastVDMA
Topics: Open source tools, Open FPGA
Co-simulation is extremely useful for developing complex systems, especially those targeting FPGA SoCs, where specialized IP cores often interact with advanced software running on the hard CPU. Co-simulation has been available in Renode for quite a while now, and we are constantly adding more features as well as ready-to-use peripherals. This is necessary for the work we are doing with our customers related to silicon prototyping and advanced video device development.
In this blog note we will be showcasing Renode co-simulating the very popular Zynq-7000 SoC running Linux with a Verilated open source FastVDMA core. The core itself is controlled with a Linux driver implementing the standard DMA API to showcase the intercommunication between the two sides. It is important to mention that HDL simulation is inherently slow, so having Renode simulate the greater part of the system is a tremendous efficiency improvement.
Advanced co-simulation for Xilinx Zynq
As you may have read in the previous note on advanced co-simulation, you can now have a Verilated peripheral initiate communication with Renode by sending data to and requesting data from the system bus. This in turn allows you to simulate complex setups in FPGA and FPGA SoC systems (such as PolarFire SoC and now also Zynq-7000) with multiple co-simulated IP blocks, as you can see in the RISC-V + Verilated FastVDMA test.
Since the Xilinx Zynq-7000 (featuring up to a dual-core Cortex-A9 and Xilinx’s 7-series FPGA fabric) is in use in many of our customer projects as well as Antmicro’s open source hardware platforms like the Zynq Video Board, we decided to demonstrate how Renode’s co-simulation features can be used on that platform with Linux and our open source Fast Versatile DMA (FastVDMA) core.
Related to this work, Renode’s Zynq-7000 simulation model was also updated, adding quite a few interesting features, such as:
- Update of the bundled demo to a more recent kernel version (and corresponding model improvements)
- XADC peripheral model
- Cadence GEM (Zynq’s Ethernet controller) peripheral improvements
This gives you an easy starting point for using co-simulation with the platform running a modern Linux kernel.
FastVDMA Linux driver
To quickly recap, FastVDMA is an open source DMA controller designed with portability and customizability in mind, written in Chisel and released by Antmicro in 2019. It can be controlled via either an AXI4-Lite or a Wishbone bus, and can perform transfers using AXI4, AXI-Stream or Wishbone interfaces, providing high configurability and good performance. You can grab the code directly from the FastVDMA GitHub repository to try it out yourself.
So far, in all of our projects involving FastVDMA, we controlled it manually by writing to its registers. This meant that any software working with FastVDMA had to be aware of the exact register layout of that particular DMA instantiation, making it unportable to any other DMA controller or even to a different version of FastVDMA.
This is why we’ve introduced a FastVDMA Linux driver that implements the DMAEngine API, which provides a convenient abstraction layer between the FastVDMA controller and any possible DMA client.
The driver offers preparing and submitting transfers for two different DMA configurations:
- SMDMA, which refers to transfers between a stream and a memory-mapped region
- MMDMA, which refers to transfers between memory-mapped regions
There are three possible channel types based on the transfer direction:
- Memory to memory
- Stream to memory
- Memory to stream
How the FastVDMA Linux driver works
The FastVDMA Linux driver is essentially a layer between the FastVDMA controller and the DMAEngine API. It needs to be loaded into the kernel with the correct device tree entry. You can request channels, prepare transaction descriptors as well as submit transfers via appropriate functions described in detail in the DMAEngine API Guide.
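To illustrate what this looks like from a client's point of view, here is a minimal sketch using the standard DMAEngine consumer API; the channel name, device pointer handling and synchronous wait are illustrative assumptions rather than excerpts from the actual demo code:

```c
#include <linux/dmaengine.h>
#include <linux/dma-mapping.h>

/* Hypothetical sketch of a DMAEngine client; "rx" is a placeholder
 * channel name, not necessarily what the FastVDMA bindings use. */
static int example_memcpy_transfer(struct device *dev,
				   dma_addr_t dst, dma_addr_t src, size_t len)
{
	struct dma_chan *chan;
	struct dma_async_tx_descriptor *desc;
	dma_cookie_t cookie;

	/* Request a channel declared for this device in the device tree */
	chan = dma_request_chan(dev, "rx");
	if (IS_ERR(chan))
		return PTR_ERR(chan);

	/* Prepare a memory-to-memory transaction descriptor */
	desc = dmaengine_prep_dma_memcpy(chan, dst, src, len,
					 DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
	if (!desc) {
		dma_release_channel(chan);
		return -EINVAL;
	}

	/* Queue the descriptor and kick off the transfer */
	cookie = dmaengine_submit(desc);
	dma_async_issue_pending(chan);

	/* Busy-wait for completion; a real client would use a callback */
	dma_sync_wait(chan, cookie);

	dma_release_channel(chan);
	return 0;
}
```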
The FastVDMA controller contains four registers for each of the Reader and the Writer. These hold the address of the transferred chunk, its line length (expressed as a multiple of the data bus width), line count and stride (also a multiple of the data bus width).
These values are filled in when device_prep_slave_sg or device_prep_dma_memcpy is called; the arguments of those functions contain the address of the scatterlist or buffer to be transferred as well as its size.
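To make that mapping a bit more concrete, below is a minimal sketch of how a prep callback could reduce a client request to those four per-Reader/Writer values; the register offsets, structure and function names are illustrative assumptions, not the actual FastVDMA register map:

```c
#include <linux/types.h>

/* Illustrative register offsets for one side (Reader or Writer) of the
 * controller; the real FastVDMA register map may differ. Line length and
 * stride are expressed as multiples of the data bus width. */
#define XFER_ADDR	0x00	/* start address of the transferred chunk */
#define XFER_LINE_LEN	0x04	/* length of a single line */
#define XFER_LINE_CNT	0x08	/* number of lines */
#define XFER_STRIDE	0x0c	/* gap between consecutive lines */

/* The four values a prep callback has to derive from the client's request */
struct example_xfer {
	u32 addr;
	u32 line_len;
	u32 line_cnt;
	u32 stride;
};

/* Treat a plain memcpy-style request as a single line covering the buffer */
static void example_fill_xfer(struct example_xfer *x, dma_addr_t addr,
			      size_t len, unsigned int bus_width_bytes)
{
	x->addr = addr;
	x->line_len = len / bus_width_bytes;
	x->line_cnt = 1;
	x->stride = 0;
}
```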
Apart from that, there are control registers that are written to once the correct transfer details have been uploaded; writing to them is tantamount to physically starting the transfer.
The status of the transfer can be observed through mask registers, which indicate whether the Writer or the Reader is still busy handling the data.
When the transfer is finished, the controller sends an interrupt to the driver, which is handled by clearing the status registers and calling the completion callback. This callback is then forwarded to the DMA client that requested that particular transfer.
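A simplified view of that interrupt path could look like the sketch below; the register offset, structure and function names are hypothetical and only illustrate the clear-status-then-complete flow described above:

```c
#include <linux/dmaengine.h>
#include <linux/interrupt.h>
#include <linux/io.h>

/* Hypothetical interrupt status register offset and per-device context;
 * names do not come from the actual driver sources. */
#define XFER_IRQ_STATUS	0x10

struct example_dma_dev {
	void __iomem *regs;
	struct dma_async_tx_descriptor *active_desc;
};

static irqreturn_t example_dma_irq(int irq, void *data)
{
	struct example_dma_dev *dd = data;
	u32 status = readl(dd->regs + XFER_IRQ_STATUS);

	if (!status)
		return IRQ_NONE;

	/* Acknowledge the interrupt by clearing the status bits */
	writel(status, dd->regs + XFER_IRQ_STATUS);

	/* Signal completion to the DMA client that requested the transfer */
	if (dd->active_desc && dd->active_desc->callback)
		dd->active_desc->callback(dd->active_desc->callback_param);

	return IRQ_HANDLED;
}
```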
You can look into the code in Antmicro’s fork of the linux-xlnx repository.
FastVDMA Demo and more
If you want to get a quick feel for how all of this works in practice, you can just use a Renode script which runs FastVDMA in co-simulation with Zynq in Renode and then boots Linux. From there on, it is possible to load an example DMA client as well as a user space app that requests image processing from the DMA client.
The script itself allows you to easily test software using FastVDMA on Zynq.
Antmicro helps customers embrace faster turnaround HW/SW co-development flows for complex products as well as open source IP cores and hardware to enable vendor-neutral and completely transparent FPGA solutions. Follow our FPGA work by tracking our activity in CHIPS Alliance, such as the recently released FPGA Interchange Format developed in collaboration with Xilinx, or the open source DC-SCM developed with Google and IBM within OpenPOWER, as well as the constant improvements in Renode’s co-simulation and configurable FPGA-targeted SoC modelling capabilities.