Open source hardware accelerator subsystem for FPGA/ASICs
Topics: Open FPGA, Open source tools
Antmicro’s involvement in the VEDLIoT project, an EU-funded initiative aiming to provide a common platform for efficient Deep Learning in IoT, has allowed us to develop several open AI/ML ecosystem tools and components, which we have since been using in customer projects spanning many market verticals. Developments already described and in production use include the Kenning open source ML framework together with its runtime capabilities, as well as support for Custom Function Units in Renode for pre- and post-silicon product development, among other things.
However, an equally interesting part of our efforts within the VEDLIoT project involves the development of an open source, production-ready SoC design that would enable interfacing various hardware accelerators with more complex systems. Being able to easily add an accelerator to increase your SoC’s performance for specific tasks is particularly beneficial in IoT devices, which are often required to process data in real time with a limited power budget.
A generic accelerator interface
One of the key requirements of the SoC developed within the VEDLIoT project is to enable the use of efficient, low-power accelerators. This required us to design an interface between the accelerator logic and the rest of the system, with properties such as:
- Configurability - to ensure data movers are able to handle various data interfaces,
- Vendor independence - to make it synthesizable for any FPGA chip and possible to include in an ASIC design,
- Open source - to create a complete, production-ready subsystem entirely of open source building blocks.
Within the VEDLIoT project we are exploring two approaches to interfacing with accelerators:
- Tight integration with a CPU where the accelerator is implemented as a Custom Function Unit and is driven with custom CPU instructions,
- Accelerator connected to a system bus consuming data from the memory and writing results back to it.
In this note we’ll focus on the latter - accelerators available on the bus. This approach requires extending the system with a generic interface that would allow instantiating any data processing block.
By necessity, such an interface would have to consist of two data movers – one responsible for passing the data onto the input of the accelerator, and the other for receiving the processed data and transferring it back to RAM.
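The two-mover arrangement described above can be sketched as a small behavioural model in Python. This is only an illustration of the data flow, not FastVDMA or the actual RTL: the function names, the `beat` size and the toy `invert_accelerator` are all assumptions made up for this example.

```python
from typing import Callable, Iterable, Iterator


def input_mover(ram: bytearray, src: int, length: int, beat: int) -> Iterator[bytes]:
    """Read `length` bytes from RAM starting at `src` and emit them as stream beats."""
    for offset in range(0, length, beat):
        end = min(offset + beat, length)
        yield bytes(ram[src + offset : src + end])


def output_mover(ram: bytearray, dst: int, beats: Iterable[bytes]) -> int:
    """Collect processed beats and write them back to RAM starting at `dst`."""
    written = 0
    for data in beats:
        ram[dst + written : dst + written + len(data)] = data
        written += len(data)
    return written


def invert_accelerator(beats: Iterable[bytes]) -> Iterator[bytes]:
    """Hypothetical processing block: inverts every byte of the stream."""
    for data in beats:
        yield bytes(255 - value for value in data)


def run_pipeline(
    ram: bytearray,
    src: int,
    dst: int,
    length: int,
    accelerator: Callable[[Iterable[bytes]], Iterator[bytes]],
    beat: int = 4,
) -> int:
    """Chain input mover -> accelerator -> output mover; return bytes written."""
    return output_mover(ram, dst, accelerator(input_mover(ram, src, length, beat)))
```

Because the accelerator only sees a stream of beats on its input and produces a stream on its output, any processing block with the same shape could be dropped into `run_pipeline` without touching the movers, which is the point of keeping the interface generic.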
For that purpose we used Antmicro’s open source DMA controller, FastVDMA, described in one of our earlier blog posts. FastVDMA is highly customizable, allowing modification of various bus parameters such as address width, data width or maximum burst size. It can easily be generated in different configurations of supported data and control buses, providing compatibility with the following interfaces:
- AXI Stream
- AXI4 Lite
Recent improvements to the FastVDMA core introduced a new level of configurability, where the required interface can be generated on the fly. In addition, each instance of the core now uses unique block names, which prevents naming collisions in bigger systems and makes it possible to generate more complex systems with multiple FastVDMA cores in various configurations.
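To illustrate why per-instance block names matter, here is a minimal Python sketch of a configuration record and a naming helper. It does not reflect FastVDMA's real generator options: `MoverConfig`, its fields and defaults, and `unique_block_names` are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class MoverConfig:
    """Hypothetical description of one data mover configuration."""
    data_bus: str            # e.g. "AXI Stream"
    control_bus: str         # e.g. "AXI4 Lite"
    addr_width: int = 32     # assumed default, in bits
    data_width: int = 32     # assumed default, in bits
    max_burst: int = 16      # assumed default, in beats


def unique_block_names(instance: str, cfg: MoverConfig) -> Dict[str, str]:
    """Prefix every internal block with the instance name and data width,
    so that two movers in one design never emit identically named modules."""
    tag = f"{instance}_{cfg.data_width}b"
    return {block: f"{tag}_{block}" for block in ("reader", "writer", "csr")}


# Two movers with different configurations placed in the same system.
in_cfg = MoverConfig(data_bus="AXI Stream", control_bus="AXI4 Lite", data_width=32)
out_cfg = MoverConfig(data_bus="AXI Stream", control_bus="AXI4 Lite", data_width=64)
in_names = unique_block_names("in_mover", in_cfg)
out_names = unique_block_names("out_mover", out_cfg)
```

With a scheme like this, the sets of generated names for `in_mover` and `out_mover` never overlap, so both cores can be elaborated into a single netlist.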
Apart from the movers, the interface also implements general purpose registers that are used to control the processing core.
It is important to mention that the code of both the data movers and the CSRs is vendor agnostic and can therefore be easily used with any FPGA or included in an ASIC design.
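The role of the general purpose registers can be sketched with a small software model. The register layout below (offsets, bit meanings, the read-only status register) is purely an assumption for illustration and is not the register map of the actual subsystem.

```python
class ControlRegisters:
    """Toy model of a CSR block used to control a processing core."""

    CTRL = 0x00    # hypothetical: bit 0 requests the core to start
    STATUS = 0x04  # hypothetical: bit 0 is set by the core when done (read-only)
    PARAM = 0x08   # hypothetical: free-form parameter passed to the core

    def __init__(self) -> None:
        self._regs = {self.CTRL: 0, self.STATUS: 0, self.PARAM: 0}

    def write(self, offset: int, value: int) -> None:
        """Bus-side write; writes to the read-only STATUS register are ignored."""
        if offset not in self._regs:
            raise ValueError(f"no register at offset {offset:#x}")
        if offset == self.STATUS:
            return
        self._regs[offset] = value & 0xFFFFFFFF

    def read(self, offset: int) -> int:
        """Bus-side read of a 32-bit register."""
        if offset not in self._regs:
            raise ValueError(f"no register at offset {offset:#x}")
        return self._regs[offset]

    def core_finish(self) -> None:
        """Core-side event: clear the start bit and raise the done flag."""
        self._regs[self.CTRL] &= ~1
        self._regs[self.STATUS] |= 1
```

Software would typically write its parameters, set the start bit in `CTRL`, and then poll `STATUS` (or wait for an interrupt) until the core reports completion.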
Open source FPGA ISP
In order to demonstrate the above-mentioned interface we have implemented a simple ISP system with an open source debayerization core, the latter of which has been described in more detail in a previous blog post.
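For readers unfamiliar with debayerization, the following pure-Python sketch shows the idea on an RGGB Bayer pattern using the simplest nearest-neighbour approach. It is not the algorithm implemented in the open source core; the function name and the choice of RGGB are assumptions for illustration only.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def demosaic_rggb_nearest(raw: List[List[int]]) -> List[List[Pixel]]:
    """Convert a raw RGGB Bayer frame into RGB pixels.

    Each 2x2 cell holds one red, two green and one blue sample:
        R G
        G B
    All four output pixels of a cell share the cell's red and blue samples
    and the average of its two green samples.
    """
    height = len(raw)
    width = len(raw[0]) if height else 0
    if width % 2 or height % 2:
        raise ValueError("frame dimensions must be even for a 2x2 Bayer cell")

    out: List[List[Pixel]] = [[(0, 0, 0)] * width for _ in range(height)]
    for y in range(0, height, 2):
        for x in range(0, width, 2):
            red = raw[y][x]
            green = (raw[y][x + 1] + raw[y + 1][x]) // 2
            blue = raw[y + 1][x + 1]
            for dy in (0, 1):
                for dx in (0, 1):
                    out[y + dy][x + dx] = (red, green, blue)
    return out
```

Real demosaicing algorithms interpolate each missing colour from neighbouring cells to avoid the blockiness this approach produces, but the per-pixel, streaming nature of the computation is the same, which is what makes it a good fit for an accelerator on the bus.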
The reason why we chose this particular processing system is that video processing is great for testing accelerator interfaces. The visual output not only facilitates debugging, but also requires processing a significant amount of data in real time, allowing any issues to be discovered in a fairly short time.
Antmicro’s open source FPGA ISP core is written in Migen. It can be used with the LiteX SoC builder to generate a bitstream for various FPGA boards such as Antmicro’s open source Zynq Video Board. The platform definition and ZVB specific target have been integrated in antmicro_zynq_video_board.py. It is also possible to generate a standalone core that can be used in co-simulation in Renode, Antmicro’s open source simulation and development framework.
We have now also implemented a Linux driver for the open source FPGA ISP that supports the V4L2 API. It is compatible with gstreamer and supports V4L2 controls, which allow setting the demosaicing algorithm and pattern.
The FPGA ISP has been tested both on hardware (Antmicro’s open Zynq Video Board) and in co-simulation in Renode. Both tests consist of passing an image to the FPGA ISP and running the v4l2-ctl tool with the stream option using the FPGA ISP video device. The image passes through the system and the output is written to out.rgb, which can then be converted into the desired format.
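As one possible way of converting the output, the sketch below wraps raw pixel data in a binary PPM header, which most image viewers can open. It assumes that out.rgb contains packed 8-bit RGB samples with no padding and that the frame width and height are known; neither is stated above, so treat both as assumptions. The function names are made up for this example.

```python
def rgb24_to_ppm(raw: bytes, width: int, height: int) -> bytes:
    """Prepend a binary PPM (P6) header to packed 8-bit RGB pixel data."""
    expected = width * height * 3
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    return header + raw


def convert_file(src_path: str, dst_path: str, width: int, height: int) -> None:
    """Read a raw RGB dump from `src_path` and write it as a PPM image."""
    with open(src_path, "rb") as src:
        raw = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(rgb24_to_ppm(raw, width, height))
```

For example, `convert_file("out.rgb", "out.ppm", width, height)` would produce an image that can be viewed directly or converted further with standard tools, provided the assumed pixel layout matches what the driver actually emits.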
Vendor-neutral vision processing in FPGAs
Apart from the open FPGA ISP being a demonstrator of the accelerator interface, it is also quite useful as is. Video is a common input in many industrial applications, autonomous vehicles and smart surveillance systems to name a few - and for customers with use cases in such areas we will often be developing image recognition or stereovision systems. Since a major part of the technological stack behind the video input and processing is proprietary, the work Antmicro has been conducting - also within the current VEDLIoT project - provides a transparent, controllable open source alternative. And since the open FPGA/ASIC accelerator subsystem is vendor agnostic, it can be easily adapted to work with a platform of your choice.
The open source FPGA ISP is one of many FPGA projects Antmicro is involved in. FPGAs not only give us an opportunity to develop vital ecosystem technologies such as open SoCs running open source software, but can also provide low latency, high bandwidth and energy efficiency. This, combined with deep control over the hardware and software, offers an ideal environment for applications oriented at real-time data processing.
End-to-end open source FPGA/ASIC design solutions
Over the years, Antmicro has worked on many individual parts of the open source ecosystem described in this blog post: from building open source hardware like the Zynq Video Board and leading the development of FPGA toolchains hosted under the umbrella of CHIPS Alliance, through supporting camera manufacturers and users with configurable and standardized Linux camera drivers (e.g. Allied Vision Alvium), to userspace apps that tie it all together.
It does not however end there, as we will continue working on the accelerator subsystem to make it more configurable and easily applicable in non-trivial industrial setups, including projects requiring efficient AI processing. If you’re interested in using this approach in your next FPGA or smart vision project, make sure to reach out to us at email@example.com. We offer comprehensive services in building edge AI products, encompassing both open hardware platforms and software as well as FPGA and ASIC design workflows, for a complete, open source-driven end-to-end experience.