Building low-latency smart video solutions with open source tools and Lattice CrossLink-NX
Topics: Open FPGA, Open source tools
At Antmicro we are often faced with the challenge of designing video processing devices with low power consumption and a small physical footprint. This usually implies using an FPGA to implement the repetitive, pipelined operations needed to process millions of pixels with close-to-zero latency.
The relatively inexpensive Lattice CrossLink-NX FPGAs are an ideal choice for such applications, as they offer plenty of logic resources (17K or 40K LUTs, depending on the variant) for implementing dedicated IP cores. Moreover, they are equipped with various types of hard-blocks, such as RAM and DSP blocks, that are useful in video processing. Finally, they provide industry-standard video interfaces such as MIPI CSI/DSI, widely used for interfacing with cameras and displays.
Perhaps most importantly, the CrossLink-NX has an open source FPGA toolchain that we are actively developing as part of the FOSS flows for FPGA effort within CHIPS Alliance. As an open source oriented company, we rely on being able to improve our workflow and toolchain and to identify issues down to the last bit. This lets us deliver complex end-to-end video processing solutions with the confidence that comes from the extensive continuous integration and testing that open source toolchains make possible.
Open source IP
In the past, devices involving FPGAs were often also equipped with a separate MCU chip that performed control tasks. Modern FPGAs are large enough to fit a softcore CPU in the fabric, enabling them to be used as full-fledged SoCs.
Given that most of our projects are software-oriented, at Antmicro we naturally lean towards the SoC approach, but as an open source driven company we also use as many open source IP cores (a.k.a. gateware) as possible. Luckily for us, there is a wide array of cross-platform IP cores that we have been building, adapting and integrating into our designs.
The open source IP ecosystem revolves around the RISC-V open ISA, governed by RISC-V International, of which Antmicro is a founding member. Depending on the application requirements, we can choose between RTOS-grade and Linux-grade cores (as well as scalable, configurable ones like VexRiscv), and we contribute to both Zephyr RTOS and Linux support for RISC-V based FPGA platforms. For video processing applications enabled by CrossLink-NX, an RTOS-level core is more than sufficient, as most of the computation will be performed by other specialized IP blocks (though a Linux-capable core is absolutely possible).
To build a practical video system, you will most likely also need an open source SPI core that enables efficient communication with external Flash memory chips over the Quad SPI interface, as well as an open source DDR controller to interface with DRAM for processing the video image data. Based on our customers' real-life scenarios, we have been implementing, integrating and improving those cores to provide a seamless experience.
Once you have enough memory, you will probably want to perform some processing - this is, after all, a video processing system.
Processing modules and accelerators
For the specific applications we work with, be it robotics, medical, security or industrial automation, most of the computation is always performed by dedicated gateware blocks tailored to the use case. These take the form of either stand-alone pipelined modules or accelerator modules connected to and controlled by a softcore CPU. They take advantage of dedicated hard-blocks in the FPGA to perform their tasks faster than generic logic resources would allow. Modern devices like the Lattice CrossLink-NX provide very flexible DSP hard-blocks that allow implementing operations like multiplication, multiply-accumulate and even simple FIR filters.
Many of the video processing systems we are building today use machine learning (ML) and artificial intelligence (AI) algorithms. Various open source frameworks for implementing AI models (Apache TVM, TensorFlow Lite, etc.) provide ways of utilizing dedicated accelerators, either directly or indirectly. Our recent involvement in the development of multiple AI codebases led us to create a new unified framework, Kenning, which simplifies training and deployment of ML models regardless of the underlying solution used.
Historically, in FPGA-based SoCs an accelerator module would be connected directly to the system bus, allowing it to access system memory and making it visible in the CPU address space. This approach is complex and inherently leads to significant FPGA resource utilization and increased latency in communication with the CPU.
An alternative approach is to integrate the accelerator with the softcore CPU itself, which removes most of those problems. Until recently, the variety of softcore implementations and the lack of a standardized interface made each such solution unique and hence not reusable. Thankfully, that changed with the introduction of RISC-V Custom Function Units (CFUs), which Antmicro helped standardize as part of a dedicated RISC-V International subgroup. CFUs are simple accelerators, tightly coupled with the CPU core over a standard interface, that can easily be added to FPGA-based RISC-V SoCs. The CFU Playground framework by Google, which we have been collaborating on as part of our FPGA and simulation work with Google, provides an excellent environment for experimenting with and implementing new CFUs as needed by the intended device application.
Renode, the versatile open source system simulation framework developed by Antmicro, natively supports CFUs via co-simulation with Verilator: the CFU itself runs in Verilator, while the rest of the SoC is simulated directly in Renode. This removes the need to simulate the whole SoC in an HDL simulator.
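To illustrate how an FIR filter decomposes into the multiply-accumulate (MAC) steps a DSP hard-block provides, here is a minimal behavioral sketch in Python rather than gateware. It assumes integer (fixed-point) coefficients and a power-of-two scale factor applied as a right shift; the function name `fir_filter` and the `[1, 2, 1]` smoothing kernel are illustrative choices, not part of any Lattice IP. In hardware, the taps would typically be unrolled across one or more DSP blocks so that one output is produced per clock cycle.

```python
def fir_filter(samples, coeffs, shift):
    """Direct-form FIR: y[n] = (sum_k coeffs[k] * x[n - k]) >> shift.

    Each output is a chain of multiply-accumulate steps, the operation a
    DSP hard-block can perform in a pipelined design. Integer coefficients
    plus a final right shift model simple fixed-point arithmetic.
    """
    taps = len(coeffs)
    delay_line = [0] * taps  # shift register holding the most recent inputs
    out = []
    for x in samples:
        # Shift the new sample in: delay_line[k] now holds x[n - k].
        delay_line = [x] + delay_line[:-1]
        acc = 0
        for c, d in zip(coeffs, delay_line):
            acc += c * d  # one MAC step
        out.append(acc >> shift)  # rescale the fixed-point result
    return out


if __name__ == "__main__":
    # Smooth one row of pixel values with a [1, 2, 1] / 4 kernel.
    print(fir_filter([0, 0, 4, 8, 4, 0, 0], [1, 2, 1], 2))
```

Applied to the example row, the call returns `[0, 0, 1, 4, 6, 4, 1]`, a smoothed version of the input delayed by the filter's group delay of one sample.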
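To make the CFU idea more concrete, the sketch below is a Python behavioral model of a hypothetical CFU, not RTL and not code taken from CFU Playground. It assumes the convention used in CFU Playground, where the CPU issues a custom RISC-V instruction whose `funct3`/`funct7` fields select the operation, passes two 32-bit register operands and receives a single 32-bit result. The operation set (a reset, plus a four-lane signed 8-bit multiply-accumulate of the kind often used to speed up quantized ML inference) and all names (`SimdMacCfu`, `pack_int8`, `unpack_int8`, `to_signed32`) are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF


def pack_int8(values):
    """Pack four signed 8-bit values into one 32-bit word (lane 0 = lowest byte)."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (8 * i)
    return word


def unpack_int8(word):
    """Split a 32-bit word into four signed 8-bit lanes (lane 0 = lowest byte)."""
    lanes = []
    for i in range(4):
        b = (word >> (8 * i)) & 0xFF
        lanes.append(b - 256 if b & 0x80 else b)
    return lanes


def to_signed32(word):
    """Interpret a 32-bit register value as a two's-complement integer."""
    return word - (1 << 32) if word & 0x80000000 else word


class SimdMacCfu:
    """Hypothetical CFU model.

    funct3 == 0: clear the internal accumulator.
    funct3 == 1: add the 4-lane int8 dot product of rs1 and rs2 to it.
    funct7 is ignored in this sketch. The returned value is the accumulator
    as an unsigned 32-bit word, wrapping like a hardware register.
    """

    def __init__(self):
        self.acc = 0

    def execute(self, funct3, funct7, rs1, rs2):
        if funct3 == 0:
            self.acc = 0
        elif funct3 == 1:
            dot = sum(a * b for a, b in zip(unpack_int8(rs1), unpack_int8(rs2)))
            self.acc = (self.acc + dot) & MASK32
        else:
            raise ValueError(f"unsupported funct3: {funct3}")
        return self.acc


if __name__ == "__main__":
    cfu = SimdMacCfu()
    cfu.execute(0, 0, 0, 0)  # reset the accumulator
    r = cfu.execute(1, 0, pack_int8([1, 2, 3, 4]), pack_int8([5, 6, 7, 8]))
    print(to_signed32(r))  # 1*5 + 2*6 + 3*7 + 4*8
```

In this sketch the accumulator lives inside the CFU, so a dot-product loop issues one custom instruction per four byte pairs and never touches the system bus, which is where the tightly coupled approach saves latency compared to a bus-attached accelerator.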
Open source FPGA toolchain
Historically, FPGA toolchains were closed-source and proprietary, which prevented a transparent design implementation flow and left end users at the mercy of vendors when it came to fixing bugs.
It took years for open source FPGA tools to materialize, but we always believed it was inevitable: just as with open source software toolchains like GCC and LLVM, you need to be able to analyze performance, fix bugs, deploy open source CI systems, etc. In collaboration with Google and others, Antmicro maintains the FOSS FPGA toolchain project, which targets multiple FPGA architectures and allows seamless switching between them without altering the input design each time. The toolchain integrates other open tools such as Yosys, a logic synthesis engine, as well as VTR and nextpnr, used for placement and routing.
Our involvement in open source FPGA tooling reaches beyond adding support for new platforms. We constantly aim to improve the tools' quality and robustness and to add new features, such as SystemVerilog synthesis support via our UHDM plugin, also developed within CHIPS Alliance.
FPGA Interchange format
Most existing proprietary FPGA toolchains allow only synthesis to be performed externally, while the rest of the flow remains closed, with a netlist at the input and the final bitstream at the output. For example, you cannot perform placement and routing with two separate tools.
Fortunately, this is now changing with the introduction of the FPGA interchange format, which targets interchangeability between different open and closed tools in the FPGA space. The FPGA interchange format defines a common data representation for the design netlist and for the description of FPGA resources.
Currently, the most important architectures for which we have implemented FPGA interchange format support are Xilinx 7-series and Lattice CrossLink-NX with nextpnr, but we are also working on native support in VTR, which will be available soon.
Open source hardware
Many of our software and IP projects actually begin with a request for a custom hardware design. Fortunately, we provide end-to-end solutions that include hardware designs, ranging from simple PCBs to complete devices. Whenever possible, we release our own boards under an open license and provide all the design files for the benefit of the open source community, and those designs often become a starting point for commercial engagements.
Our wide array of open designs includes many FPGA platforms, ranging from Xilinx Zynq, UltraScale+ and Kintex-7, through QuickLogic’s QuickFeather, to Lattice’s ECP5-based DC-SCM board and CrossLink-based MIPI-SDI bridge. With several CrossLink-NX projects in the works, we’re looking forward to releasing a CrossLink-NX based design as well.
An ideal solution for video processing applications
Ultimately, Lattice CrossLink-NX FPGAs are a solid choice for video processing applications. The existence of a fully open source toolchain is a key factor here, as there are not many FPGA architectures on the market for which such toolchains are available and robust enough to build real products deployed in the field. Thankfully, our and our partners’ continued work in this area under the umbrella of the CHIPS Alliance FOSS flows for FPGA project means that this number is constantly growing.
The development of open source toolchains (not only for FPGA but also for ASIC, ML, etc.) is a vital part of our work at Antmicro. We strongly believe that open source implementation flows are the future of FPGA. Not only do they provide the transparency necessary for security-oriented applications, but they are also open to third-party contributions, and initiatives such as CHIPS Alliance, where Antmicro was recently joined by FPGA pioneer Xilinx, are making this happen. Don’t hesitate to reach out to us at firstname.lastname@example.org and let us develop your next-gen project on the Lattice CrossLink-NX or a similar FPGA.