Accelerating digital block design with Google’s open source Mid-Level Synthesis XLS toolchain
Digital circuits are becoming more and more complicated due to constant technology development and increasing user expectations. In response to this growing complexity, development tools that provide a higher level of abstraction for digital system designers have been rising in prominence.
Alongside HDL generators such as Chisel, SpinalHDL, Migen and Amaranth, which have been gaining popularity among more software-minded developers, even newer approaches promising significant productivity gains have emerged, such as Google’s open source XLS toolchain. Using as an example an encode/decode accelerator that Antmicro has been developing and contributing to XLS, this article describes how XLS can be used to build, adapt and test digital designs. XLS’s capabilities of course go beyond codec blocks, but given the framework’s origins in helping Google deliver efficient transcoding solutions, such blocks are a good illustration of the strengths of this approach.
XLS as a Mid-Level Synthesis toolchain
XLS (which stands for Accelerated Hardware Synthesis) is a fully open source toolchain created by Google that produces synthesizable designs from high-level descriptions of their functionality. The limitations of standard High-Level Synthesis (HLS) approaches when applied to non-trivial problems are well known, so XLS tries to strike a good balance between succinctness and flexibility, to the point of describing itself as a “Mid-Level Synthesis” tool. The term emphasizes that, despite the level of abstraction provided, the user can configure low-level details of the flow to create designs that are both easy to reason about and efficient. XLS gives designers control over many circuit properties that traditional HLS solutions abstract away without explaining what the tool did and why. Instead, XLS makes such decisions explicit: the designer can, for example, specify pipeline stages, set the acceptable worst-case throughput, determine which SRAMs are introduced and along which channels, and inspect detailed, transparent scheduling reports.
Procs (single stateful XLS elements) offer a middle ground between the always blocks you would write in RTL and loops whose translation into hardware by an HLS tool is hard to predict. They let you add structure to a design in the form of concurrent, multi-cycle elements, a kind of “always blocks on steroids”.
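To make the idea of a proc more concrete, here is a minimal Python sketch of a proc-like element. It is a conceptual model only, not XLS or DSLX API: the class, the `next` method, the `run` helper and the use of `collections.deque` as a stand-in for channels are all hypothetical. It loosely mirrors the structure of a DSLX proc, where channels are bound at configuration time, the state gets an initial value, and a `next` function performs one receive/compute/send activation and returns the state for the following activation.

```python
from collections import deque
from typing import Deque


class AccumulatorProc:
    """Hypothetical proc-like element: a running-sum accumulator.

    Conceptual model only; not the XLS/DSLX API.
    """

    def __init__(self, inp: Deque[int], out: Deque[int]) -> None:
        # "config": bind the channels this element communicates over
        # (deques stand in for XLS channels here).
        self.inp = inp
        self.out = out
        # "init": the initial state value.
        self.state = 0

    def next(self, state: int) -> int:
        # One activation: receive a value, compute, send the result,
        # and return the state for the next activation.
        value = self.inp.popleft()
        new_state = state + value
        self.out.append(new_state)
        return new_state


def run(proc: AccumulatorProc, activations: int) -> None:
    # Toy "scheduler": call next() repeatedly, threading the state through.
    for _ in range(activations):
        proc.state = proc.next(proc.state)


inp: Deque[int] = deque([1, 2, 3])
out: Deque[int] = deque()
acc = AccumulatorProc(inp, out)
run(acc, 3)
print(list(out))  # [1, 3, 6]
```

Unlike a free-running loop, the state here only changes at well-defined activation boundaries, which is the property that makes a proc easy to map onto hardware.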
The main objective of the XLS project is to enable productive collaboration between software and hardware engineers through a common, software-driven methodology for designing digital circuits. As a result, the same design description can be used to generate both a software model of the circuit and the final RTL description in Verilog or SystemVerilog. This common-denominator approach lets both groups of engineers cross their domain boundaries, understand each other’s cost models, and share knowledge and experience. XLS thus aims to bring to hardware development the velocity, composability, modularity and retargetability known from the software world.
XLS use cases
The benefits of XLS are particularly evident in the design of digital circuits with significant algorithmic complexity, since the abstraction layer it provides lets the user focus on the designed functionality rather than on low-level implementation details. This makes XLS a great fit for circuits implementing video encoding, image processing, encryption or compression algorithms, as well as for accelerating AI computation.
However, XLS is not intended solely for complex designs. The examples provided in the XLS repository demonstrate the toolchain’s versatility and show its many possible applications.
XLS in action
In collaboration with Google, Antmicro has been working to demonstrate how XLS can be used to implement compression algorithms such as Run-Length Encoding (RLE) and Dictionary Based Encoding (DBE) as open source, ASIC-targeted blocks. The encoders and decoders were implemented in DSLX, the XLS domain-specific language. The level of abstraction enabled by DSLX allowed us to explore different architectural choices and to refine the encoder implementations incrementally. The resulting blocks are more general and parameterizable than would be practical in a traditional HDL without introducing significant complexity.
Let’s look at the RLE block as an example. The initial version of the encoder, which was very simple to implement, used a proc to sequentially read incoming data and compress it into pairs, each made up of a symbol and its repetition count.
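As a point of reference, the sequential behavior can be modeled in a few lines of Python. This is a hypothetical software model, not the DSLX implementation: it consumes one symbol per iteration, much like a proc reading one input per activation, and emits a (symbol, count) pair whenever a run ends. The `max_count` parameter is an assumption standing in for the finite width of a hardware counter; a run longer than that is split into several pairs.

```python
from typing import List, Optional, Tuple


def rle_encode(symbols: List[int], max_count: int = 255) -> List[Tuple[int, int]]:
    """Hypothetical sequential RLE model: one symbol per iteration."""
    pairs: List[Tuple[int, int]] = []
    current: Optional[int] = None  # symbol of the run in progress
    count = 0  # length of the run in progress
    for s in symbols:
        if s == current and count < max_count:
            count += 1  # extend the current run
        else:
            if current is not None:
                pairs.append((current, count))  # close the previous run
            current, count = s, 1  # start a new run
    if current is not None:
        pairs.append((current, count))  # flush the final run at end of stream
    return pairs


print(rle_encode([5, 5, 5, 2, 2, 7]))  # [(5, 3), (2, 2), (7, 1)]
```

Processing one symbol at a time keeps the logic trivial, but it also caps throughput at one symbol per activation, which is what motivates the multi-symbol version described next.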
To enable more practical encoder designs, we then added a more advanced version of the block, capable of processing multiple symbols simultaneously. This reimplementation consists of four procs that communicate with each other, as detailed in the image below.
The first block, similar to the initial implementation, takes the input and reduces it to pairs, each made up of a symbol and the number of its occurrences. The second element of the encoder shifts the emitted pairs and aligns them for further processing. Both of these elements are stateless (their state is empty). The third block combines the prepared data with information about previously processed symbols, for instance a run that started in an earlier input. The last element adapts the width of the output data to the receiver’s interface.
Overall, the data processing breaks down into four stages: reduction, alignment, compression, and output generation. Dividing responsibilities this way allowed each specialized block to process data efficiently and gave us the chance to test each function thoroughly in isolation.
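The four stages can be sketched as a Python software model. Everything here is hypothetical and simplified: the chunk size `n`, the output width `out_width`, the `last` end-of-stream flag, the function and class names, and the slot-based representation of the reduction output are illustrative assumptions rather than the structure of the actual DSLX procs, and counter saturation is ignored. Stateless stages are plain functions, while the stateful ones keep their state between chunks.

```python
from typing import List, Optional, Tuple

Pair = Tuple[int, int]  # (symbol, count)


def reduce_stage(chunk: List[int]) -> List[Optional[Pair]]:
    # Stateless: one slot per input symbol; a slot holds a pair only
    # where a run ends inside the chunk, otherwise it is empty (None).
    slots: List[Optional[Pair]] = [None] * len(chunk)
    count = 0
    for i, s in enumerate(chunk):
        count += 1
        if i == len(chunk) - 1 or chunk[i + 1] != s:
            slots[i] = (s, count)
            count = 0
    return slots


def align_stage(slots: List[Optional[Pair]]) -> List[Pair]:
    # Stateless: shift the valid pairs to the front, dropping empty slots.
    return [p for p in slots if p is not None]


class CombineStage:
    # Stateful: holds back the last run, which may continue in the next chunk.
    def __init__(self) -> None:
        self.pending: Optional[Pair] = None

    def step(self, pairs: List[Pair], last: bool) -> List[Pair]:
        out: List[Pair] = []
        for sym, cnt in pairs:
            if self.pending is not None and self.pending[0] == sym:
                self.pending = (sym, self.pending[1] + cnt)  # merge runs
            else:
                if self.pending is not None:
                    out.append(self.pending)
                self.pending = (sym, cnt)
        if last and self.pending is not None:
            out.append(self.pending)  # end of stream: flush the held run
            self.pending = None
        return out


class OutputStage:
    # Stateful: repacks pairs into output words of exactly `width` pairs
    # (a shorter final word is allowed at the end of the stream).
    def __init__(self, width: int) -> None:
        self.width = width
        self.buffer: List[Pair] = []

    def step(self, pairs: List[Pair], last: bool) -> List[List[Pair]]:
        self.buffer.extend(pairs)
        words: List[List[Pair]] = []
        while len(self.buffer) >= self.width:
            words.append(self.buffer[: self.width])
            self.buffer = self.buffer[self.width :]
        if last and self.buffer:
            words.append(self.buffer)
            self.buffer = []
        return words


def encode_stream(symbols: List[int], n: int = 4, out_width: int = 2) -> List[List[Pair]]:
    combine, output = CombineStage(), OutputStage(out_width)
    chunks = [symbols[i : i + n] for i in range(0, len(symbols), n)]
    words: List[List[Pair]] = []
    for idx, chunk in enumerate(chunks):
        last = idx == len(chunks) - 1  # end-of-stream marker on the final chunk
        pairs = align_stage(reduce_stage(chunk))
        words.extend(output.step(combine.step(pairs, last), last))
    return words


print(encode_stream([1, 1, 1, 1, 1, 2, 2, 3, 3, 3]))
# [[(1, 5), (2, 2)], [(3, 3)]]
```

Because the combine stage holds back the last run of every chunk until it knows whether the next chunk continues it, the flattened output matches what a sequential encoder would produce over the whole stream, even though each chunk is reduced independently.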
Testing and verifying the design
Alongside the encoders, we created multiple tests to verify that the designs work correctly. In addition, the verification mechanism built into XLS ensured that the generated RTL sources match the software model we had already tested thoroughly. Later, to investigate the throughput of the core, we added support for the popular Python-based Cocotb framework to the XLS toolset, which allowed us to write reusable Python tests that examine the real-life performance of the designs after conversion to RTL.
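One common way to check an encoder is a randomized round-trip test against a reference model. The sketch below is hypothetical and is not one of the tests from the XLS repository: `reference_encode` builds the expected (symbol, count) pairs with `itertools.groupby`, `rle_decode` expands them back, and the test checks both that decoding restores the input and that adjacent pairs never repeat a symbol. A reference model of this kind could, in principle, also supply the expected outputs in a Cocotb testbench driving the generated RTL.

```python
import random
from itertools import groupby
from typing import List, Tuple

Pair = Tuple[int, int]  # (symbol, count)


def reference_encode(symbols: List[int]) -> List[Pair]:
    # Reference model: one (symbol, run length) pair per maximal run.
    return [(sym, len(list(run))) for sym, run in groupby(symbols)]


def rle_decode(pairs: List[Pair]) -> List[int]:
    # Expand every pair back into `count` copies of its symbol.
    out: List[int] = []
    for sym, count in pairs:
        out.extend([sym] * count)
    return out


def random_roundtrip_test(trials: int = 1000, seed: int = 0) -> int:
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for _ in range(trials):
        # A small alphabet makes long runs likely.
        data = [rng.randrange(4) for _ in range(rng.randrange(32))]
        pairs = reference_encode(data)
        assert rle_decode(pairs) == data, data
        # Maximal runs: adjacent pairs never share a symbol.
        assert all(a[0] != b[0] for a, b in zip(pairs, pairs[1:])), pairs
    return trials


print(random_roundtrip_test())  # 1000
```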
Closer to silicon with SKY130 and ASAP7
Since XLS is meant to enable production-grade block design for ASICs, another element of the project was integrating open source physical design tooling into the XLS toolchain. This made it possible to generate silicon layouts with the SkyWater 130nm and ASAP7 PDKs, closing the design loop from a design written in DSLX down to GDS. A fully open source flow also helps keep the framework well tested, with readily available performance metrics that can be tracked over time.
The physical design flow uses Yosys for synthesis and OpenROAD for floorplanning, placement, clock tree synthesis and routing. Every step in the process relies on the standard cell definitions and design rules of the chosen PDK. The entire workflow is built as a collection of reusable rules for Bazel, the build system used throughout the XLS project.
Below you can see the Run-Length Encoder silicon design in KLayout, along with its visualization created with gds_viewer:
Renode XLS integration
Since fabricating high-end chips is expensive and time-consuming, and the final performance and usability of silicon ultimately depend on the software that runs on it, the ability to test the system in a practical HW/SW context from the very beginning is invaluable. Therefore, in parallel with the design itself, we decided to create a fully functional demonstrator showcasing the use of the created encoders in real-life applications on a RISC-V platform. Co-simulation of digital designs using Verilator has been available in Antmicro’s Renode simulator for some time and inspired a similar integration for XLS. One of the features that convinced us to undertake this effort is the JIT compilation available in XLS, which allows design models to execute at native machine speed. The Renode-XLS integration is still in progress and will be described in a future blog post.
Accelerating digital design with Antmicro and XLS
XLS is a framework focused on developer productivity; thanks to its flexibility and extensive verification capabilities, it enables rapid development of digital designs. The new RLE and DBE building blocks, together with Antmicro’s contributions to the framework itself, make the XLS ecosystem more practical for users beyond its original authors. If you are interested in developing and testing digital designs targeting FPGAs or ASICs using SW-driven methodologies, contact Antmicro at email@example.com.