Improving Verilator's hierarchical mode for better performance and scalability


Topics: Open source tools, Open ASICs, Open FPGA

Antmicro has been providing engineering support for Verilator in a variety of ASIC-related projects. These often involve complex, state-of-the-art designs that take a lot of time and memory to verilate. Normally, when C++ code is generated from the original HDL (a process known as verilation) for the whole design at once, all modules are coupled with each other and form a single verilation unit. Any change therefore triggers verilation of the entire design, which is a long and memory-intensive process.

To mitigate this, Verilator’s hierarchical mode lets users divide a design into lower-level hierarchical blocks. Each block acts as a separate verilation unit that takes less time and memory to verilate.

Following up on our ongoing work on optimizing and scaling Verilator for very large designs, this article summarizes the recent improvements we introduced to Verilator’s hierarchical mode, including better scheduling, new parameter types, multi-threaded hierarchical simulation and more, to enable faster-turnaround ASIC design flows.

Hierarchical verilation improvements

Improving simulation scheduling in hierarchical verilation

Because hierarchical verilation builds are incremental, designs that were previously too large to be scheduled on multiple cores now qualify for it, and Antmicro’s work has focused on making this mode as performant as possible.

We modified Verilator’s internal scheduling system for greater multi-threaded scalability and better overall simulation performance. Hierarchical blocks can now be simulated concurrently, and their execution cost is now evaluated and used during scheduling to produce a better task layout.
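The idea of treating each hierarchical block as a task with a known cost can be illustrated with a minimal C++ sketch. It uses longest-processing-time-first (LPT) list scheduling, a textbook heuristic chosen here for brevity; it is not the algorithm Verilator implements, and the Task, Schedule and scheduleByCost names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A unit of work for the scheduler. A hierarchical block is modeled as a
// single task whose cost is the (estimated) cost of evaluating the whole block.
struct Task {
    std::string name;
    unsigned cost;
};

struct Schedule {
    std::vector<std::vector<std::string>> perThread;  // task names per thread
    unsigned makespan = 0;                            // total cost of the busiest thread
};

// LPT list scheduling: sort tasks by descending cost, then give each one to the
// currently least-loaded thread. Illustrative only, not Verilator's scheduler.
Schedule scheduleByCost(std::vector<Task> tasks, std::size_t threads) {
    assert(threads > 0);
    std::stable_sort(tasks.begin(), tasks.end(),
                     [](const Task& a, const Task& b) { return a.cost > b.cost; });
    Schedule s;
    s.perThread.resize(threads);
    std::vector<unsigned> load(threads, 0);
    for (const Task& t : tasks) {
        // min_element returns the first least-loaded thread on ties.
        auto best = static_cast<std::size_t>(
            std::min_element(load.begin(), load.end()) - load.begin());
        load[best] += t.cost;
        s.perThread[best].push_back(t.name);
    }
    s.makespan = *std::max_element(load.begin(), load.end());
    return s;
}
```

For example, scheduling hypothetical blocks with costs 6, 4, 3 and 2 on two threads places hier_a and hier_c on one thread and hier_b and glue on the other, for a makespan of 8 cost units. In this sketch, setting every cost to 0 would leave the scheduler with no basis for spreading the blocks out, and all of them would land on thread 0.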

During hierarchical verilation, modules marked as hier_block are verilated independently and then combined in the hierarchical consolidation phase. We observed that by modifying the simulation thread pool, we can schedule such blocks on multiple threads as well. For the sake of flexibility and convenience, it is up to the developer to specify the desired thread count assigned to each hierarchical block. The number of threads can be specified for each hier_block with the following option in a Verilator configuration (.vlt) file:

hier_workers -module "<module_name>" -workers "<worker_count>"
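To make the per-block worker count more concrete, here is a minimal C++ sketch of a block that evaluates its internal partitions on a configurable number of threads. HierBlock, its partitions and the round-robin assignment are assumptions made for illustration only; a real simulator would reuse a persistent thread pool rather than spawning threads on every evaluation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical model of a hierarchical block whose evaluation is split into
// independent partitions. `workers` plays the role of the value that would be
// given for this block via hier_workers.
struct HierBlock {
    std::vector<std::function<void()>> partitions;
    unsigned workers = 1;

    // Evaluate all partitions using `workers` threads: worker w runs
    // partitions w, w + workers, w + 2 * workers, ...
    void eval() {
        assert(workers > 0);
        std::vector<std::thread> pool;
        pool.reserve(workers);
        for (unsigned w = 0; w < workers; ++w) {
            pool.emplace_back([this, w] {
                for (std::size_t i = w; i < partitions.size(); i += workers) {
                    partitions[i]();
                }
            });
        }
        // The block is done only when all of its workers are done.
        for (std::thread& t : pool) t.join();
    }
};
```

The partitions must be independent (for example, each writing only its own outputs) for this split to be safe; giving the most expensive block more workers then shortens its share of the overall evaluation.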

Fully multi-threaded hierarchical simulation is now possible at both the micro scale (individual hier_blocks) and the macro scale (the whole consolidated design), paving the way for further simulation scheduling optimizations.

We also improved Profile-Guided Optimization (PGO) profiling for multi-threaded hierarchical scenarios. Verilator uses cost-based static scheduling for simulation, taking task costs into account to optimize task sequencing. PGO in Verilator tailors the simulation binary to the specifics of a design by using measured, rather than estimated, task costs for the final schedule, and we enabled collecting cost data for hierarchical modules as well.

PGO is enabled during verilation with the following command-line option:

--prof-pgo
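Why measured costs help can be shown with a small C++ sketch. It assumes a simplified static scheduler that places tasks, in order, onto the least-loaded thread (not Verilator's algorithm) and compares a schedule built from uniform cost estimates with one built from profiled costs. The assignByCost and actualMakespan names and the cost values are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Static assignment: tasks are handed out in order, each to the thread with the
// smallest *predicted* load. The result is fixed before the simulation runs.
std::vector<std::size_t> assignByCost(const std::vector<unsigned>& predicted,
                                      std::size_t threads) {
    assert(threads > 0);
    std::vector<unsigned> load(threads, 0);
    std::vector<std::size_t> threadOf(predicted.size());
    for (std::size_t i = 0; i < predicted.size(); ++i) {
        auto best = static_cast<std::size_t>(
            std::min_element(load.begin(), load.end()) - load.begin());
        load[best] += predicted[i];
        threadOf[i] = best;
    }
    return threadOf;
}

// Makespan of a fixed assignment when each task takes its *actual* cost.
unsigned actualMakespan(const std::vector<std::size_t>& threadOf,
                        const std::vector<unsigned>& actual, std::size_t threads) {
    std::vector<unsigned> load(threads, 0);
    for (std::size_t i = 0; i < actual.size(); ++i) load[threadOf[i]] += actual[i];
    return *std::max_element(load.begin(), load.end());
}
```

With uniform estimates of 4 for four tasks whose measured costs are 7, 1, 1 and 1, the expensive task shares a thread with another task and the actual makespan on two threads is 8. Scheduling from the measured costs gives the expensive task a thread of its own, reducing the makespan to 7.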

Relaxing hier_block requirements and enhancing usability

Verilator’s documentation provides a list of known limitations that modules need to obey to be marked as a hier_block. To make hierarchical verilation more widely applicable, Antmicro also worked on relaxing some of those restrictions.

Building on our previous work on parameterized hierarchical blocks, we introduced support for type parameters. Modules that use the SystemVerilog parameter type construct can now be marked as hier_block. Such parameters are safely forwarded to and from hierarchical children via human-readable configuration files. We also widened the range of port types allowed in hierarchical modules by adding support for integer atom types and signed primitives.

To enhance Verilator’s hierarchical mode and its usability in larger, more complex projects, we introduced several fixes and improvements to Verilator’s built-in tooling, such as verilator_gantt, simplifying development and simulation performance tuning. These include:

To test the improved hierarchical mode, we used our Benchalot benchmarking framework. The benchmark consisted of three successful verilation runs, using the latest Verilator, of a large SystemVerilog design made up of 3 hierarchical blocks and scheduled on 2 threads, with the most expensive block assigned 2 hierarchical workers. We measured execution time and memory usage, as shown in the tables below.

Hierarchical verilation time benchmarking results

Hierarchical verilation memory benchmarking results

The results show that hierarchical verilation significantly decreased both memory consumption and execution time.

Extending Verilator for complex designs

The efforts described in this article bring significant improvements to Verilator in terms of verilation and compilation times, resource usage and scalability for state-of-the-art designs. Antmicro offers engineering support in customizing and extending Verilator for complex designs and niche use cases, complemented by a broad portfolio of open source tools, including astsee, sv-bugpoint and Benchalot, that help improve RTL design and test tooling and increase the productivity of your ASIC teams.

If you would like to adopt Verilator or customize it for your specific needs, get in touch at contact@antmicro.com, and visit our offering page to learn more about Antmicro’s engineering services.
