Automated and standardized software benchmarking with Benchalot
Topics: Open source tools
The growing complexity of the hardware on which Antmicro helps its customers build and deploy software workloads requires continuous benchmarking and optimization to track, understand and fix performance bottlenecks. In our work with tools such as Verilator or OpenROAD, we are on a never-ending quest to reduce memory usage, decrease execution time and, ultimately, improve the productivity of our customers’ silicon and software teams.
Benchmarking software often involves comparing multiple commits under varying input parameters, which is typically done manually or by writing dedicated scripts. To automate this time-consuming task and shorten optimization and debugging turnaround, we created Benchalot, a configurable, universal CLI tool for running and analyzing benchmarks. Benchalot lets developers specify a matrix of parameters they wish to iterate over, uses this data to create a set of benchmarks, then runs them and visualizes the results.
In the article below, we go into detail about Benchalot’s features, describe how to configure and use the tool, and cover the output formats it offers for aggregating, visualizing and analyzing results.
Customizable benchmarks with Benchalot
Benchalot allows the user to specify a matrix of parameters that are then used to automatically create multiple benchmarks. Each benchmark is then executed, producing results which can be aggregated and visualized.
Benchalot is configured using YAML files, such as the one shown below:
matrix:
  version: [v1.0, v1.1, v1.2]
  input: [data1, data2, data3]
prepare:
  - build {{version}}
benchmark:
  - run {{input}}
This configuration file will result in 9 benchmarks, and Benchalot will measure the execution time of each run command:
build v1.0
run data1
build v1.0
run data2
build v1.0
run data3
build v1.1
run data1
build v1.1
run data2
build v1.1
run data3
build v1.2
run data1
build v1.2
run data2
build v1.2
run data3
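Conceptually, the matrix expansion is a Cartesian product of the parameter lists, so the schedule above is equivalent to the nested shell loop below (build and run stand in for the placeholder commands from the example config):

for version in v1.0 v1.1 v1.2; do
  for input in data1 data2 data3; do
    build "$version"   # prepare step, executed before each benchmark
    run "$input"       # benchmark step, whose execution time is measured
  done
done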
Benchalot can also automatically create different output formats, such as Markdown and HTML tables, scatter plots, box plots, violin plots and bar charts. You can enable this by adding a results section to the configuration file, as in the example below showing how to create a bar chart:
results:
  plot:
    filename: "plot.png"
    format: "bar"
    x-axis: version
    facet: input
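Several outputs can be described in one configuration; below is a minimal sketch that combines the bar chart above with the Markdown table format used later in this article (assuming, as those examples suggest, that the results section accepts multiple entries):

results:
  plot:
    filename: "plot.png"
    format: "bar"
    x-axis: version
    facet: input
  table:
    format: "md"
    filename: "results.md"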
Benchalot also provides more advanced features, such as:
- system setup - Benchalot can change system options to reduce variance between time measurements
- custom metrics - metrics can be defined by any command
- compound variables - matrix variables can contain nested fields, allowing finer control over creating benchmarks
- stages - benchmarks can be divided into stages, with measurements for each stage gathered separately
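For instance, the stages feature turns the benchmark section into named sub-sections that are timed separately, as the Verilator configuration later in this article demonstrates; a minimal sketch with hypothetical build and test commands:

benchmark:
  build:
    - make
  test:
    - ./run_tests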
To learn more about Benchalot’s features, refer to the README.
Benchmarking Verilator
To illustrate Benchalot’s features, we created a demo showcasing how it can be used to benchmark Verilator, a popular open source RTL simulator which Antmicro is actively contributing to.
To reproduce the demo, first clone the Cores-VeeR-EL2 repository which we will use as a test subject:
git clone --recursive https://github.com/chipsalliance/Cores-VeeR-EL2.git
Next, clone the Verilator repository and build verilator:
git clone --recursive https://github.com/verilator/verilator.git
cd verilator
autoconf
./configure
make -j`nproc`
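Optionally, you can confirm the build succeeded by printing the version (--version is a standard Verilator flag; this check is not part of the original demo):

./bin/verilator --version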
Then create the config.yml file:
matrix:
  table: ["-fno-table", "-ftable"]
  const: ["-fno-const", "-fconst"]
  inline: ["-fno-inline", "-finline"]
  gate: ["-fno-gate", "-fgate"]
env:
  BUILD_PATH: snapshots/default
  RV_ROOT: $HOME/Cores-VeeR-EL2
  VERILATOR: $HOME/verilator/bin/verilator
  BUILD_DIR: build
cwd: $BUILD_DIR
setup:
  - $RV_ROOT/configs/veer.config -target=default -iccm_enable=1
prepare:
  - ccache -C
benchmark:
  verilation:
    - cat ../default_args | envsubst | xargs $VERILATOR {{table}} {{const}} {{inline}} {{gate}}
  compilation:
    - cp $RV_ROOT/testbench/test_tb_top.cpp obj_dir/
    - make -e -C obj_dir -f Vtb_top.mk VM_PARALLEL_BUILDS=1
  simulation:
    - cp $RV_ROOT/testbench/hex/user_mode0/cmark_iccm.hex program.hex
    - ./obj_dir/Vtb_top --test-halt
conclude:
  - rm -r console.log exec.log obj_dir program.hex trace_port.csv
results:
  table:
    format: "md"
    filename: "summary.md"
    pivot: "{{stage}} [s]"
To start the benchmarks, run:
mkdir build
benchalot config.yml
Benchalot will test Verilator’s performance by measuring how much time it takes to perform the verilation, compilation and simulation steps for VeeR-EL2 with different combinations of flags (table, const, inline and gate) which disable or enable different internal optimization stages. The results will then be aggregated in a Markdown table.
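Given the pivot setting, each row of the summary table corresponds to one flag combination and each stage gets its own timing column; the sketch below shows the layout we would expect (an assumption based on the configuration, with the measured values elided):

| table | const | inline | gate | verilation [s] | compilation [s] | simulation [s] |
|-------|-------|--------|------|----------------|-----------------|----------------|
| ...   | ...   | ...    | ...  | ...            | ...             | ...            |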
This run produced the results discussed below.
Based on the results we can assess that optimization flags have a significant impact on compilation and simulation times. Depending on the selected optimization configuration, simulation can take anywhere from around 8 seconds to over 100. According to the table, verilation took the least time when only the -fgate flag was enabled, as there was less optimization work to do. However, disabling all optimizations makes it slightly slower, likely due to operating on a more complex AST. When both the -fgate (gate elimination) and -finline flags were enabled, compilation and simulation were the fastest, likely due to -finline creating new opportunities for gate elimination. -fconst (constant propagation) showed no significant effect on any of the times when combined with -fgate.
Using Benchalot, we were able to run these 16 benchmarks automatically, and we can easily modify the configuration file to include different Verilator versions or flags.
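As one illustration (our own extension, not part of the demo), a thread-count axis could be added to the matrix using Verilator’s standard --threads option, with {{threads}} appended to the verilation command; this would double the sweep to 32 benchmarks:

matrix:
  # added axis (our illustration); --threads is a standard Verilator option
  threads: ["--threads 1", "--threads 2"]
  table: ["-fno-table", "-ftable"]
  const: ["-fno-const", "-fconst"]
  inline: ["-fno-inline", "-finline"]
  gate: ["-fno-gate", "-fgate"]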
Toolchain optimization with Antmicro
Tools like Benchalot help Antmicro and its customers standardize and automate the creation of reproducible benchmarking setups, which is especially important when dealing with large and time-consuming designs, e.g. for performance assessment in the context of scalability or for improving modeling accuracy and precision in tools like Verilator. Based on reliable data, we can tailor specific toolchains to the advanced use cases of our clients, be it hardware, ML or software related.
If you are developing a user-facing toolchain for your silicon device or using a complex toolchain in your own development which you believe could be improved using a data-driven approach, don’t hesitate to reach out to us at contact@antmicro.com, and visit our interactive offer portal to learn more about our engineering services.