RISC-V co-design using trace-based simulation with Renode and TBM


Topics: Open source tools, Open OS

The design of modern hardware components such as processors and accelerators is a multidisciplinary effort at the intersection of hardware and software development. Hardware-software co-design is challenging and requires actionable data to identify bugs and bottlenecks. Renode, Antmicro’s open source simulation framework, enables pre-silicon HW/SW co-design of complete SoCs such as OpenTitan. It provides a fully controllable environment, including cores and I/O blocks, that runs binary-compatible software and offers complete insight into its execution.

Recently, in a joint effort with the Google AmbiML team, we added Renode support for trace-based simulation through a dedicated tool called Trace Based Model (TBM). This lets you quickly measure the efficiency of microarchitectural choices together with your code, so you can identify and address design issues. The goal of the Renode-TBM integration was to generate TBM-compatible data from a Renode simulation in order to gather meaningful metrics and enable automated testing and benchmarking in a continuous integration (CI) pipeline, shortening the feedback loop for HW/SW co-design.

CI-driven RISC-V hardware-software co-design using Renode and TBM

Benchmarking hardware using execution traces

Trace-based simulation is a methodology that predicts the performance of a system from its execution traces. In the case of TBM, these traces come from an external simulator such as Renode. After you simulate your hardware design and execute the software on it, Renode produces a rich set of logs that TBM can process to report on performance and potential bottlenecks.
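To make the idea concrete, here is a minimal sketch of a trace-based timing model. It is not TBM’s model: it assumes a hypothetical in-order, single-issue pipeline with fixed per-unit latencies (the LATENCY table and the TraceEntry fields are illustrative), and it only models read-after-write stalls. TBM instead takes its microarchitecture description from a configuration file such as config/rvv-simple.yaml.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical fixed latencies (in cycles) per functional unit.
LATENCY = {"alu": 1, "lsu": 3, "branch": 1}

@dataclass
class TraceEntry:
    unit: str                   # functional unit that executes the instruction
    dst: Optional[str]          # destination register, or None
    srcs: Tuple[str, ...] = ()  # source registers read by the instruction

def simulate(trace: List[TraceEntry]) -> Tuple[int, float]:
    """Replay a trace on an in-order, single-issue pipeline; return (cycles, IPC)."""
    ready_at: Dict[str, int] = {}  # cycle when each register's new value is available
    next_issue = 0                 # earliest cycle the next instruction may issue
    finish = 0                     # cycle by which every issued instruction has completed
    for entry in trace:
        # Read-after-write hazard: wait until all source operands are ready.
        issue = max([next_issue] + [ready_at.get(reg, 0) for reg in entry.srcs])
        done = issue + LATENCY[entry.unit]
        if entry.dst is not None:
            ready_at[entry.dst] = done
        finish = max(finish, done)
        next_issue = issue + 1     # in order, at most one instruction per cycle
    ipc = len(trace) / finish if finish else 0.0
    return finish, ipc
```

Replaying a load followed by a dependent ALU instruction and an independent one shows the stall: the dependent instruction waits for the load’s 3-cycle latency, so the three instructions take 5 cycles (IPC 0.6) instead of 3.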

Although Renode is a functional rather than a cycle-accurate simulator, its deterministic and fully observable nature gives access to useful metrics about hardware usage that can guide optimization. Renode lets you simulate a complete system of interconnected components such as CPUs, memory, and a wide range of buses, and its execution tracing generates extensive traces of unmodified binaries without changing the behavior of the simulation.

To enable the generation of TBM-compatible tracing data, we added memory access tracing to Renode, which is essential for TBM to determine the type of memory access hits. In addition, TBM drove efforts to extend Renode’s existing support for RISC-V Vector (RVV) instructions so that traces capture additional information, enabling benchmarking of Renode’s RVV implementation.
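To illustrate why per-access addresses matter to a timing model, the sketch below replays a recorded address stream through a hypothetical direct-mapped cache and labels each access as a hit or a miss. This is not TBM’s memory model; the cache geometry (num_lines, line_size) and the plain list-of-addresses input are assumptions chosen for illustration.

```python
from typing import Iterable, List, Optional

def classify_accesses(addresses: Iterable[int], num_lines: int = 64,
                      line_size: int = 64) -> List[str]:
    """Label each address 'hit' or 'miss' in a hypothetical direct-mapped cache."""
    tags: List[Optional[int]] = [None] * num_lines  # tag held by each line, None = empty
    labels = []
    for addr in addresses:
        block = addr // line_size      # which line-sized memory block the address is in
        index = block % num_lines      # direct-mapped: each block maps to exactly one line
        tag = block // num_lines       # distinguishes blocks that share a line
        if tags[index] == tag:
            labels.append("hit")
        else:
            labels.append("miss")
            tags[index] = tag          # fill the line (evicting any previous block)
    return labels
```

For example, classify_accesses([0x0, 0x4, 0x40, 0x1000, 0x0]) yields miss, hit, miss, miss, miss: the final access misses because 0x1000 maps to the same cache line and evicted the block containing 0x0.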

Generating TBM-compatible traces in Renode

The work described here spanned a long period during which Renode was used in the real-world development of an upcoming SoC. The initial implementation of TBM, used in the earlier stages of the project, required the gentrace-renode Python script to convert Renode traces to a TBM-compatible format. We have since integrated TBM trace generation directly into Renode, which allows a more automated workflow.

To generate TBM-compatible traces, run the following commands in the Renode Monitor:

(machine-0) cpu CreateExecutionTracing "tracer" @renode.trace TraceBasedModel true
(machine-0) tracer TrackMemoryAccesses
(machine-0) tracer TrackVectorConfiguration
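For scripted or CI use, the same Monitor commands can be passed to Renode non-interactively, one -e option per command, together with --console and --disable-xwt. A minimal sketch in Python, assuming you supply the platform script and the CPU name for your machine (cpu here, s7 on the HiFive Unmatched platform); tbm_monitor_commands, renode_argv and run_renode are hypothetical helper names.

```python
import subprocess
from pathlib import Path
from typing import List

def tbm_monitor_commands(cpu: str = "cpu", trace: str = "renode.trace") -> List[str]:
    """The Monitor commands above, parameterized by CPU name and trace file."""
    return [
        f'{cpu} CreateExecutionTracing "tracer" @{trace} TraceBasedModel true',
        "tracer TrackMemoryAccesses",
        "tracer TrackVectorConfiguration",
    ]

def renode_argv(renode: Path, script: str, cpu: str, run_for: str,
                trace: str = "renode.trace") -> List[str]:
    """Build a non-interactive Renode command line: load, trace, run, quit."""
    commands = [f"i @{script}", *tbm_monitor_commands(cpu, trace),
                f'emulation RunFor "{run_for}"', "q"]
    argv = [str(renode), "--console", "--disable-xwt"]
    for command in commands:
        argv += ["-e", command]   # one -e option per Monitor command
    return argv

def run_renode(argv: List[str]) -> None:
    # check=True raises CalledProcessError if Renode exits with a non-zero status.
    subprocess.run(argv, check=True)
```

Passing a list of arguments to subprocess avoids the nested shell quoting that the double-quoted "tracer" and RunFor arguments otherwise require.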

When the simulation finishes, the traces are saved to the file specified in the CreateExecutionTracing command. Assuming you have installed all of the TBM project requirements, you can now feed the execution trace to TBM to generate a report:

python3 tbm/tbm.py -u config/rvv-simple.yaml --print-trace detailed --report-dont-include-cfg --report out_renode_tbm_report.txt --verbose ~/renode-portable/renode.trace
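The same TBM invocation can be wrapped for use from a Python-based CI job. A minimal sketch, assuming it is run against a checkout of the trace-based-model repository with the flags exactly as above; tbm_argv and run_tbm are hypothetical helpers, and the report file name is taken from the command.

```python
import subprocess
from pathlib import Path
from typing import List

def tbm_argv(trace: Path, config: str = "config/rvv-simple.yaml",
             report: str = "out_renode_tbm_report.txt") -> List[str]:
    """Arguments of the tbm.py invocation above, with the paths parameterized."""
    return ["python3", "tbm/tbm.py", "-u", config,
            "--print-trace", "detailed", "--report-dont-include-cfg",
            "--report", report, "--verbose", str(trace)]

def run_tbm(trace: Path, tbm_checkout: Path,
            report: str = "out_renode_tbm_report.txt") -> str:
    """Run TBM from its repository checkout and return the report text."""
    trace = trace.resolve()  # absolute path, since TBM runs with cwd=tbm_checkout
    if not trace.is_file():
        raise FileNotFoundError(f"trace not found: {trace}")
    subprocess.run(tbm_argv(trace, report=report), cwd=tbm_checkout, check=True)
    return (tbm_checkout / report).read_text()
```

For example: run_tbm(Path.home() / "renode-portable" / "renode.trace", Path.home() / "trace-based-model").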

The report generated by TBM contains information such as the following:

*** cycles: 1171
*** retired instructions per cycle: 0.85 (1000)
*** retired / fetched instructions: 0.25
*** branch count: 330
*** scalar load/store stall rate: 1.50 stalls per-instruction

*** stall cycles:
 SC: 14% (172)
 FE: 14% (172)

*** instructions per cycle:
 lsu0.eiq: 0.28 (327)
 lsu0.pipe: 0.28 (327)
 lsu0.wbq: 0.00 (0)
 alu0.eiq: 0.29 (338)
 alu0.pipe: 0.29 (341)
 alu0.wbq: 0.29 (341)
 branch0.eiq: 0.28 (328)
 branch0.pipe: 0.28 (330)
 branch0.wbq: 0.00 (1)
 csr0.eiq: 0.00 (0)
 csr0.pipe: 0.00 (2)
 csr0.wbq: 0.00 (2)
 S: 0.85 (1000)
 V: 0.00 (0)
 FE: 3.37 (3952)

*** utilization:
 lsu0.eiq: 38% (327)
 lsu0.pipe: 56% (327)
 lsu0.wbq: 0% (0)
 alu0.eiq: 37% (338)
 alu0.pipe: 29% (341)
 alu0.wbq: 15% (341)
 branch0.eiq: 98% (328)
 branch0.pipe: 28% (330)
 branch0.wbq: 0% (1)
 csr0.eiq: 0% (0)
 csr0.pipe: 0% (2)
 csr0.wbq: 0% (2)
 S: 53% (1000)
 V: 0% (0)
 FE: 60% (3952)

Automated HW/SW co-design benchmarking

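You can automate this workflow further in a CI pipeline that simulates your hardware with a Renode script and generates a report from the execution. The following job script builds the FlatBuffers compiler (flatc), downloads the latest portable Renode build, runs the HiFive Unmatched platform with TBM tracing enabled, and processes the resulting trace with TBM: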

apt update
apt install -y python3 python3-pip git wget cmake build-essential

cd ~
git clone https://github.com/google/flatbuffers.git
cd flatbuffers/
cmake -G "Unix Makefiles" .
make -j

cd ~
wget https://dl.antmicro.com/projects/renode/builds/renode-latest.linux-portable.tar.gz
mkdir -p renode-portable
tar -zxf renode-latest.linux-portable.tar.gz -C ~/renode-portable --strip-components=1
~/renode-portable/renode \
--console \
--disable-xwt \
-e 'i @scripts/single-node/hifive_unmatched.resc' \
-e 's7 CreateExecutionTracing "tracer" @renode.trace TraceBasedModel' \
-e 'tracer TrackMemoryAccesses' \
-e 'tracer TrackVectorConfiguration' \
-e 'emulation RunFor "0.00001"' \
-e 'q'

git clone https://github.com/AmbiML/trace-based-model.git
cd ~/trace-based-model/
pip install -r requirements.txt
~/flatbuffers/flatc -o tbm --python config/instruction.fbs
python3 tbm/tbm.py -u config/rvv-simple.yaml --print-trace detailed --report-dont-include-cfg --report out_renode_tbm_report.txt --verbose ~/renode-portable/renode.trace
cat out_renode_tbm_report.txt
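To turn the report into a pass/fail CI check rather than just printing it, a job can compare the cycle count against a stored baseline. A minimal sketch, assuming the report contains a top-level "*** cycles: N" line as in the example above; the check_cycles.py file name, the baseline value and the 5% tolerance are hypothetical choices.

```python
import re
import sys
from typing import Tuple

# Matches the top-level "*** cycles: N" line (not "*** stall cycles:").
_CYCLES = re.compile(r"^\*\*\* cycles: (\d+)", re.MULTILINE)

def read_cycles(report_text: str) -> int:
    match = _CYCLES.search(report_text)
    if match is None:
        raise ValueError("no '*** cycles:' line found in the TBM report")
    return int(match.group(1))

def check_regression(report_text: str, baseline_cycles: int,
                     tolerance: float = 0.05) -> Tuple[bool, int]:
    """Return (passed, cycles); fail if cycles exceed the baseline by more than tolerance."""
    cycles = read_cycles(report_text)
    return cycles <= baseline_cycles * (1 + tolerance), cycles

if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 check_cycles.py out_renode_tbm_report.txt <baseline-cycles>
    with open(sys.argv[1]) as report_file:
        passed, cycles = check_regression(report_file.read(), int(sys.argv[2]))
    print(f"cycles: {cycles}, baseline: {sys.argv[2]}, {'OK' if passed else 'REGRESSION'}")
    sys.exit(0 if passed else 1)
```

Run as a final step of the job (for example python3 check_cycles.py out_renode_tbm_report.txt 1171), a non-zero exit status fails the pipeline when a hardware or software change makes the benchmark slower than the tolerance allows.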

Renode’s advanced tracing capabilities

Renode’s tracing capabilities are not limited to TBM-compatible data. Some features let you collect additional data during the simulation itself, e.g., logging the names of executed functions with cpu LogFunctionNames true, or logging accesses to a peripheral with sysbus LogPeripheralAccess <peripheral-name> true. For pre-silicon development, you can generate Renode traces compatible with the RISC-V DV framework for SoC design and verification. You can also use Renode’s built-in Metrics Analyzer to visualize metrics such as executed instructions, memory accesses, or exceptions as easy-to-read graphs.

Renode Metrics Analyzer

The main execution tracing functionality in Renode, enabled with EnableExecutionTracing, supports several modes that track information such as program counter values or executed opcodes. Opcodes can be converted to human-readable instruction names, including all of their arguments, with Renode’s built-in LLVM-based disassembler.
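For instance, a trace recorded in the program counter mode (e.g., cpu EnableExecutionTracing @trace.log PC) can be post-processed to find the most frequently executed addresses, a quick way to spot hot loops. The sketch below assumes the text output contains one hexadecimal PC value per line, such as 0x80000000; check the format produced by your Renode version. hottest_pcs and trace.log are hypothetical names.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def hottest_pcs(lines: Iterable[str], top: int = 5) -> List[Tuple[int, int]]:
    """Return the `top` most frequent PC values as (address, count) pairs."""
    counts: Counter = Counter()
    for line in lines:
        text = line.strip()
        if text:                        # skip blank lines
            counts[int(text, 16)] += 1  # base-16 int() accepts an optional 0x prefix
    return counts.most_common(top)
```

An open file can be passed directly, since iterating over it yields lines: with open("trace.log") as f, calling hottest_pcs(f) returns the five hottest addresses, which can then be matched against the disassembly or symbol table of your binary.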

Automate HW/SW benchmarking with Renode

Renode is a flexible tool that can be easily integrated into an existing workflow to develop both hardware and software in a repeatable, simulated environment. If you are interested in simulating your next hardware design and testing its performance in an automated CI pipeline, contact Antmicro at contact@antmicro.com.
