Trace-based evaluation of CPU cache usage in Renode

Topics: Open simulation, Open source tools

Although cache modeling is usually not part of ISS-level simulation, there are cases where it’s crucial to understand memory access patterns, e.g. when building a new chip and deciding on cache size and layout, or when working on low-level, time-critical firmware that requires precise cache management. Since Antmicro’s open source Renode simulation framework is often used for architectural exploration thanks to its broad ISA support, and already includes advanced execution tracing options, we’ve expanded its capabilities with trace-based cache usage evaluation. Renode’s execution tracing data provides detailed insights into cache behavior, such as cache hits, misses, and the overall hit ratio. This in turn enables precise analysis of how different cache configurations impact system performance, as well as identification of bottlenecks and opportunities for optimization.

In this article we introduce Antmicro’s solution for profiling CPU cache usage in Renode as a new addition to our portfolio of trace-based analysis features. It was developed partly within the scope of the European Union’s TRISTAN project, which focuses on open and reusable IP and tooling for RISC-V software and hardware development. Since this mechanism is generic, it can be used with any architecture, such as ARM or RISC-V, during the architectural exploration and early prototyping phase.

[Illustration: trace-based evaluation of CPU cache usage in Renode]

Cache usage evaluation in simulation

CPU cache requires careful management to correctly determine what data should be stored across its multiple levels, and is inherently a tradeoff between area and complexity on the one hand and performance on the other. Depending on the hardware configuration, the size of the cache, and the memory access patterns of the running software, frequent cache misses can heavily impact the overall performance of the system.

Cache operations are transparent from the perspective of software: the code does not change regardless of cache usage, yet the effect of cache on performance can be very significant. Because of that, it’s difficult to reason about cache without actually running the software. Running it on real hardware provides the best “model” of cache usage (because it’s the actual behavior), but that requires the hardware to be fully developed, which is not the case in pre-silicon scenarios. It also relies on cache counters that need to be supported by the hardware, and may require you to modify your software, which in turn affects the cache behavior.

In simulation you can analyze cache regardless of the hardware platform, and without changing your software or using a debugger. This approach is especially useful for smaller CPUs that do not have hardware support for cache analysis. Additionally, employing Renode in the pre-silicon stage lets you perform this kind of analysis early and tweak your cache parameters to meet the performance requirements, as it will let you track execution of the software to find exactly which parts are slowed down by uncached memory access and how. This enables a hardware/software co-design paradigm that lets you track the performance of the system across the entire product lifecycle and fine-tune the behavior of both sides of the equation to squeeze the most from the area you have to fit in.

When developing the cache analysis solution for Renode, we leveraged the extensive execution tracing features already present in our simulator, including:

  • Execution tracer: this subsystem allows for monitoring and saving the traces of all major CPU operations, including program flow tracing, memory access logging and performed I/O operations.
  • Execution metrics: a module that allows you to measure quantitative data related to the simulation, including the number of accesses to peripherals, the number of exceptions and the number of executed instructions (including counts of specific opcodes).
  • Execution profiler: a call stack analysis tool intended for debugging and inspecting the software running on a simulated CPU.
  • Python hooks: the framework utilizes a built-in Python API to provide an easy entry point to automate testing, extend the simulator’s functionalities, and integrate seamlessly with other tools and workflows.

Implementation details

Renode offers post-mortem analysis of memory accesses to simulate CPU cache behavior and generate usage statistics. The accesses are recorded by the execution tracing subsystem, and the resulting trace.log file can then be passed to the cache modeling analyzer.

Cache configuration in Renode is derived from the following inputs:

  • cache and memory size
  • cache block size
  • cache associativity (k-way associative cache, direct mapping, fully associative)
  • replacement policy: the line eviction policy used by the cache, for example FIFO, LRU, LFU or Random.
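
To illustrate how these inputs relate to each other, here is a minimal Python sketch (not the analyzer's actual code) that derives the number of sets from the cache size, block size and associativity, and splits an address into tag, set index and block offset. The `CacheGeometry` class and its field names are hypothetical, and the memory size input, which bounds the address width, is left out for brevity.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CacheGeometry:
    """Hypothetical container for the configuration inputs listed above."""

    cache_size: int  # total cache capacity in bytes
    block_size: int  # bytes per cache block (line)
    ways: int        # lines per set: 1 = direct mapped, num_lines = fully associative

    @property
    def num_lines(self) -> int:
        return self.cache_size // self.block_size

    @property
    def num_sets(self) -> int:
        return self.num_lines // self.ways

    def split(self, address: int) -> Tuple[int, int, int]:
        """Split an address into (tag, set index, block offset)."""
        offset = address % self.block_size
        block = address // self.block_size
        return block // self.num_sets, block % self.num_sets, offset


# Example values chosen for illustration: a 32 KiB, 4-way cache with 64-byte blocks.
geometry = CacheGeometry(cache_size=32 * 1024, block_size=64, ways=4)
print(geometry.num_lines, geometry.num_sets)  # 512 lines grouped into 128 sets
print(geometry.split(0x80001234))             # (tag, set index, block offset)
```

With `ways=1` every set holds a single line (direct mapping), while `ways` equal to the number of lines yields a single set (fully associative).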

The cache model can be configured in two ways:

  • Command line interface arguments: this method allows users to specify detailed configuration options directly through the command line.
  • Presets: this method loads a predefined set of configuration parameters using a preset name.

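The output of the cache modeling analyzer contains information about hits and misses, from which the hit ratio (hits divided by the total number of accesses) follows. A high hit ratio means that most memory accesses are served from the cache, so the cache is effective for the given workload. A low hit ratio means that many accesses fall through to slower memory, which might impact performance and suggests that the cache configuration is a poor fit for the software’s access patterns.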
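
As a rough illustration of how hits, misses and the hit ratio can be obtained by replaying a stream of memory accesses, below is a toy set-associative cache model with LRU replacement. It is a sketch under stated assumptions rather than the analyzer's implementation: the `SetAssociativeLRUCache` class is hypothetical, and the address list is made up instead of being parsed from a real trace.log.

```python
from collections import OrderedDict


class SetAssociativeLRUCache:
    """Toy set-associative cache with LRU replacement (illustrative only)."""

    def __init__(self, cache_size: int, block_size: int, ways: int) -> None:
        self.block_size = block_size
        self.ways = ways
        self.num_sets = cache_size // block_size // ways
        # One ordered dict per set; keys are tags, ordered from least
        # to most recently used.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Replay one memory access; return True on a hit."""
        block = address // self.block_size
        index, tag = block % self.num_sets, block // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)  # evict the least recently used line
        lines[tag] = None
        self.misses += 1
        return False

    @property
    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


# Made-up access stream in which several addresses map to the same set.
addresses = [0x000, 0x004, 0x100, 0x200, 0x000, 0x104]

results = {}
for ways in (2, 4):
    cache = SetAssociativeLRUCache(cache_size=256, block_size=16, ways=ways)
    for address in addresses:
        cache.access(address)
    results[ways] = (cache.hits, cache.misses)
    print(f"{ways}-way: hits={cache.hits} misses={cache.misses} "
          f"hit ratio={cache.hit_ratio:.2f}")
```

Replaying the same stream against two configurations of equal size shows how associativity alone can change the hit ratio, which is the kind of comparison a trace-based approach makes possible without touching the software.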

For more information about cache analysis in Renode, refer to the project’s README and documentation.

Architectural exploration and early prototyping with Renode

The initial implementation supports Level 1 instruction and data caches. Future plans involve expanding it to support multi-level (L2, L3) as well as multi-core caches. Since the cache modeling analyzer is written in Python, it could also be easily extended to work with the pyrenode3 library.

The trace-based cache usage analysis introduced in this article, combined with broad ISA support including RISC-V and ARM, co-simulation options, and the building block nature of Renode, provides a comprehensive pre-silicon development and architectural exploration environment. If you’re interested in benefiting from these features or would like to find out how our services and Renode can improve your workflows, don’t hesitate to contact us at contact@antmicro.com.