Optimizing the resynthesis process in OpenROAD with simulated annealing

Topics: Open source tools

The OpenROAD ASIC design toolchain consists of a number of tools and modules dedicated to various stages of a typical RTL-to-GDS flow, such as floorplanning, placement, and routing. It also includes utility modules for optimizing the design in terms of parameters such as clock rate, area, or power.

One of these modules, called cut, can cut out parts of the design, feed them to the ABC logic synthesis tool, and then reintegrate the resulting logic back into the design in OpenROAD. This module isn’t available to users directly, but it’s used by other, user-facing modules such as cgt (clock gating, which we described in detail in a recent article) and rmp (resynthesis).

So far, rmp has relied on a very simple, static resynthesis strategy consisting of a fixed sequence of ABC operations. In a recent customer project, Antmicro developed a new resynthesis strategy based on simulated annealing. Instead of applying a fixed sequence of ABC operations to the exported logic, this strategy searches for a sequence of ABC operations that improves timing the most.

In this article, we will describe how resynthesis in OpenROAD works and show how simulated annealing can improve the design optimization process.

[Figure: Simulated annealing in OpenROAD]

Local resynthesis in OpenROAD

In order to optimize the design, the rmp module performs what’s called “local resynthesis”. Local resynthesis is an iterative process that involves extracting a smaller part of the design, optimizing that part locally, and then reintegrating it back into the design.

This process comes down to finding paths through the design that don’t satisfy the timing constraints - in other words, paths along which a signal won’t arrive within one clock cycle. The difference between the time by which a signal must arrive (set by the clock period and the setup time of the capturing flip-flop) and the time at which it actually arrives is called slack. The slack needs to be non-negative for the path to meet the timing requirements. If it’s positive, either the clock can be sped up, or the design is over-engineered, i.e. it could be simplified and still satisfy the timing constraints.
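As a worked example, the slack of the worst path in the first timing report later in this article can be reproduced by hand. This is only a sketch of the arithmetic: the variable names are ours, and the numbers (in ps) are copied from that report, where the clock network delay and reconvergence pessimism are both 0.

```python
# Values copied from the first report_checks output in this article (ps).
clock_period = 1000.00   # capturing edge of core_clock
setup_time = 9.54        # library setup time of the capturing flip-flop
data_arrival = 1021.38   # time at which the signal reaches the D pin

# The data must be stable one setup time before the capturing clock edge.
data_required = clock_period - setup_time

# Slack is required time minus arrival time; a negative value is a violation.
slack = data_required - data_arrival

print(f"data required: {data_required:.2f} ps")
print(f"slack: {slack:.2f} ps ({'MET' if slack >= 0 else 'VIOLATED'})")
```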

The rmp module takes paths with slack below a given threshold (0 by default), along with some connected logic, and exports all of that to ABC. ABC then restructures that logic to improve its timing. Finally, the improved logic is read back into OpenROAD, replacing the original logic.
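The selection step can be pictured as a simple filter on slack. The sketch below is not the rmp implementation: the path names and slack values are made up for illustration, and the function name is hypothetical.

```python
# Hypothetical (path name -> slack in ps) data; real values come from timing analysis.
path_slacks = {
    "path_a": -30.92,
    "path_b": -12.40,
    "path_c": 5.10,
}

def paths_to_resynthesize(slacks, threshold=0.0):
    """Return the names of paths whose slack is below the threshold, worst first."""
    failing = [name for name, slack in slacks.items() if slack < threshold]
    return sorted(failing, key=lambda name: slacks[name])

print(paths_to_resynthesize(path_slacks))         # default threshold of 0
print(paths_to_resynthesize(path_slacks, 10.0))   # a stricter threshold selects more paths
```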

Simulated annealing for resynthesis

Antmicro introduced an alternative approach to resynthesis in OpenROAD, which focuses on first finding a sequence of ABC operations that improves timing the most. The search for this sequence follows the simulated annealing technique. This technique is essentially a random search based on hill climbing, except that it occasionally accepts a worse solution in order to escape local optima. The probability of accepting a worse solution is controlled by a parameter called temperature, which decreases over the run time of the algorithm so that the process ultimately converges.
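To make the role of temperature concrete, here is a minimal sketch of an acceptance rule. The article does not state which formula rmp uses, so this assumes the classic Metropolis criterion, exp(delta / T), which is the textbook choice for simulated annealing. The function name and the sample numbers are ours.

```python
import math

def acceptance_probability(delta_slack, temperature):
    """Probability of accepting a candidate whose worst slack changed by delta_slack.

    Assumes the Metropolis criterion; the actual rule in rmp may differ.
    """
    if delta_slack >= 0:
        return 1.0   # an equal or better solution is always accepted
    if temperature <= 0:
        return 0.0   # at zero temperature this degenerates to plain hill climbing
    return math.exp(delta_slack / temperature)

# The same 5 ps regression becomes less acceptable as the temperature drops.
for temperature in (20.0, 10.0, 5.0, 1.0, 0.0):
    p = acceptance_probability(-5.0, temperature)
    print(f"T = {temperature:5.1f}  P(accept) = {p:.3f}")
```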

The process starts with an initial ABC script: we generate a random solution (i.e. a random sequence of ABC operations), apply it to the design, calculate the worst slack, and then roll back the changes. Next, for n iterations (n is customizable, but it’s usually on the order of 100-1000), we pick a neighboring solution by applying a small random change to the current solution: adding an operation, removing an operation, or swapping two operations. We then apply the neighboring solution to the design and calculate the worst slack. If the slack is better (greater) than that of the current solution, we accept the neighboring solution as the new current solution. If the slack is worse, we accept it only with a probability that depends on the current temperature (the higher the temperature, the higher the probability); otherwise, we keep the current solution. Either way, we then roll back the changes to the design and decrease the temperature according to the annealing schedule. In the current implementation, the temperature decreases linearly, reaching 0 at the final iteration. Finally, we apply the best known solution to the design.
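The loop described above can be sketched in a few lines of Python. This is an illustration under several assumptions, not the rmp code: the operation pool borrows the names of a few common ABC commands, toy_worst_slack is an invented stand-in for "apply the script, measure the worst slack, roll back", and the Metropolis acceptance rule is assumed because the article does not give the exact formula.

```python
import math
import random

# Names of common ABC commands; the pool actually used by rmp is not given in the article.
OPERATIONS = ["balance", "rewrite", "refactor", "resub"]

def toy_worst_slack(script):
    """Invented stand-in for applying a script, measuring worst slack, and rolling back."""
    score = -30.0
    for prev, op in zip([None] + script, script):
        score += 0.0 if op == prev else 4.0   # toy rule: repeating an operation gains nothing
    return score - 0.5 * len(script) ** 1.5   # toy rule: long scripts are penalized

def neighbor(script, rng):
    """Apply one small random change: add, remove, or swap operations."""
    candidate = list(script)
    moves = ["add"] + (["remove", "swap"] if len(candidate) > 1 else [])
    move = rng.choice(moves)
    if move == "add":
        candidate.insert(rng.randrange(len(candidate) + 1), rng.choice(OPERATIONS))
    elif move == "remove":
        candidate.pop(rng.randrange(len(candidate)))
    else:
        i, j = rng.sample(range(len(candidate)), 2)
        candidate[i], candidate[j] = candidate[j], candidate[i]
    return candidate

def anneal(iterations=500, initial_temperature=10.0, seed=0):
    rng = random.Random(seed)
    current = [rng.choice(OPERATIONS) for _ in range(3)]   # random initial solution
    current_slack = toy_worst_slack(current)
    best, best_slack = current, current_slack
    for step in range(iterations):
        # Linear schedule that reaches 0 on the final iteration.
        temperature = initial_temperature * (1 - step / max(iterations - 1, 1))
        candidate = neighbor(current, rng)
        candidate_slack = toy_worst_slack(candidate)
        delta = candidate_slack - current_slack
        # Assumed Metropolis rule: always take improvements, sometimes take regressions.
        if delta > 0 or (temperature > 0 and rng.random() < math.exp(delta / temperature)):
            current, current_slack = candidate, candidate_slack
            if current_slack > best_slack:
                best, best_slack = current, current_slack
    return best, best_slack   # the best known solution is what gets applied at the end

best_script, best_slack = anneal()
print("best script:", "; ".join(best_script))
print(f"toy worst slack: {best_slack:.2f}")
```

Note that the best solution is tracked separately from the current one: because worse candidates are sometimes accepted, the solution held at the final iteration is not necessarily the best one seen.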

[Figure: flowchart of simulated annealing in OpenROAD]

Testing the implementation

To show the effectiveness of the simulated annealing strategy, we’ll use one of the simulated annealing tests in OpenROAD.

When you run openroad, you’ll be in an interactive shell. Load the design:

read_liberty asap7.lib
read_lef asap7.lef
read_verilog aes_asap7.v
link_design aes
read_sdc aes_asap7.sdc

Then, run

report_checks

to get a timing report:

Startpoint: ld (input port clocked by core_clock)
Endpoint: u0/subword[19]$_DFF_P_
          (rising edge-triggered flip-flop clocked by core_clock)
Path Group: core_clock
Path Type: max
Corner: slow

  Delay    Time   Description
---------------------------------------------------------
   0.00    0.00   clock core_clock (rise edge)
   0.00    0.00   clock network delay (ideal)
 200.00  200.00 ^ input external delay
   0.00  200.00 ^ ld (in)
  43.55  243.55 ^ _12054_/Y (BUFx2_ASAP7_75t_R)
  61.28  304.83 ^ _12066_/Y (BUFx2_ASAP7_75t_R)
  61.08  365.91 ^ _12067_/Y (BUFx2_ASAP7_75t_R)
  48.16  414.07 ^ _12197_/Y (OA21x2_ASAP7_75t_R)
  65.83  479.90 ^ _12198_/Y (BUFx2_ASAP7_75t_R)
  78.77  558.68 v _13022_/Y (INVx1_ASAP7_75t_R)
  85.51  644.19 ^ _23139_/CON (HAxp5_ASAP7_75t_R)
  39.37  683.56 v _23139_/SN (HAxp5_ASAP7_75t_R)
  56.95  740.51 v _13045_/Y (BUFx2_ASAP7_75t_R)
  53.72  794.23 v _13111_/Y (OA21x2_ASAP7_75t_R)
  46.41  840.64 v _13289_/Y (AO221x1_ASAP7_75t_R)
  52.20  892.84 v _13290_/Y (OA211x2_ASAP7_75t_R)
  42.82  935.66 v _13295_/Y (OR3x1_ASAP7_75t_R)
  52.15  987.81 v _13296_/Y (OA211x2_ASAP7_75t_R)
  33.57 1021.38 v _13297_/Y (AO21x1_ASAP7_75t_R)
   0.00 1021.38 v u0/subword[19]$_DFF_P_/D (DFFHQNx1_ASAP7_75t_R)
        1021.38   data arrival time

1000.00 1000.00   clock core_clock (rise edge)
   0.00 1000.00   clock network delay (ideal)
   0.00 1000.00   clock reconvergence pessimism
        1000.00 ^ u0/subword[19]$_DFF_P_/CLK (DFFHQNx1_ASAP7_75t_R)
  -9.54  990.46   library setup time
         990.46   data required time
---------------------------------------------------------
         990.46   data required time
        -1021.38   data arrival time
---------------------------------------------------------
         -30.92   slack (VIOLATED)

This shows the worst path in the design, which in our example misses timing by about 31 ps.

Next, run resynthesis with simulated annealing:

resynth_annealing

After that, run report_checks again to get:

Startpoint: ld (input port clocked by core_clock)
Endpoint: u0/subword[17]$_DFF_P_
          (rising edge-triggered flip-flop clocked by core_clock)
Path Group: core_clock
Path Type: max

  Delay    Time   Description
---------------------------------------------------------
   0.00    0.00   clock core_clock (rise edge)
   0.00    0.00   clock network delay (ideal)
 200.00  200.00 ^ input external delay
   0.00  200.00 ^ ld (in)
   8.41  208.41 v cut_278390/Y (CKINVDCx20_ASAP7_75t_R)
  21.41  229.82 ^ cut_278391/Y (NAND3xp33_ASAP7_75t_R)
  21.82  251.65 v cut_278392/Y (NAND2x1_ASAP7_75t_R)
  59.80  311.44 ^ cut_278393/Y (NOR2x1p5_ASAP7_75t_R)
 111.59  423.03 v cut_278394/Y (INVx1_ASAP7_75t_R)
 101.85  524.88 ^ _23139_/CON (HAxp5_ASAP7_75t_R)
  71.08  595.96 v _23139_/SN (HAxp5_ASAP7_75t_R)
  86.32  682.28 ^ cut_278602/Y (INVx1_ASAP7_75t_R)
  29.45  711.73 v cut_278945/Y (NAND2xp33_ASAP7_75t_R)
  32.14  743.87 ^ cut_278946/Y (NOR2xp33_ASAP7_75t_R)
  25.75  769.62 v cut_278947/Y (NOR2xp33_ASAP7_75t_R)
  24.44  794.05 ^ cut_278948/Y (NAND2xp33_ASAP7_75t_R)
  18.64  812.69 v cut_278949/Y (NAND2xp33_ASAP7_75t_R)
  22.51  835.20 ^ cut_278951/Y (NAND2xp33_ASAP7_75t_R)
  20.75  855.95 v cut_278956/Y (NOR2xp33_ASAP7_75t_R)
  19.11  875.07 ^ cut_278966/Y (NOR2xp33_ASAP7_75t_R)
  19.02  894.08 v cut_278978/Y (NAND2xp33_ASAP7_75t_R)
  15.95  910.03 ^ cut_278979/Y (NAND2xp5_ASAP7_75t_R)
  16.36  926.39 v cut_278980/Y (NOR2xp33_ASAP7_75t_R)
  23.17  949.57 ^ cut_278981/Y (NOR2xp33_ASAP7_75t_R)
   0.00  949.57 ^ u0/subword[17]$_DFF_P_/D (DFFHQNx1_ASAP7_75t_R)
         949.57   data arrival time

1000.00 1000.00   clock core_clock (rise edge)
   0.00 1000.00   clock network delay (ideal)
   0.00 1000.00   clock reconvergence pessimism
        1000.00 ^ u0/subword[17]$_DFF_P_/CLK (DFFHQNx1_ASAP7_75t_R)
 -29.84  970.16   library setup time
         970.16   data required time
---------------------------------------------------------
         970.16   data required time
        -949.57   data arrival time
---------------------------------------------------------
          20.59   slack (MET)

This time, timing is met with about 20.6 ps of slack.
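When comparing runs, it can be handy to pull the slack out of the report text rather than reading it by eye. The snippet below is a small sketch that parses the two slack lines shown in the reports above and computes the improvement. The regular expression and the function name are ours, and it assumes the slack line keeps the format seen in this article.

```python
import re

# The slack lines exactly as they appear in the two reports above.
before_line = "         -30.92   slack (VIOLATED)"
after_line = "          20.59   slack (MET)"

SLACK_RE = re.compile(r"(-?\d+(?:\.\d+)?)\s+slack \((MET|VIOLATED)\)")

def parse_slack(line):
    """Return (slack in ps, status) parsed from a report_checks slack line."""
    match = SLACK_RE.search(line)
    if match is None:
        raise ValueError(f"no slack found in: {line!r}")
    return float(match.group(1)), match.group(2)

before, before_status = parse_slack(before_line)
after, after_status = parse_slack(after_line)
print(f"before: {before:.2f} ps ({before_status})")
print(f"after:  {after:.2f} ps ({after_status})")
print(f"improvement: {after - before:.2f} ps")
```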

Integrating with the OpenROAD flow

This entire process can be performed at any stage of the OpenROAD flow, but the most convenient point is right after synthesis. Assuming you have OpenROAD-flow-scripts already set up, the first step is to run the synth target, which produces a synthesized, techmapped netlist. This netlist can then be loaded into OpenROAD and resynthesized.

Run an OpenROAD script that:

  • Loads your synthesized design,
  • Runs resynth_annealing with your desired parameters,
  • Overwrites the synthesis result.

The specifics of this depend on your chosen PDK and other configuration.

After that, you can simply run the rest of the flow (by not passing a target), and it’ll pick up your resynthesized netlist.

Optimizing OpenROAD for more efficient chip design

With increased adoption of the open source OpenROAD ASIC design toolchain, Antmicro has been helping customers integrate OpenROAD with their workflows and adapt it to their specific use cases. We’ve also been improving the toolchain itself - some recent contributions include automatic clock gating and Bazel-orfs support.

To find out more about how Antmicro can help you improve your digital design workflows, or learn about our broad portfolio of open source tools for design aggregation, enhanced developer productivity, verification, and more, contact us at contact@antmicro.com.
