# Temperature dependent timing in standard cell designs

András Timár, Márta Rencz, Budapest University of Technology and Economics, Department of Electron Devices, Budapest, Hungary 1111. Email: timar|rencz@eet.bme.hu

Abstract: This paper proposes a methodology to simulate temperature dependent timing in standard cell designs. Temperature dependent timing characteristics are derived from standard delay format (SDF) files that are created automatically by synthesis tools. A case study is also presented in which the temperature dependent frequency variation of a ring oscillator is simulated, demonstrating the necessity of temperature dependent timing simulations. An adaptively refinable partitioning method for the logi-thermal simulation of standard cell designs is proposed as well. This paper also introduces recent enhancements in the CellTherm logi-thermal simulator developed in the Department of Electron Devices, BME, Hungary.

Index Terms: temperature dependent delays, grid, logi-thermal, electro-thermal, simulation

### I. INTRODUCTION

This paper introduces the most recent improvements of the CellTherm [1], [2] simulator engine. The CellTherm logi-thermal simulator is capable of simulating standard cell integrated circuit designs given as a structural Verilog description. The simulator couples a standard-compliant logic simulator (e.g. QuestaSim<sup>®</sup>, Incisive<sup>®</sup>) with a thermal solver engine and calculates device temperatures as a function of simulation time. CellTherm also reads the full layout, power and timing data of the design and calculates the temperature-dependent delays of its constituent standard cells. This way CellTherm can not only detect hot-spots in the design but also annotate device timing and propagation delays during the simulation. CellTherm can also create heating animations of the design and, with the help of the logic simulator, watch for thermally induced timing violations.

### II. RELATED WORK

In [3] Pable and Hasan deal with ultra-low-power (ULP) signaling challenges caused by process, voltage and temperature (PVT) variations. The exponential dependency of the subthreshold drive current on  $V_{th}$  and temperature in the subthreshold operating region makes process and temperature variations of great interest when designing robust ULP systems. A small variation in the device  $V_{th}$  translates into an exponential variation in bias current and hence in device delay and power dissipation.

Rebaud et al. in [4] describe a new monitoring system, allowing failure anticipation in real-time, looking at the timing slack of a pre-defined set of observable flip–flops. They propose adaptive voltage scaling (AVS) and adaptive body biasing (ABB) to compensate PVT variations.

Lin et al. in [5] introduce a novel nine-transistor CMOS SRAM cell that is designed to tolerate PVT variations.

Kumar and Kursun in [6] draw attention to the fact that the temperature-dependent propagation delay characteristics of CMOS integrated circuits will experience a complete reversal in the near future. They demonstrate that the speed of circuits in a 45-nm CMOS technology is enhanced when the temperature is increased at the nominal supply voltage. This behavior is the opposite of that observed in older technology generations.

In [7] Sánchez-Azqueta et al. introduce a CMOS ring VCO design where PVT variations were taken into account. To overcome the limitation of the PVT variations, a tuning range of about 20% is sufficient for their ring VCO.

In [8] the leakage current, active power and delay of dynamic dual  $V_t$  CMOS circuits in the presence of process, voltage, and temperature (PVT) fluctuations are analyzed using a multiple-parameter Monte Carlo method.

In [9] Winther et al. show that using wirelength as the evaluation metric for floorplanning does not always produce a floorplan with the shortest delay. They propose a temperature dependent wire delay estimation method for thermal aware floorplanning algorithms, which takes into account the thermal effect on wire delay.

Golda and Kos [10] present the influence of temperature on energy consumption and propagation delay in CMOS ASIC circuits, supported by several measurements.

### III. DESIGN FOR DEMONSTRATION

Our case study was a  $10\,\text{mm} \times 10\,\text{mm}$  standard cell digital circuit with a 4-bit D-flip-flop chain and a ring oscillator circuit. The design is partitioned into  $10 \times 10$  tiles in which the temperatures are calculated. The design is fictional, and the cell sizes are intentionally enlarged to demonstrate the effect of temperature variations and evolving hot-spots on cell propagation delay. The power dissipations for logic transitions in the cells are also fictional values, chosen large enough to make these effects clearly visible.

Standard Delay Format (SDF) files were created to define the inverter cell delays with which the temperature dependent frequency of the ring oscillator can be demonstrated.

In Fig. 1 the schematic layout of the design is shown. In the upper part of the chip the four D-flip-flops form the excitation circuit. The power dissipated in the DFFR cells is intentionally chosen to be 1000 times larger (1 mW) than the inverters' dissipated power per logic transition (1 $\mu$W). The lower part of the layout contains the ring oscillator, formed by 10 inverters and a NAND gate that starts the oscillation.



Fig. 1. Design layout with  $10 \times 10$  partitioning mesh

### IV. GRID PARTITIONING

The surface of the standard cell IC is divided into subregions called *partitions* in which dissipated powers are accumulated and temperatures are calculated. The partitioning approach speeds up simulation and initial database parsing in large designs containing more than 1000 standard cells. It makes the initial thermal model generation practically insensitive to the number of standard cells. Although logic simulation can take longer for designs containing a high number of cells, the time taken by the thermal engine to solve its equations remains the same, because the thermal equations are generated for the partitioning grid and not for the standard cells.

Powers dissipated in partitions and temperatures of cell instances are calculated using the partition area and overlapping cell area ratio. This means that if a standard cell falls partly into a partition, then the cell's power dissipation is taken into account proportionally to the overlap ratio.
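The overlap-based accumulation of cell power into partitions can be sketched as follows. This is a minimal illustration, not CellTherm's implementation: the `Rect` type, the function names and the uniform grid are assumptions, and each cell's power is assumed to be spread evenly over its area, so a partition receives the fraction of the cell's power equal to the overlap area divided by the cell area.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle (hypothetical helper); any consistent length unit."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def area(self) -> float:
        return (self.x1 - self.x0) * (self.y1 - self.y0)


def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two rectangles, 0.0 if they do not overlap."""
    w = min(a.x1, b.x1) - max(a.x0, b.x0)
    h = min(a.y1, b.y1) - max(a.y0, b.y0)
    return w * h if w > 0 and h > 0 else 0.0


def accumulate_power(cells, die: Rect, nx: int, ny: int):
    """Distribute cell powers over an nx-by-ny grid of partitions.

    cells is an iterable of (Rect, power) pairs. The result is a list of ny
    rows with nx partition powers each.
    """
    dx = (die.x1 - die.x0) / nx
    dy = (die.y1 - die.y0) / ny
    grid = [[0.0] * nx for _ in range(ny)]
    for rect, power in cells:
        for j in range(ny):
            for i in range(nx):
                part = Rect(die.x0 + i * dx, die.y0 + j * dy,
                            die.x0 + (i + 1) * dx, die.y0 + (j + 1) * dy)
                # Share of the cell's power that falls into this partition.
                grid[j][i] += power * overlap_area(rect, part) / rect.area
    return grid


# A cell straddling two partitions of a 10 x 10 grid on a 10 mm x 10 mm die.
power_map = accumulate_power([(Rect(0.5, 0.0, 1.5, 1.0), 1.0)],
                             Rect(0.0, 0.0, 10.0, 10.0), 10, 10)
```

The brute-force loop over all partitions keeps the sketch short; a practical version would visit only the partitions that the cell's bounding box touches.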

Fig. 2 depicts the power distribution of the design. In our test case the DFFR flip-flops are driven in a counter-like pattern, that is, the switching activity of the flip-flop chain is given by (1),

$$A(dffr4) = 2 \cdot A(dffr3) = 4 \cdot A(dffr2) = 8 \cdot A(dffr1) \quad (1)$$

where A() means the switching activity of the cell. Since every logic transition dissipates a fixed amount of power in the model, the dynamic power dissipated by a cell is proportional to its switching activity.



Fig. 2. Power map of test design

The power map clearly shows that the power dissipation of the *dffr4* cell is the largest. The ring oscillator's power dissipation can also be seen in Fig. 2, but it is much smaller than that of the flip-flop chain, even though its oscillating frequency is higher than the switching frequency of the DFFR chain. The resulting temperature map is shown in Fig. 3.



Fig. 3. Temperature map of test design

During the partitioning process, partitions can be subdivided into subpartitions to an arbitrary depth. This way the temperature and delay resolution can be refined in critical areas of the circuit. In a basic scenario a partition is sliced into 4 equal parts, each of which can be sliced further into another 4 parts, as shown in Fig. 4.
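The recursive four-way slicing can be illustrated with a small quadtree-style sketch. The function names and the tuple layout `(x0, y0, x1, y1)` are assumptions made for this example, and the refinement criterion (split every partition that touches a user-chosen area of interest) is only one possible choice.

```python
def split4(rect):
    """Slice a rectangle (x0, y0, x1, y1) into 4 equal quadrants."""
    x0, y0, x1, y1 = rect
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    return [(x0, y0, xm, ym), (xm, y0, x1, ym),
            (x0, ym, xm, y1), (xm, ym, x1, y1)]


def intersects(a, b):
    """True if two rectangles share interior area (touching edges do not count)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def refine(rect, region, depth):
    """Return the leaf partitions obtained by slicing rect up to depth times.

    Only partitions that intersect the area of interest are subdivided, so
    the resolution increases locally while the rest of the grid stays coarse.
    """
    if depth == 0 or not intersects(rect, region):
        return [rect]
    leaves = []
    for child in split4(rect):
        leaves.extend(refine(child, region, depth - 1))
    return leaves


# Refine one partition twice around an area of interest in its lower-left corner.
leaves = refine((0.0, 0.0, 1.0, 1.0), (0.0, 0.0, 0.25, 0.25), depth=2)
```

The leaves always tile the original partition exactly, so powers accumulated on the refined grid still sum to the power of the coarse partition.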

### V. DATA SERIALIZATION

The initial database for the logi-thermal simulation is generated from the *LEF/DEF* layout files, Liberty *.lib* files, and *SDF* files. In a design with thousands of standard cells this initial database creation can take hours to complete. For example, in a design containing 1490 cells manufactured on a 65 nm STMicroelectronics process the initial database generation and thermal model generation took 56 minutes. This internal database creation and thermal model generation is performed once at the beginning of a simulation. However, when running simulations with different stimuli, it has to be repeated every time a new simulation is started.

Fig. 4. Subpartitioning area of interest

This paper describes a serialization method that shortens the initial loading time of simulations. After the initial database has been created at the beginning of a simulation, CellTherm can be asked to serialize the internal database to disk.

In computer science, in the context of data storage and transmission, serialization is the process of converting a data structure or object state into a format that can be stored (for example, in a file or memory buffer, or transmitted across a network connection link) and "resurrected" later in the same or another computer environment [11].

The serialized data can be read back from disk in a fraction of a second when a new simulation run is started, so the time-consuming process of thermal model generation and database creation can be skipped. The deserialized data in memory is identical to the data that the database creation phase would have produced.

The load times of the simulator with and without serialization were compared. Without serialization the load time was 99.654 seconds; loading the serialized database took 0.262 seconds, as measured by the Tcl command *time*. The thermal model can be reused in this way because it depends only on the structure and size of the chip die and packaging, which do not change from simulation to simulation.
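The build-once, reload-later pattern described in this section can be sketched as follows. This is not CellTherm's code: `build_database` is a hypothetical stand-in for the slow parsing of the layout, power and timing files, and Python's standard `pickle` module is used here only as one example of a serialization mechanism.

```python
import pickle
from pathlib import Path


def build_database():
    """Hypothetical stand-in for the slow database and thermal model generation."""
    return {"grid": (10, 10),
            "cells": {"inv5": {"partition": (4, 7), "delay_ns": 5.5}}}


def load_database(cache_path, builder=build_database):
    """Return (database, loaded_from_cache).

    If a serialized database exists it is read back from disk; otherwise the
    database is built from scratch and serialized for later runs.
    """
    cache = Path(cache_path)
    if cache.exists():
        with cache.open("rb") as fh:
            return pickle.load(fh), True
    database = builder()
    with cache.open("wb") as fh:
        pickle.dump(database, fh, protocol=pickle.HIGHEST_PROTOCOL)
    return database, False
```

A real tool would also have to discard the serialized file whenever the layout, library or SDF inputs change. Note that `pickle` files should only be loaded from trusted sources.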

### VI. TEMPERATURE-DELAY FUNCTIONS FROM SDF

Temperature-delay functions for the standard cells are calculated from synthesizer-generated SDF files. SDF files contain timing and delay data for the cells in the placed and routed design. Synthesis tools usually generate these SDF files for process, voltage and temperature corners, so the SDF file contains timing data for the worst-case, nominal and best-case corners. From these corners the corresponding temperatures and the related delays can be extracted, which yields a set of temperature-delay pairs for every cell. By interpolating between these pairs, delays can be calculated for arbitrary temperature values.
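The interpolation step can be sketched as follows. SDF writes delay values as `(min:typ:max)` triples; everything else in this example is an assumption: the corner temperatures are hypothetical values that would in practice come from the library's operating conditions, piecewise-linear interpolation is only one possible choice, and the function names are illustrative.

```python
from bisect import bisect_right


def parse_triple(text):
    """Parse an SDF delay triple such as '(0.8:1.0:1.4)' into three floats.

    Assumes all three values are present; SDF also allows empty fields,
    which this sketch does not handle.
    """
    fields = text.strip().strip("()").split(":")
    if len(fields) != 3:
        raise ValueError(f"expected min:typ:max, got {text!r}")
    return tuple(float(f) for f in fields)


def delay_function(triple, corner_temps):
    """Build delay(temperature) from one SDF triple.

    corner_temps holds the (distinct) temperatures of the min, typ and max
    corners, in the same order as the values of the triple.
    """
    points = sorted(zip(corner_temps, parse_triple(triple)))
    temps = [t for t, _ in points]

    def delay_at(temperature):
        # Pick the segment containing the temperature; the outermost
        # segments are extended linearly beyond the corner temperatures.
        i = min(max(bisect_right(temps, temperature), 1), len(points) - 1)
        (t0, d0), (t1, d1) = points[i - 1], points[i]
        return d0 + (d1 - d0) * (temperature - t0) / (t1 - t0)

    return delay_at


# Hypothetical inverter delay triple and corner temperatures in degrees Celsius.
inv_delay = delay_function("(0.8:1.0:1.4)", (-40.0, 25.0, 125.0))
delay_at_75 = inv_delay(75.0)
```

Extending the outer segments linearly is a design choice of the sketch; clamping to the corner delays would be an equally simple alternative when temperatures outside the characterized range are not trusted.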



Fig. 5. Delay versus temperature functions for standard cells

### VII. TEMPERATURE DEPENDENT DELAY SIMULATION

In the test vehicle shown in Fig. 1 the temperature-dependent frequency variation of the ring oscillator is demonstrated. Because of the dissipation of the D-flip-flop chain the temperature map on the IC surface is not uniform, so each inverter cell in the ring oscillator chain has a different propagation delay. The delays are calculated from each standard cell's current temperature and updated in every simulation timestep. The varying cell delays detune the ring oscillator. The logi-thermal simulator monitors the changing frequency throughout the simulation and displays it to the user in every simulation timestep. The output frequencies of the ring oscillator after different simulation times can be seen in Table I.
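How individual cell delays set the oscillation frequency can be sketched with the textbook relation for an idealized ring oscillator, in which a signal edge has to travel around the loop twice per period. This is only a back-of-the-envelope model, not the way the simulator obtains the frequency (there it is observed from the logic simulation): equal rise and fall delays and zero wire delay are assumed, and the stage delays below are hypothetical numbers.

```python
def ring_oscillator_period(stage_delays):
    """Period of an idealized ring oscillator: twice the sum of stage delays."""
    if len(stage_delays) % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of inverting stages")
    return 2.0 * sum(stage_delays)


def frequency_mhz(period_ns):
    """Convert a period in nanoseconds to a frequency in megahertz."""
    return 1e3 / period_ns


# 11 inverting stages (10 inverters and the NAND gate), hypothetical delays in ns.
cold = [5.0] * 11
# Stages closer to a heat source are assumed to slow down more.
warm = [5.0, 5.1, 5.2, 5.4, 5.6, 5.8, 5.6, 5.4, 5.2, 5.1, 5.0]

cold_period = ring_oscillator_period(cold)
warm_period = ring_oscillator_period(warm)
```

Because the period depends on the sum of the delays, a hot-spot that slows down only a few stages still lowers the frequency of the whole oscillator.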

### VIII. RESULTS

Temperature curves versus simulation time for the DFFR flip-flop chain and the *inv5* cell are shown in Fig. 6. It can be observed that the circuit reaches its steady-state temperature near the 4th second. The temperatures throughout this paper are differential, not absolute, values.

In Fig. 7 the period and frequency of the ring oscillator are depicted versus simulation time. As the simulation proceeds, the temperature of the functioning circuit rises, and the period of the ring oscillator rises with it according to the temperature-delay functions in Fig. 5. This increase in the period means a decrease in the oscillation frequency, as shown by Fig. 7. The frequency of the oscillator drops from the initial 8.1967 MHz to 6.952 MHz in the 4th second.

TABLE I. OUTPUT FREQUENCY OF RING OSCILLATOR DEPENDING ON DEVICE TEMPERATURE

| Simulation time [ms] | Period [ns] | Frequency [MHz] |
|----------------------|-------------|-----------------|
| 10                   | 122.0000    | 8.196721        |
| 250                  | 131.6340    | 7.596821        |
| 500                  | 136.3500    | 7.334067        |
| 750                  | 139.0860    | 7.189796        |
| 1000                 | 140.7680    | 7.103887        |
| 2000                 | 143.2520    | 6.980705        |
| 3000                 | 143.7240    | 6.957780        |
| 3250                 | 143.7700    | 6.955554        |
| 3500                 | 143.7980    | 6.954200        |
| 3750                 | 143.8240    | 6.952942        |
| 3800                 | 143.8240    | 6.952942        |
| 3900                 | 143.8360    | 6.952362        |
| 4000                 | 143.8420    | 6.952072        |

Fig. 6. Temperature of DFFR and inv5 cells

As the thermal steady state is reached near the 4th second, the frequency also settles to a steady value there. This is clearly visible in Fig. 7.
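As a quick consistency check of Table I, the frequency column is the reciprocal of the period column, and the first and last rows give the overall detuning caused by self-heating:

$$f = \frac{1}{T}, \qquad f(10\,\text{ms}) = \frac{1}{122.000\,\text{ns}} \approx 8.1967\,\text{MHz}, \qquad f(4000\,\text{ms}) = \frac{1}{143.842\,\text{ns}} \approx 6.9521\,\text{MHz}$$

$$\frac{T(4000\,\text{ms})}{T(10\,\text{ms})} = \frac{143.842}{122.000} \approx 1.179, \qquad 1 - \frac{f(4000\,\text{ms})}{f(10\,\text{ms})} = 1 - \frac{122.000}{143.842} \approx 0.152$$

The period therefore grows by roughly 18% while the frequency falls by roughly 15% between the first and the last row of the table.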

In Fig. 8 the period and frequency of the ring oscillator are plotted versus the temperature of the *inv5* cell.

### IX. SUMMARY

In this paper a methodology for simulating temperature dependent propagation delays in a ring oscillator circuit is demonstrated. A special demonstration circuit has been created to spectacularly demonstrate the effect of circuit self-heating on propagation delays and operating frequency.

Temperature dependent delays are calculated from synthesizer-generated SDF files making thermal-aware logic simulations possible.



Fig. 7. Period and frequency of ring oscillator over time



Fig. 8. Period and frequency of ring oscillator versus temperature of *inv5* cell

A grid partitioning method has been introduced in which the thermal model is independent of the number of standard cells in the circuit. This approach speeds up initial thermal model generation, whose cost does not grow with the standard cell count.

This paper also proposed a method of serialization that can speed up initial database loading for the logi-thermal simulation with CellTherm.

Finally, simulation results are shown to prove the concept introduced in this paper.

### X. ACKNOWLEDGEMENT

This work was partly supported by the IP 248603 THERMINATOR FW7 project of the European Union and by the Hungarian Government through TÁMOP-4.2.1/B-09/1/KMR-2010-0002.

### REFERENCES

- [1] A. Timar, G. Bognar, A. Poppe, and M. Rencz, "Electro-thermal cosimulation of ICs with runtime back-annotation capability," in *Thermal Investigations of ICs and Systems (THERMINIC), 2010 16th International Workshop on*, Barcelona, Spain, Oct. 2010, pp. 1–6.
- [2] A. Timar, G. Bognar, and M. Rencz, "Improved power modeling in logi-thermal simulation," in *Thermal Investigations of ICs and Systems (THERMINIC), 2011 17th International Workshop on*, Paris, France, Sept. 2011, pp. 1–6.
- [3] S. Pable and M. Hasan, "Ultra-low-power signaling challenges for subthreshold global interconnects," *Integration, the VLSI Journal*, vol. 45, no. 2, pp. 186–196, 2012. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0167926011000770
- [4] B. Rebaud, M. Belleville, E. Beigné, C. Bernard, M. Robert, P. Maurine, and N. Azemard, "Timing slack monitoring under process and environmental variations: Application to a DSP performance optimization," *Microelectronics Journal*, vol. 42, no. 5, pp. 718–732, 2011. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0026269211000292
- [5] S. Lin, Y.-B. Kim, and F. Lombardi, "Design and analysis of a 32nm PVT tolerant CMOS SRAM cell for low leakage and high stability," *Integration, the VLSI Journal*, vol. 43, no. 2, pp. 176–187, 2010. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0167926010000040
- [6] R. Kumar and V. Kursun, "Reversed temperature-dependent propagation delay characteristics in nanometer CMOS circuits," *Circuits and Systems II: Express Briefs, IEEE Transactions on*, vol. 53, no. 10, pp. 1078–1082, Oct. 2006.
- [7] C. Sánchez-Azqueta, S. Celma, and F. Aznar, "A 0.18 μm CMOS ring VCO for clock and data recovery applications," *Microelectronics Reliability*, vol. 51, no. 12, pp. 2351–2356, 2011. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0026271411001697
- [8] J. Wang, N. Gong, L. Hou, X. Peng, R. Sridhar, and W. Wu, "Leakage current, active power, and delay analysis of dynamic dual Vt CMOS circuits under P–V–T fluctuations," *Microelectronics Reliability*, vol. 51, no. 9–11, pp. 1498–1502, 2011, Proceedings of the 22nd European Symposium on the Reliability of Electron Devices, Failure Physics and Analysis. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0026271411002010
- [9] A. Winther, W. Liu, A. Nannarelli, and S. Vrudhula, "Temperature dependent wire delay estimation in floorplanning," in *NORCHIP, 2011*, Nov. 2011, pp. 1–4.
- [10] A. Golda and A. Kos, "Temperature influence on power consumption and time delay," in *Digital System Design, 2003. Proceedings. Euromicro Symposium on*, Sept. 2003, pp. 378–382.
- [11] (2012, 16th April, 10:38) Serialization. [Online]. Available: http://en.wikipedia.org/wiki/Serialization