Abstract
Lab-grown diamond heat spreaders are becoming an attractive alternative to traditional copper heat spreaders because they offer high thermal conductivity, can be bonded directly to silicon, and allow for an ultra-thin silicon layer. Researchers have developed various thermal models and prototypes of lab-grown diamond heat spreaders to evaluate their cooling performance and heat spreading ability. The majority of existing thermal models are built using finite-element method (FEM) based simulators such as COMSOL and ANSYS. However, such commercial simulators are computationally expensive, with long solution times and large memory requirements. These limitations make commercial simulators unsuitable for evaluating numerous design alternatives or runtime scenarios for real-world high-performance processors. Because of this modeling challenge, no existing work has evaluated the thermal behavior of lab-grown diamond heat spreaders on real-world high-performance processors running realistic application benchmarks. We recently developed a parallel compact thermal simulator, PACT, that carries out fast and accurate steady-state and transient thermal simulations and can be extended to support emerging integration and cooling technologies. In this paper, we use PACT to evaluate the steady-state and transient cooling performance of lab-grown diamond heat spreaders against traditional copper heat spreaders on various real-world high-performance processors (e.g., Intel i7 6950X, IBM Power9, and PicoSoC). By using PACT with architectural performance and power simulators such as Sniper and McPAT, we are able to run transient simulations with realistic benchmarks. Simulation results show that lab-grown diamond heat spreaders achieve maximum temperature and thermal gradient reductions of up to 26.73 °C and 13.75 °C, respectively, compared to traditional copper heat spreaders.
The maximum steady-state and transient simulation times of PACT for the real-world high-performance chips and realistic applications used in our experiments are 259 s and 22 min, respectively.