Efficient and compact cooling technologies play a pivotal role in determining the performance of high performance computing (HPC) devices running highly parallel workloads in supercomputers. The present work evaluates different cooling technologies and elucidates their impact on the power, performance, and thermal management of Intel® Xeon Phi™ coprocessors. The aim of the study is to demonstrate enhanced cooling capabilities beyond today’s fan-driven air-cooling for HPC technology, thereby improving the overall performance per Watt in datacenters. The cooling technologies evaluated include air-cooling, liquid-cooling, and two-phase immersion-cooling. Air-cooling is evaluated by providing controlled airflow to a cluster of eight 300 W Xeon Phi coprocessors (7120P). For liquid-cooling, two different cold plate technologies are evaluated, viz., formed-tube cold plates and microchannel-based cold plates. Liquid-cooling, with water as the working fluid, is evaluated on single Xeon Phi coprocessors, using inlet conditions in accordance with ASHRAE W2 and W3 class liquid-cooled datacenter baselines. For immersion-cooling, a cluster of multiple Xeon Phi coprocessors is evaluated with three different types of Integrated Heat Spreaders (IHS), viz., bare IHS, IHS with a Boiling Enhancement Coating (BEC), and IHS with BEC-coated pin-fins. The entire cluster is immersed in a pool of Novec 649 (3M fluid, boiling point 49 °C at 1 atm), with polycarbonate spacers used to reduce the volume of fluid required and achieve a target fluid/power density of ∼3 L/kW. Flow visualization is performed to provide further insight into the boiling behavior during the immersion-cooling process.
Performance per Watt of the Xeon Phi coprocessors is characterized as a function of the cooling technology using several HPC workloads and benchmarks run at constant frequency, such as the Intel proprietary Power Thermal Utility (PTU) and the industry-standard HPC benchmarks LINPACK, DGEMM, SGEMM, and STREAM. The major parameters measured by sensors on the coprocessor include total power to the coprocessor, CPU temperature, and memory temperature, while the calculated outputs of interest include the performance per Watt and equivalent thermal resistance. As expected, both liquid and immersion cooling show improved performance per Watt and lower CPU temperature compared to air-cooling. In addition to elucidating the performance per Watt improvement, this work reports on the effect of cooling technology on the total power consumed by the Xeon Phi card as a function of coolant inlet temperature. Further, the paper discusses the form-factor advantages of liquid and immersion cooling and compares the technologies on a common platform. Finally, the paper concludes by discussing datacenter optimization for cooling in the context of leakage power control for Xeon Phi coprocessors.
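The two calculated outputs named above, performance per Watt and equivalent thermal resistance, can be illustrated with a minimal Python sketch. The formulas are assumptions made for illustration: the text does not define either quantity, so the resistance is taken here as the CPU temperature rise above the coolant inlet temperature divided by total card power, and performance is taken as benchmark throughput in GFLOP/s. The function names and all numeric values are hypothetical and are not measurements from this study.

```python
def performance_per_watt(throughput_gflops: float, power_w: float) -> float:
    """Benchmark throughput divided by total card power (GFLOP/s per W).

    Assumed definition; the source does not state the exact formula.
    """
    if power_w <= 0:
        raise ValueError("power_w must be positive")
    return throughput_gflops / power_w


def equivalent_thermal_resistance(
    cpu_temp_c: float, coolant_inlet_temp_c: float, power_w: float
) -> float:
    """CPU temperature rise above the coolant inlet per watt of card power (K/W).

    Assumed definition; the reference temperature could equally be the
    ambient air or the fluid saturation temperature, depending on the
    cooling technology being characterized.
    """
    if power_w <= 0:
        raise ValueError("power_w must be positive")
    return (cpu_temp_c - coolant_inlet_temp_c) / power_w


if __name__ == "__main__":
    # Hypothetical sensor readings, chosen only to exercise the functions.
    ppw = performance_per_watt(throughput_gflops=1000.0, power_w=250.0)
    r_eq = equivalent_thermal_resistance(
        cpu_temp_c=70.0, coolant_inlet_temp_c=30.0, power_w=200.0
    )
    print(f"performance per Watt: {ppw:.2f} GFLOP/s per W")
    print(f"equivalent thermal resistance: {r_eq:.3f} K/W")
```

Under these assumed definitions, a cooling technology that holds the CPU at a lower temperature for the same card power yields a lower equivalent thermal resistance, which gives a single figure for comparing air, liquid, and immersion cooling on a common platform.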