Abstract
High performance computing (HPC), artificial intelligence (AI), and cognitive systems have initiated a new era of computing. Efficient thermal management of these systems has become vital because of the increasing power density of their electronic components. In 2018, IBM delivered the world's fastest supercomputer, Summit, with 200 petaflops of computing performance with the LINPACK benchmark. The system is both air and water cooled; water cools the high-power electronic components, namely the IBM POWER9 processors and NVIDIA GPUs. In this paper, we give an overview of the thermal and mechanical design strategies applied to these systems. For the air-cooled systems, we discuss the fan and heat sink designs, as well as the preheating effect on the PCI section. The liquid-cooled system has a unique coldplate design that cools the processors and the GPUs with water. We examine the water flow path design for the processors and the GPUs and report the thermal performance of the coldplate. We also give an overview of the cooling assemblies in the servers, such as thermal interface materials (TIMs) and air baffles. Finally, we investigate the unit and rack manifolds and provide the flow and pressure distributions at the node and rack levels.