Data centers are the computational hubs of the next generation. Rising demand for computing has driven the emergence of high-density data centers, and with the advent of high-density, mission-critical data centers, demand for electrical power for compute and cooling has grown. Deploying large numbers of high-powered computer systems in very dense rack configurations results in very high power densities at the room level. Hosting business and mission-critical applications also demands a high degree of reliability and flexibility. Managing such high power levels with cost-effective, reliable cooling solutions is essential to the feasibility of pervasive compute infrastructure. Energy consumption can also be increased severely by over-designed air handling systems and by rack layouts that allow the hot and cold air streams to mix. The absence of rack-level temperature monitoring has contributed to a lack of knowledge about air flow patterns and thermal management issues in conventional data centers. In this paper, we present results from exploratory data analysis (EDA) of rack-level temperature data collected over a period of several months from a conventional production data center. Typical data centers experience surges in power consumption as compute demand rises and falls. These surges can be long-term, short-term, or periodic, and each leads to associated thermal management challenges. Some variations may also be machine-dependent and differ across the data center, while other thermal perturbations may be localized and momentary. Random variations due to sensor response and calibration, if not identified, may lead to erroneous conclusions and expensive faults. Among other indicators, EDA techniques reveal relationships among sensors and deployed hardware in space and time. Identifying such patterns can provide significant insight into data center dynamics for forecasting purposes.
Knowledge of such patterns enables energy-efficient thermal management by informing strategies for normal operation and disaster recovery, for use with techniques such as dynamic smart cooling.

