The rapid growth in the number of data centers combined with the high-density heat dissipation of computer and telecommunications equipment has made energy efficient thermal management of data centers a key research area. Localized hybrid air–water cooling is one approach to more effectively control the cooling when there is wide variation in the amount of dissipation in neighboring racks while the traditional air cooling approach requires overprovisioning. In a closed, hybrid air–water cooled server cabinet, the generated heat is removed by a self-contained system that does not interact with the room level air cooling system. Here, a hybrid-cooled enclosed cabinet and all its internal components were characterized experimentally in steady-state mode (e.g., experimentally determined heat-exchanger effectiveness and IT characterization). Also, a comprehensive numerical model of the cabinet was developed and validated using the experimental data. The computational model employs full numerical modeling of the cabinet geometry and compact models to represent the servers and the air/water heat exchanger. The compact models were developed based on experimental flow and thermal characterization of the internal components. The cabinet level model has been used to simulate a number of operating scenarios relevant to data center applications such as the effect of air leakage within the cabinet. The effect of the air side and the water side failure of the cooling system on the IT performance were investigated experimentally. A comparison was made of the amount of time required to exceed the operating temperature limit for the two scenarios.
Due to the high cooling capacity, lower power consumption, and increased packaging density, hybrid and liquid cooling systems are becoming a more commonly used thermal management solution for data centers . An advantage of hybrid cooling systems is the proximity of the heat exchanger to the IT equipment, which enables improvements in the thermal efficiency . There is not an extensive literature on water-cooled cabinets [3–9]. Hybrid cooling is often introduced via add-on components such as the side car which is an air–liquid heat exchanger that is mounted on the side of a server cabinet. This heat exchanger was introduced by IBM and is designed to remove up to 35 kW .
The study presented here involves a comprehensive experimental and computational analysis of a fully contained server cabinet. The experiments were performed on an Emerson Knurr CoolTherm hybrid-cooled server cabinet. The closed cabinet serves as a scaled-down version of a broader range of contained systems. The cabinet comprises the major cooling hardware found in room-level systems (e.g., heat exchanger, blower, and chilled-water control) and is subject to the same types of operational and failure scenarios.
The approach used to develop a validated computational model for the cabinet can be adapted to a broader range of thermal management systems. Computational modeling using CFD software is widely used for data center thermal management system design and analysis. The accuracy and validity of the computational results depend on the complexity and input data of the numerical model. Model complexity, computation time, and accuracy of the results are trade-offs that must be balanced. Depending on the specific details that need to be analyzed, the complexity of the model can be adjusted. The results obtained from a fully detailed model may show good agreement with experimental data, but generally require substantial computing time. Here, a combination of detailed and compact modeling is used to analyze the performance of the fully enclosed cabinet. In Ref. , an effective and computationally efficient proper orthogonal decomposition (POD)-based reduced order modeling approach was presented and applied to an operational data center in order to predict the data center temperature fields as a function of the air flow rate of one computer room air conditioning (CRAC) unit. However, in many facilities, there are several CRAC units with different combinations of airflow rates, making the thermal modeling more challenging.
The internal geometry of the cabinet including rear door channels, front channel, heat exchanger housing and mounting rails was modeled in detail. For the IT equipment, a simplified server model (SSM) is used, based on an innovative black box approach for incorporating the transport through the servers, that does not require meshing the regions of the cabinet containing servers . Alissa et al. have developed a new methodology for IT equipment airflow analysis in contained environments, including correlations between CPU reliability and air flow conditions . In addition, a unique compact model for the V-shaped heat exchanger in the hybrid cooled, closed cabinet was developed. An impedance curve for flow across the V-shaped heat exchanger was calculated based on tube orientation, tube diameter, fin thickness, fin spacing, and other geometric aspects. For the computational modeling, the V-shaped heat exchanger is treated as two separate heat exchangers to accurately capture the pressure drop and thermal performance. The flow resistance is calculated based on the superposition of two components calculated independently: the drag coefficient of a cylinder, to represent the resistance of the tubes, and the resistance to flow between parallel plates, to represent the effect of the fins .
The computational model of the server cabinet is used here to investigate a number of steady-state and transient operating conditions. The effect of air leakage, both within the cabinet and to the outside, is also studied. The impact of cold aisle containment versus an open aisle was investigated in Ref. . That study experimentally demonstrated the effect of containment on overall data center energy efficiency and annual energy savings.
Figure 1 illustrates the air circulation within the Emerson Knurr CoolTherm closed server cabinet. To some degree, the cabinet emulates the conditions of containment in a data center. The front side of the cabinet is equivalent to the cold aisle while the space between the server outlets and the rear door is the hot aisle. The overall air circulation is driven by three radial fans located in the rear door of the cabinet. The hot air is drawn in by the rear door fans from the outlets of the IT equipment and then is directed downward through the heat exchanger, located at the bottom of the cabinet, where the heat is removed. The cooled air circulates upward in the front side of the cabinet and is drawn in by the server fans .
Actual images of the front and the rear sides of the cabinet are shown in Fig. 2. Both pictures were taken while the front and the rear door of the cabinet were open. The figures show how the cabinet is populated with the IT equipment as well as wiring and the cable arms attached to the servers.
The experimentally characterized cabinet has five main subsystems: (1) a multifunction rear door with three embedded radial fans with backward curved impellers, (2) heat exchanger box, (3) air recirculation deflector, (4) IT equipment mounting rails, and (5) outer cabinet structure (side channels and door regions). The overall cabinet dimensions are 1.2 m depth, 0.8 m width, and 2.33 m height. The cabinet has a 37 RU available IT equipment capacity. For the experimental study, it was occupied with one 1 RU switch, six 2 RU servers, 15 1 RU servers, and one 9 RU load bank.
In the experiments, the active power drawn by each server and by the load bank was measured by power distribution units (PDUs). Four PDUs were utilized in the cabinet to provide sufficient power for the IT equipment. The PDUs were attached to the local network so that they could be monitored via a web interface. The 1 RU servers draw between 200 and 240 W each. The input power is 355 W for each 2 RU server and 6.2 kW for the load bank. In total, the IT equipment generated about 11.5 kW, which is about half of the maximum cooling capacity of the cabinet. Based on the overall cabinet dimensions, this corresponds to a volumetric power dissipation of about 5.1 kW/m³. The rear door fans also dissipate heat; their operating power is provided by the vendor (180–270 W depending on speed). The heat dissipated by the fans is included in the cabinet model.
Over 120 temperature sensors (thermocouples) are employed to measure the air temperature distribution within different parts of the cabinet. The measured temperature distribution is used to characterize the cabinet and validate the computational model. A planar grid of full-range air velocity/temperature sensors is located upstream of the heat exchanger in order to measure the average supply air temperature and the flow rate that passes through the V-shaped heat exchanger (Fig. 3(a)). In order to capture the turbulent air flow profile at the planar section, nine sensors are placed in the central portion and eight sensors are located at the edges. The AccuSense™ UAS1000 sensors measure air velocity and air temperature simultaneously. The velocity sensors are full-range hot-wire anemometers, 0.15–20 m/s (30–4000 fpm), with an accuracy of ±5% of reading or ±0.05 m/s (10 fpm), and the temperature sensors have a range of 0–70 °C (32–158 °F) with a measurement accuracy of ±1 °C (±1.8 °F) (Fig. 3(b)). All 17 sensors are attached to an ATM2400 data acquisition hub via a USB port. Because the hot-wire anemometers in the sensors respond to a single flow direction, only the air velocity magnitude in the primary flow direction can be measured.
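The power budget and the volumetric power dissipation quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the values stated in the text; the power of the 1 RU switch is not reported and is therefore neglected, which is an assumption of this check.

```python
# Values reported in the text; the 1 RU switch power is not given and is neglected.
N_1RU, P_1RU_RANGE_W = 15, (200.0, 240.0)   # fifteen 1 RU servers, W each
N_2RU, P_2RU_W = 6, 355.0                   # six 2 RU servers, W each
P_LOAD_BANK_W = 6200.0                      # 9 RU load bank
CABINET_DIMS_M = (1.2, 0.8, 2.33)           # depth, width, height


def it_power_range_w():
    """Return the (low, high) total IT power in W for the 1 RU power range."""
    fixed = N_2RU * P_2RU_W + P_LOAD_BANK_W
    return tuple(fixed + N_1RU * p for p in P_1RU_RANGE_W)


def volumetric_power_kw_per_m3(total_kw):
    """Total power divided by the overall cabinet volume."""
    depth, width, height = CABINET_DIMS_M
    return total_kw / (depth * width * height)


low_w, high_w = it_power_range_w()
print(f"IT power range: {low_w / 1000:.2f} to {high_w / 1000:.2f} kW")
print(f"Volumetric power at 11.5 kW: {volumetric_power_kw_per_m3(11.5):.2f} kW/m^3")
```

The reported total of about 11.5 kW falls inside the computed range, and dividing it by the overall cabinet volume gives a value that rounds to 5.1 kW/m³.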
Steady-State Experimental Analysis
V-Shaped Heat Exchanger.
The actual server enclosure employs a single V-shaped, cross-flow fin-tube heat exchanger that is located at the bottom of the cabinet (Fig. 4). Chilled water is supplied to the heat exchanger from an external cooling distribution unit (CDU). A vortex flow meter installed on the supply line measures the supply water flow rate and temperature. In the experimental study, the effectiveness was calculated based on Eq. (1). The heat exchanger effectiveness is the ratio of the actual heat transferred to the maximum heat that could be transferred in a heat exchanger with infinite area. In the case of Cmin = Ca, the equation can be written as Eq. (2). Figure 5 shows the effectiveness values that were measured experimentally in an earlier study. Effectiveness values were obtained for four air flow rates and four water flow rates and were used to develop the heat exchanger model. In general, V-shaped heat exchangers have higher effectiveness (ε) due to their greater surface area.
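Equations (1) and (2) are not reproduced in this text. As a minimal sketch, the standard textbook definition is assumed here: ε = q/q_max with q_max = Cmin (T_a,in − T_w,in), which reduces to a ratio of temperature differences when Cmin = Ca. The constant specific heats and the function names are assumptions made for illustration only.

```python
CP_AIR = 1005.0     # J/(kg K), assumed constant specific heat of air
CP_WATER = 4180.0   # J/(kg K), assumed constant specific heat of water


def effectiveness(m_air, m_water, t_air_in, t_air_out, t_water_in):
    """General form: actual air-side heat transfer over the maximum possible."""
    c_air = m_air * CP_AIR          # air heat capacity rate, W/K
    c_water = m_water * CP_WATER    # water heat capacity rate, W/K
    c_min = min(c_air, c_water)
    q_actual = c_air * (t_air_in - t_air_out)
    q_max = c_min * (t_air_in - t_water_in)
    return q_actual / q_max


def effectiveness_air_limited(t_air_in, t_air_out, t_water_in):
    """Reduced form valid when the air stream has the smaller capacity rate."""
    return (t_air_in - t_air_out) / (t_air_in - t_water_in)


# Illustrative inputs: air 42 C in, 19 C out, water supplied at 16.5 C.
eps_general = effectiveness(0.38, 0.83, 42.0, 19.0, 16.5)
eps_reduced = effectiveness_air_limited(42.0, 19.0, 16.5)
print(f"effectiveness: {eps_general:.3f} (general), {eps_reduced:.3f} (reduced)")
```

For the flow rates used in this study the air side has the smaller heat capacity rate, so both forms give the same value.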
Characterization of the IT Equipment
Theoretical Flow Curves.
The simplified representation of the IT equipment (servers, load bank) includes their external physical dimensions and the air flow behavior under given operating conditions. Impedance curves represent the air flow resistance of the specific equipment, while fan curves characterize the performance of the fans. These curves were determined experimentally for the different IT equipment used in the cabinet. The experimental setup is shown in Fig. 6. The algebraic difference between the fan curve and the impedance curve is called the theoretical operational flow curve. Experimentally measured impedance, fan, and flow curves are shown in Fig. 7 for the 2 RU DELL PowerEdge® 2950 server with two hard drives and two power supplies. A negative back pressure forces more air through the server and yields a higher air flow rate, while a positive back pressure reduces the air flow through the server.
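The construction of the theoretical flow curve can be illustrated with a short sketch. The quadratic fan and impedance curves and all coefficients below are hypothetical placeholders, not the measured curves of Fig. 7; the back pressure is taken as positive when it opposes the flow.

```python
def fan_pressure(q):
    """Hypothetical fan curve: static pressure rise (Pa) at flow q (m^3/s)."""
    return 120.0 - 30000.0 * q ** 2


def impedance(q):
    """Hypothetical chassis impedance curve: pressure loss (Pa) at flow q."""
    return 25000.0 * q ** 2


def flow_curve(q):
    """Theoretical flow curve: fan curve minus impedance curve."""
    return fan_pressure(q) - impedance(q)


def operating_flow(back_pressure, q_lo=0.0, q_hi=0.2, tol=1e-9):
    """Bisection for the flow at which the flow curve equals the back pressure.

    Assumes the flow curve decreases monotonically between q_lo and q_hi.
    """
    def residual(q):
        return flow_curve(q) - back_pressure

    if residual(q_lo) < 0.0 or residual(q_hi) > 0.0:
        raise ValueError("back pressure is outside the bracketed range")
    while q_hi - q_lo > tol:
        mid = 0.5 * (q_lo + q_hi)
        if residual(mid) > 0.0:
            q_lo = mid
        else:
            q_hi = mid
    return 0.5 * (q_lo + q_hi)


for bp in (-20.0, 0.0, 20.0):
    print(f"back pressure {bp:+.0f} Pa -> flow {operating_flow(bp):.4f} m^3/s")
```

With these placeholder curves, a negative back pressure gives a flow above the free-delivery point and a positive back pressure gives a lower flow, which is the qualitative behavior described above.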
Effective Flow Curves.
The operational flow curve under certain circumstances may not yield an accurate representation of the air flow through the IT equipment. For example, this approach would not capture the recirculation inside the equipment, and would result in an overestimation of the flow in some cases and underestimation in others. The servers were characterized using a flow bench while they were powered and the fans were operating continuously at their maximum speed. The experimental method and validation of this approach are discussed in Ref. . This methodology provides a more accurate representation of the server flow behavior. Figure 8 shows a comparison of the theoretical flow curve with the effective flow curve for the 2 RU DELL PowerEdge® 2950. The theoretical flow curve approach shows about a 50% overestimation in comparison with the effective flow curve approach. The experimentally obtained flow curve is applied at the outflow of the IT equipment in the computational model, which results in an accurate yet simple compact model. This simplified server model (SSM) provides a significant reduction of the computational time since there is no need to compute the flow through the IT equipment in the cabinet model. For the server simulator, both flow curves were determined and the operating points were compared. In this case, the theoretical flow curve underestimates the operating flow rate, which is attributed to the simpler internal physical structure of the simulator and its higher flow rate at lower pressure drop. Equation (3) provides a relationship between the pressure drop ΔP and the flow rate Q
where Ki and Kv are coefficients that depend on the chassis internal resistance and Pc (critical pressure) represents the highest pressure drop across the chassis corresponding to the minimum server air flow. At the critical pressure, the IT internal components are at the highest reliability risk. Note that Eq. (3) is analogous to the Darcy–Forchheimer equation relating pressure drop to flow rate through porous media. The term Pc represents an additional source within the porous volume (here corresponding to server fans). Generally speaking, this relationship accurately represents the flow behavior of a wide range of IT equipment.
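Equation (3) itself is not reproduced in this text. As a rough sketch only, a quadratic form consistent with the description is assumed here: ΔP = Pc − Kv Q − Ki Q², so that ΔP equals Pc as the flow approaches zero. The sign convention, the coefficient values, and the function names are assumptions and should be checked against the original equation.

```python
import math


def pressure_drop(q, k_i, k_v, p_c):
    """Assumed form of the flow curve: dP = Pc - Kv*Q - Ki*Q^2 (Pa)."""
    return p_c - k_v * q - k_i * q * q


def flow_rate(dp, k_i, k_v, p_c):
    """Invert the assumed curve for the non-negative flow rate (m^3/s)."""
    if k_i <= 0.0:
        raise ValueError("k_i must be positive for the quadratic inversion")
    if dp > p_c:
        raise ValueError("pressure drop exceeds the critical pressure Pc")
    # Positive root of k_i*q^2 + k_v*q - (p_c - dp) = 0.
    disc = k_v * k_v + 4.0 * k_i * (p_c - dp)
    return (-k_v + math.sqrt(disc)) / (2.0 * k_i)


# Hypothetical coefficients, for illustration only.
K_I, K_V, P_C = 40000.0, 500.0, 100.0
q_free = flow_rate(0.0, K_I, K_V, P_C)
print(f"flow at zero pressure drop: {q_free:.4f} m^3/s")
print(f"round trip: {pressure_drop(q_free, K_I, K_V, P_C):.6f} Pa")
```

In a compact server model, a relation of this kind would be evaluated at the pressure difference computed across each server to obtain its flow rate, without resolving the server interior.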
The fluid flow and heat transfer modeling was performed using the 6sigmaroom software package. This software package uses the finite volume method to discretize the Reynolds-averaged Navier–Stokes (RANS) equations combined with the energy equation for the temperature field. The standard k–ε turbulence model is used to treat the turbulent transport. All the cabinet components in the experimental setup, such as the cabinet structure, rear door fans, and heat exchanger, must be characterized and validated individually. A numerical model can be used to evaluate a new product design, or a validated model of an existing product can be used to predict its behavior during special events. A combination of detailed and lumped modeling must be employed to obtain accurate results while limiting the computational time.
Rear Door Characterization.
The majority of the air circulation within the cabinet is generated by three radial fans mounted in the rear door. As part of the cabinet design, the cabinet air flow is controlled by a thermostat that is placed in the front side of the cabinet. Since the main source of cooled air flow is the bottom of the cabinet, the sensor is located at the top of the cold aisle (front side). Below 20 °C, all three fans work at 75% of their maximum speed; above 23 °C, all three fans work at 100% of their maximum speed. The dimensions of all three fans are the same. The flow curve obtained from the manufacturer's manual for 1880 RPM is shown in Fig. 9. A noncontact tachometer was used to measure the fan speed in the cabinet. The fans were measured to run at 1770 RPM for the higher temperature condition (above 23 °C) and at 1470 RPM for the lower temperature condition (below 20 °C).
Figure 10 shows the rear and the front sides of the rear door numerical model. The air exiting the servers is drawn into the center of each fan and is then directed to the inlet of the heat exchanger box via a separate channel for each fan. The performance of each of the rear door mounted fans depends on the pressure drop along its flow path, since the channels introduce back pressure to the air flow provided by the fans. The air flow of each individual fan was measured experimentally using a flow hood apparatus with an ADM-850 L multimeter. The flow hood was attached to each fan and the flow rate was measured while the rear door was open and the fans were exposed to ambient room pressure. The rear door fans and channels were built in the 6sigmaroom software (Figs. 10(a) and 10(b)). The experimental measurements were compared with the predictions of the model, in which the fans are represented by their fan curve data. The model is able to predict the flow rate with very good accuracy for both the faster (high temperature) and slower (low temperature) fan speeds. Table 1 shows the agreement between the measured volumetric air flow rate and the results from the computational model.
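The thermostat logic and the use of a fan curve at a speed other than the rated 1880 RPM can be sketched as follows. Scaling the curve with the fan affinity laws (flow proportional to speed, pressure proportional to speed squared) is a common approach that is assumed here; the text does not state how the curve was adapted to the measured speeds. The behavior between 20 °C and 23 °C is also not stated, so holding the previous state is an assumption, and the curve points are hypothetical.

```python
RATED_RPM = 1880.0  # speed of the manufacturer's curve quoted in the text


def fan_speed_fraction(t_front_c, previous_fraction=0.75):
    """Thermostat set points from the text: 75% below 20 C, 100% above 23 C.

    Between the two set points the previous state is held (an assumption).
    """
    if t_front_c < 20.0:
        return 0.75
    if t_front_c > 23.0:
        return 1.0
    return previous_fraction


def scale_fan_curve(points, rpm, rated_rpm=RATED_RPM):
    """Scale (flow, pressure) points with the affinity laws: Q ~ N, dP ~ N^2."""
    ratio = rpm / rated_rpm
    return [(q * ratio, dp * ratio ** 2) for q, dp in points]


# Hypothetical rated curve as (m^3/s, Pa) pairs, for illustration only.
rated_curve = [(0.0, 400.0), (0.2, 300.0), (0.4, 0.0)]
high_speed_curve = scale_fan_curve(rated_curve, 1770.0)  # measured high speed
low_speed_curve = scale_fan_curve(rated_curve, 1470.0)   # measured low speed
print(high_speed_curve)
print(low_speed_curve)
```

Passing the measured tachometer speeds rather than the nominal percentages keeps the scaled curves tied to what the fans actually do in the cabinet.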
V-Shaped Heat Exchanger Modeling.
Two approaches were used to model the actual V-shaped heat exchanger in the numerical model of the cabinet. Both approaches employ compact models of the cabinet V-shaped heat exchanger. The compact model of the heat exchanger must simulate both the heat transfer and the total air pressure drop across the heat exchanger. To simulate the heat transfer, a very thin heat exchanger with zero air flow resistance was used. Experimentally measured parameters of the V-shaped heat exchanger were incorporated in the thin heat exchanger. Parameters such as the water flow rate, water supply temperature, and effectiveness vary with the experimental conditions.
A separate test configuration model was built using the 6sigmaroom software to verify the thermal behavior of the heat exchanger compact model. The model consisted of a rectangular flow channel as shown in Fig. 11 with a uniform air flow inlet upstream (represented by the blower) and an outlet flow boundary condition at the end of the computational domain. The thin heat exchanger model was tested and calibrated as shown in Fig. 11(a). In this test configuration, a rectangular hot plate (perforated plate with zero air flow resistance and specified constant temperature) is used to provide the uniform heat generation to raise the air to a desired constant temperature. Due to the zero air flow resistance of the thin heat exchanger, the air flow at the downstream remains uniform.
The outlet temperatures from the heat exchanger model are compared with the experimentally measured results for three cases in Table 2. For each case, the average inlet temperature is specified on the hot plate, the V-shaped heat exchanger effectiveness (ε) and inlet water temperature are input conditions for the thin flat plate heat exchanger, and the average inlet air velocity is prescribed.
In order to effectively model the pressure drop and flow pattern, two rectangular resistances were employed in the full cabinet model. They had the same dimensions and area as the actual cabinet V-shaped heat exchanger, with a 46.1 deg angle between them (the same physical geometry as shown in Fig. 4). Parameters such as fin thickness, fin pitch, tube diameter, number of circuits, and rows per circuit govern the pressure drop across the heat exchanger. These parameters are employed to calculate the air flow resistance across the V-shaped heat exchanger. The details of the mathematical procedure are explained in Ref. . A grid sensitivity study with different mesh densities was performed on the heat exchanger CFD model. It was found that about 200,000 grid cells are sufficient to obtain acceptable accuracy, and the model takes less than 5 min to solve. In the full cabinet model, the same thin flat heat exchanger was employed as the only cooling source, and it was placed at the outlet of the heat exchanger box since more uniform air flow is expected there.
In the second approach, two flat heat exchangers were employed to account for both the heat transfer and the air pressure drop. It was necessary to adjust the input parameters that must be specified in the software heat exchanger model. For example, the effectiveness is constant but the air flow rate is not divided equally between the heat exchangers. This was also the case for water flow rate. With proper specification of the model heat exchanger parameters, the full heat exchanger model results were in good agreement with the cabinet experimental values, including very uniform outlet temperatures (see Fig. 11(b)).
For the overall cabinet modeling, both approaches for modeling the heat exchanger were implemented. While both models provided good agreement with the experimental measurements, the thin heat exchanger compact model was slightly more accurate and computationally efficient.
In addition to accurately modeling the physical dimensions and geometry of the cabinet and all of the subsystems, proper meshing is required in order to obtain accurate results. The computational grid should adequately resolve all of the regions of high velocity and temperature gradients. The 6sigmaroom software uses nonuniform meshing with better resolution near walls and in regions with complex geometrical structure (e.g., heat exchanger box and recirculation deflector). As mentioned previously, the software uses the finite volume approach to approximate the RANS partial differential equations in order to properly represent the turbulent transport. The regions close to solid surfaces are treated using the standard wall functions. For the computational results presented here, a relatively uniform global mesh was employed to capture the internal cabinet open regions (438,000 grid cells), and localized grid control was employed to capture regions of high gradients. For the most refined calculations, over 5 × 10⁵ grid cells were used. The software utilizes highly efficient equation solvers. Steady-state models with millions of computational cells can be solved in less than 2 h on current workstations. Figure 12(a) displays the computational model representation of the cabinet generated by the software. An example of the air flow pattern, visualized using streamlines, is shown in Fig. 12(b). The streamlines are colored by air velocity, with a range between 0 m/s and 15 m/s (the cabinet walls are not visible).
The study of the impact of leakage required additional meshing considerations. Leakage regions are typically long and narrow. It is important to adequately mesh these narrow regions. Important sources of leakage are the four sides of the heat exchanger box, the top of the cabinet (cable cutouts), the space between the servers, and the space between rear door and heat exchanger box. Additional refined grids were added to the numerical model for these regions. The effect of leakage will be discussed in detail in Sec. 4.5.
Steady-State Model Validation.
A grid of 17 velocity sensors was used to measure the air velocities at the inlet of the heat exchanger experimentally, and the measured velocities were used to produce contour plots for different conditions. The air velocities at the same locations from the computational model were used to generate the same type of air speed maps. In Fig. 13, the air speed contours are shown for the higher fan speed (high temperature) and the lower fan speed (low temperature) cases. For each figure, the experimental and the numerical results are presented. For both cases, very good qualitative and quantitative agreement is obtained. The air velocity units are m/s. The black dots represent the location of the velocity sensors.
The experimentally measured average air flow rate is 1.08 m³/s for the higher fan speed case and 0.93 m³/s for the lower fan speed case. When these air flow rates were compared to the amount of flow delivered by the three rear door fans, an error of 1.9% was obtained for the higher fan speed and 2.6% for the lower fan speed.
One objective of the present study was to validate the numerical model under different input and operational conditions. Table 3 provides a summary of the comparison between the experimental and numerical results. The experimental results are shown in the upper part of the table and the numerical results in the lower part. Effectiveness, average air velocity, and water inlet temperature are the experimental results that are applied as input and boundary conditions for each steady-state numerical case, while the experimentally measured average inlet/outlet temperatures of the heat exchanger are compared with the computational model predictions. The percentage error for each inlet/outlet temperature is shown in Table 3.
Three validation cases are compared in Table 3. In the first case, the water flow rate is 0.83 kg/s at 16.6 °C, which is the maximum water mass flow rate capacity of the heat exchanger. The total power generation inside the cabinet is 11.5 kW, distributed among the different equipment. The average air mass flow rate is 1.23 kg/s and the average air velocity at the inlet of the heat exchanger is 6.3 m/s. The computational model underpredicted the average inlet temperature of the heat exchanger, but with only 0.5% error. The average outlet temperature of the heat exchanger, for the same parameters, was underestimated by the model with only a 0.2% error. In the second case, the water mass flow rate was decreased to 31% of its maximum (0.26 kg/s) while the air flow rate was kept constant. In the third case, the water flow rate is kept constant and the air flow rate is decreased by 16% to 1.06 kg/s. The effectiveness is increased in comparison with the second case due to the reduction of the air flow rate. The heat dissipation within the cabinet was the same in all cases. The computational results show less than about 1% error for both inlet and outlet average temperatures.
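A simple steady-state energy balance, q = ṁ cp ΔT, indicates the temperature changes implied by the first validation case. This is a rough cross-check under assumed constant specific heats, neglecting the rear door fan heat and any losses; it is not a substitute for the values in Table 3.

```python
CP_AIR = 1005.0     # J/(kg K), assumed constant specific heat of air
CP_WATER = 4180.0   # J/(kg K), assumed constant specific heat of water


def temperature_change(heat_w, mass_flow_kg_s, cp):
    """Steady-state temperature change of a stream carrying heat_w."""
    return heat_w / (mass_flow_kg_s * cp)


# First validation case from the text: 11.5 kW, 1.23 kg/s air, 0.83 kg/s water.
air_dt = temperature_change(11500.0, 1.23, CP_AIR)
water_dt = temperature_change(11500.0, 0.83, CP_WATER)
print(f"air temperature change across the heat exchanger: {air_dt:.1f} K")
print(f"water temperature rise: {water_dt:.1f} K")
```

Because the air heat capacity rate is roughly one third of the water heat capacity rate in this case, the air stream is the limiting one, consistent with the Cmin = Ca form of the effectiveness.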
Effect of Leakage on IT Equipment Performance.
The numerical model results show good agreement for simulation cases presented in Sec. 4.4. By decreasing the air flow rate, the effectiveness of the heat exchanger increases as previously discussed. Also, the temperature difference across the heat exchanger is increased by decreasing the air flow rate. In an enclosed system, the inlet air of the server has the same temperature as the outlet air of the heat exchanger, and the server outlet temperature is the inlet temperature of the heat exchanger. Therefore, the temperature difference across the heat exchanger is the same as the temperature difference across the IT equipment. Similar to containment solutions used in data centers, an important factor to consider in the operation of a contained system is the possibility of air leakage.
The computational model was used to investigate the effect of leakage. A simulation was performed using the validated model for the following conditions: the air flow rate was reduced to 0.38 kg/s (equivalent to reducing the speed of the fans to 700 RPM) and the water flow rate was 0.83 kg/s. Since the rear door fans do not provide the air flow required by the IT equipment, the cabinet is underprovisioned, which causes a significant pressure differential between the front and the rear sides. Leakage may occur in the cabinet on the four sides of the heat exchanger box, between the rear door and the heat exchanger box (see Fig. 14(a)), and between servers (Fig. 14(b)). Also, the cabinet has small amounts of external leakage to the data center environment, such as through the front and rear door seals and through the cable cutouts on the top of the cabinet (Fig. 14(c)). In the numerical model, the leakage paths were modeled as open holes with no resistance and the same dimensions as in the actual cabinet geometry. The total leakage area on the four sides of the heat exchanger box amounts to 0.0217 m². This leakage is found to have the dominant effect on the flow behavior in the cabinet.
The blowers do not provide sufficient air flow for the IT equipment. As a result, the IT equipment draws the additional air it requires through the leakage paths. Figure 15 shows the air flow patterns inside the cabinet from the computational model both with and without leakage. The velocity streamlines are colored by temperature. The cold air provided by the heat exchanger mixes with the hot air from the leakage paths, so the average inlet temperature of the IT equipment increases significantly. Cold air leaves the outlet of the heat exchanger at 19 °C and starts mixing with the recirculated air from the leakage paths on the sides of the heat exchanger box. The mixing of the air increases at higher elevations in the cabinet; hence, the servers located in the upper part of the cabinet are impacted the most, with inlet temperatures in the range 26–30 °C (indicated by the green contours; see figure online for color). From the simulation, the total air leakage in terms of volumetric flow is about 0.152 m³/s, which is about 40% of the air flow provided by the rear door fans. The average inlet temperature of the heat exchanger is 42 °C and the average outlet temperature is 19 °C. The air flow through the other points of leakage was also monitored and was found to be negligible.
The effect of leakage in the cabinet is further investigated by controlling the leakage in the numerical model such that the pressure difference is increased significantly within the cabinet and the air flow rate is decreased. Figure 16 compares the average server inlet temperature for four cases: 100% provisioning, 40% provisioning with leakage, 40% provisioning without leakage, and 9 °C water inlet temperature. Compared with the 100% provisioning case, the inlet temperatures in the 40% provisioning with leakage case increase significantly. In the sealed case, the average inlet temperature of the heat exchanger, which is equal to the average IT outlet temperature, decreases to 40 °C while the IT inlet temperature remains 19 °C (Fig. 15(b)). This indicates that the outlet air of the heat exchanger is the only air supply to the IT equipment (shown by the blue region; see figure online for color).
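The effect of recirculated leakage air on the server inlet temperature can be illustrated with an idealized, fully mixed estimate. The sketch assumes that the leakage air is at the heat exchanger inlet temperature, that both streams have the same density and specific heat, and that mixing is complete; the CFD model resolves the actual stratification, so this estimate is not expected to reproduce the simulated values.

```python
def mixed_temperature(t_cold_c, t_hot_c, leak_ratio):
    """Adiabatic mixing of the cold supply with recirculated hot air.

    leak_ratio is the leakage flow divided by the cold supply flow; equal
    density and specific heat are assumed for both streams.
    """
    if leak_ratio < 0.0:
        raise ValueError("leak_ratio must be non-negative")
    return (t_cold_c + leak_ratio * t_hot_c) / (1.0 + leak_ratio)


# Values from the text: 19 C supply, 42 C return, leakage about 40% of fan flow.
t_mix = mixed_temperature(19.0, 42.0, 0.40)
print(f"fully mixed server inlet estimate: {t_mix:.1f} C")
```

The estimate lands several degrees above the 19 °C supply temperature, which is in line with the qualitative conclusion that leakage, rather than the heat exchanger outlet temperature, drives the elevated server inlet temperatures.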
A common solution to mitigate hot spots induced by mixing due to leakage is to reduce the water inlet temperature. A simulation was run where the inlet coolant temperature is decreased by 7 °C (from 16.5 °C to 9.5 °C). As a result, the average inlet air temperatures decreased from 28 °C to 21 °C. This result is closer to the model without leakage but is still higher. This seems to indicate that sealing the leakage is a more effective solution than reducing the inlet temperature of the coolant and more energy efficient as well.
Experimental Cooling System Failure Scenarios
Enclosed systems, such as the fully enclosed cabinet studied here and cold or hot aisle containment, are sensitive to different types of cooling system failures. Based on ASHRAE compliance specifications, the air temperature at the equipment inlets must be in the recommended range, which depends on the class of the IT equipment (A1–A4). In this part, the effects of two important cooling system component failures on the IT performance are investigated experimentally.
Component Thermal Inertia.
During the experimental process, the servers were powered off; hence, the inlet and outlet air temperatures were initially the same. The air flow was controlled and measured by the flow bench and kept constant during the experiments. The inlet air temperature was gradually increased and was measured by a grid of nine thermocouples, while the average outlet temperature was monitored by six air temperature/velocity sensors. The server was covered in order to prevent convective heat transfer through the server walls, as shown in Fig. 6. Figure 17 shows that the server thermal mass stores heat during the heating process, as indicated by the difference between the inlet and outlet temperatures. More details about the experimental and numerical procedure are given in Ref. .
The average thermal inertia of the V-shaped heat exchanger was calculated using this methodology as well. A typical cooling coil consists of copper, aluminum, steel, and trapped water. The material mass and the volume of contained water are provided by the Emerson thermal labs. The overall cabinet structure is the other main source of thermal mass in the cabinet, which is also calculated based on the material and dimensions of the parts. The cabinet frame, mounting rails, server rails, rear door structure, controller box, and rear door fans are the most important parts of the cabinet structure whose thermal mass plays an important role for transient analysis.
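A lumped estimate of the thermal mass of a cooling coil can be sketched by summing mass times specific heat over its constituents. The masses below are hypothetical placeholders (the actual values provided by the vendor are not given in the text), and the specific heats are approximate handbook values.

```python
# Approximate specific heats in J/(kg K); handbook-level values.
SPECIFIC_HEAT = {
    "copper": 385.0,
    "aluminum": 900.0,
    "steel": 490.0,
    "water": 4180.0,
}


def thermal_capacitance(masses_kg):
    """Lumped thermal capacitance in J/K: sum of mass times specific heat."""
    return sum(mass * SPECIFIC_HEAT[name] for name, mass in masses_kg.items())


def lumped_heating_rate(net_heat_w, capacitance_j_per_k):
    """Rate of temperature rise (K/s) of a lumped mass under a net heat input."""
    return net_heat_w / capacitance_j_per_k


# Hypothetical coil composition in kg, for illustration only.
coil_masses = {"copper": 10.0, "aluminum": 5.0, "steel": 8.0, "water": 6.0}
coil_capacitance = thermal_capacitance(coil_masses)
print(f"coil thermal capacitance: {coil_capacitance:.0f} J/K")
print(f"heating rate at 11.5 kW: {lumped_heating_rate(11500.0, coil_capacitance):.3f} K/s")
```

Even with placeholder masses, the trapped water tends to dominate the total because of its high specific heat, which is why the contained water volume matters for the transient response.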
Water Pump Failure.
In data centers, the chilled water supplied to the heat exchanger is controlled by coolant distribution units (CDUs). Each CDU can serve multiple cabinets with supply and return manifolds. The water flow rate to each cabinet heat exchanger is controlled locally by a three-way valve, based on the required cooling power. CDUs typically contain two circulating water pumps for redundancy (a primary and a secondary pump). The CDU control system determines which pump is operational and detects pump failures.
To simulate pump failure experimentally, the control valve installed on the return pipe of the heat exchanger was closed and the air temperature in the cabinet was monitored. In addition to the air temperature, the CPU core temperatures were monitored to determine the time it takes for the CPU cores to exceed the recommended limit. The inlet air temperature of the servers was monitored as well. The rotational speed of the rear door fans was fixed during this failure scenario, but the overall circulating air flow varies since the server fan speeds vary with the local temperatures (via an internal control system). The intelligent platform management interface (IPMI) reads the inlet temperature continuously while the server is powered. Depending on the server model, the IPMI can record core, memory, and air inlet temperatures as well as server fan speed.
In these experiments, all of the sensors described previously were utilized. The system operates normally for the first 350 s to verify steady-state operation on both the air and water sides of the heat exchanger. At 350 s, the water flow to the heat exchanger is turned off. The data show that the air outlet temperature increases earlier (Fig. 18). The inlet air temperatures from all 17 sensors are included in the figure using different color lines in order to display their differences during the failure. The temperature sensors can be categorized into three sets (left, middle, and right). The middle group of sensors mainly reports the server simulator's outlet temperature. Because the server simulator has a relatively lower thermal mass and higher air flow than the servers, the air temperatures increase at a higher rate in this region. The sensors in the left and right regions show a slower rate of temperature increase during the failure. The blue line pertains to the average air temperature downstream of the heat exchanger (see figure online for color). All nine thermocouples recorded a similar pattern, so only the average is included. The heating rate is 0.07 °C/s while the cooling rate is about 0.095 °C/s, which is due to the thermal inertia of the heat exchanger. Higher thermal mass causes lower heating and cooling rates. The water supply temperature is represented by the green line in the figure (see figure online for color). During the failure scenario, the water is trapped in the supply pipe, which is why its temperature remains constant. In the recovery mode, the trapped water in the heat exchanger pipes starts circulating in the CDU loop. Since it takes time for the CDU control system to adjust the chilled water three-way valve to maintain the supply water set point, a fluctuation was observed during the recovery mode, which takes more than 16 min.
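A heating or cooling rate such as those quoted above can be estimated as the slope of a straight-line fit to the temperature trace over the relevant time window. The sketch below is one possible way to do this and is not necessarily how the reported rates were obtained; the function name and the synthetic trace are assumptions made for illustration.

```python
def slope_least_squares(times_s, temps_c):
    """Return the least-squares slope (°C/s) of temperature versus time."""
    n = len(times_s)
    if n < 2 or n != len(temps_c):
        raise ValueError("need two or more samples of equal length")
    t_mean = sum(times_s) / n
    y_mean = sum(temps_c) / n
    sxx = sum((t - t_mean) ** 2 for t in times_s)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_s, temps_c))
    return sxy / sxx


# Synthetic trace (hypothetical): a linear rise of 0.07 °C/s that starts
# when the water flow is shut off at 350 s, sampled every 10 s.
times = [350 + 10 * i for i in range(11)]
temps = [20.0 + 0.07 * (t - 350) for t in times]

heating_rate = slope_least_squares(times, temps)  # °C/s
```

On measured data, the fit window would need to be restricted to the approximately linear part of the transient, since the trace flattens as the air temperature approaches its new equilibrium.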
In Fig. 19, the heat exchanger effectiveness values (calculated based on Eq. (2)) are shown as a function of time. The initial value is ε = 0.81 and the value decreases to zero during the failure. The heat exchanger thermal inertia delays the equilibration of the air inlet and outlet temperatures, as was also shown in Fig. 18. The water mass flow rate was 0.75 kg/s before and after the failure period (shown by the green line; see figure online for color). The inlet temperatures of the three one-RU servers (top, middle, and bottom) during the failure are also shown in Fig. 19. The inlet air temperatures differ slightly at first, but this difference is amplified during the failure period. Server 15 (the upper server) experienced the highest inlet temperature. It took 360 s for the first CPU core to exceed the recommended limit of 104 °C for these servers. On the other hand, it took only 130 s for the inlet temperature of the first server to exceed the recommended range based on the ASHRAE standard. Note that the control system of the cabinet opens both the rear and front doors if a server's inlet temperature exceeds 35 °C. For this experiment, the doors were kept closed manually.
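Equation (2) is not reproduced in this part of the paper. As a hedged illustration of how an effectiveness trace can fall to zero, the sketch below uses the common air-side definition ε = (Tai − Tao)/(Tai − Twi), which applies when air is the stream with the smaller heat capacity rate; this choice of definition is an assumption. The function name and the temperatures are hypothetical and were picked only to give a value close to the reported initial one.

```python
def effectiveness_air_side(t_air_in, t_air_out, t_water_in):
    """Air-side effectiveness: actual air cooling over the maximum possible.

    Assumes air is the minimum heat-capacity-rate stream, so that
    eps = (Tai - Tao) / (Tai - Twi). All temperatures share one unit.
    """
    max_drop = t_air_in - t_water_in
    if max_drop <= 0:
        raise ValueError("air inlet must be warmer than water inlet")
    return (t_air_in - t_air_out) / max_drop


# Hypothetical steady-state temperatures (°C), not measured data.
eps_normal = effectiveness_air_side(35.0, 20.0, 16.5)

# With the water flow stopped, the coil eventually removes no heat and the
# air leaves at its inlet temperature, so the effectiveness goes to zero.
eps_failed = effectiveness_air_side(35.0, 35.0, 16.5)
```

Under this definition, the delay between the valve closing and ε reaching zero reflects the heat still being absorbed by the coil metal and the trapped water.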
Another common failure in data centers is cooling unit blower failure. For the enclosed cabinet, one or more of the rear fans might fail, which would lead to an underprovisioned situation. The equipment then draws its required air flow from alternate paths, such as regions with possible leakage, as discussed in the earlier leakage section. When the rear fans are powered off, the only sources of air movement are the server and server simulator fans. This means the air circulation within the cabinet is minimized but not completely blocked. In Fig. 20, the air temperature upstream of the heat exchanger is shown with a different color line for each sensor. As in the other failure scenario, the middle region sensors behave slightly differently from the left and right region sensors. In the middle region, the temperature decreases slightly and then gradually increases. During the recovery, it displays a sudden increase. The heat exchanger outlet air temperature decreases during the failure. This is due to the significant reduction of the air flow, which increases the effectiveness of the coil. The server simulator is on top of the heat exchanger and draws the majority of the outlet air flow. The average water inlet temperature shows a minimal fluctuation of ±0.8 °C since the water flow rate is constant and, during the failure, most of the heat is absorbed by the thermal mass.
In this failure scenario, the initial effectiveness is again ε = 0.81, as in the water pump failure scenario, since the initial condition of both cases was the same (Fig. 21). However, in this case, the effectiveness increases during the failure since the air flow is reduced while the water flow rate is constant. The effectiveness increases up to ε = 0.94, which is a relatively high value. The green line (see figure online for color) indicates the average air mass flow rate in the cabinet, which during the failure is entirely driven by the servers' fans. It is initially 1.23 kg/s, and after the failure it is reduced to 0.25 kg/s. The server fan speeds then increase as the processor temperatures rise, and the average air mass flow rate increases to 0.42 kg/s. After the recovery, the average remains higher than in the prefailure situation for the same reason.
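The rise in effectiveness can be related to the heat capacity rates of the two streams. The sketch below compares them using the reported mass flow rates; it assumes the water flow rate of 0.75 kg/s quoted for the pump failure test also applies here, and the specific heats are approximate textbook values rather than quantities taken from the paper.

```python
CP_AIR = 1005.0    # J/(kg K), approximate for air near room temperature (assumed)
CP_WATER = 4180.0  # J/(kg K), approximate for liquid water (assumed)


def capacity_rates(m_air_kg_s, m_water_kg_s):
    """Return (C_air, C_water, C_min / C_max), with capacity rates in W/K."""
    c_air = m_air_kg_s * CP_AIR
    c_water = m_water_kg_s * CP_WATER
    c_min, c_max = sorted((c_air, c_water))
    return c_air, c_water, c_min / c_max


before = capacity_rates(1.23, 0.75)  # rear door fans running
during = capacity_rates(0.25, 0.75)  # air flow driven by server fans only
```

Under these assumptions, air is the minimum capacity stream in both states, and the capacity ratio drops from roughly 0.39 to roughly 0.08 when the rear fans stop. A smaller air capacity rate generally means the air can be cooled closer to the water inlet temperature, which is consistent with the observed increase in ε, although the air-side heat transfer coefficient also falls with flow and this sketch does not capture that.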
The most important point of this section is shown in Fig. 21. The inlet air temperatures of the same servers (as in the other failure scenario) do not exceed the recommended ASHRAE compliance limit during the failure, while their CPU core temperatures exceed 104 °C. In the blower failure, the time for the first CPU to exceed 104 °C is 255 s, which is 105 s shorter than in the water pump failure. The inlet temperature is therefore not a reliable indicator in this case. IT equipment behaves similarly in any enclosed environment, such as cold aisle containment. A limitation of air flow in enclosed systems can result in server failure independent of the air temperature. The cabinet control system keeps the doors closed for the first 5 min of the blower failure. Bear in mind that in the water pump failure scenario, it takes only 2 min for the doors to be released.
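The comparison above relies on finding, for each monitored signal, the time after the failure at which it first crosses its limit. A minimal sketch of that check follows. The traces are synthetic and do not reproduce the measured curves; the function name is hypothetical, and the 27 °C inlet limit is an assumed value for the upper end of the recommended range.

```python
def time_to_exceed(times_s, values, limit, t_failure_s):
    """Seconds after t_failure_s until values first exceed limit, else None."""
    for t, v in zip(times_s, values):
        if t >= t_failure_s and v > limit:
            return t - t_failure_s
    return None


T_FAIL = 350                   # s, start of the failure
CPU_LIMIT_C = 104.0            # CPU core limit quoted in the text
INLET_LIMIT_C = 27.0           # assumed upper end of the recommended range

# Synthetic traces (hypothetical), sampled every 5 s.
times = list(range(0, 1000, 5))
elapsed = [max(0, t - T_FAIL) for t in times]
cpu_c = [60.5 + 0.2 * dt for dt in elapsed]       # rises quickly
inlet_c = [24.0 + 0.002 * dt for dt in elapsed]   # barely changes

cpu_time = time_to_exceed(times, cpu_c, CPU_LIMIT_C, T_FAIL)
inlet_time = time_to_exceed(times, inlet_c, INLET_LIMIT_C, T_FAIL)
```

In this synthetic blower-failure-like case, the CPU trace crosses its limit while the inlet trace never does, so an alarm based on inlet temperature alone would not trigger. Checking several signals (for example CPU temperature or air flow in addition to inlet temperature) avoids that blind spot.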
In this study, the thermal performance of a hybrid water-cooled, fully enclosed server cabinet was characterized experimentally and analyzed via numerical simulations. A comprehensive, experimentally validated CFD model was developed using the commercial 6SigmaRoom software to investigate air flow and heat transfer inside the cabinet. A detailed description was given of the experimental setup, the arrangement of the IT equipment, and the location of the temperature/velocity sensors. In the test facility, three different types of IT equipment were utilized. Each was characterized using a flow bench, and flow curves were obtained. The effective flow curves were used to represent the server flow characteristics in the comprehensive numerical cabinet model. While the characterization of the air side of the cabinet's cooling loop proved to be a complicated task, it was found to be very important for the computational model to be robust and accurate.
A combination of compact and detailed modeling was employed in the cabinet computational model. The rear door fans and channels of the cabinet were experimentally and numerically characterized. The characterization study provided detailed information on how the air flow channel design within the door affects the performance of the rear door fans. The V-shaped heat exchanger in the cabinet was modeled in two ways, both of which showed good agreement with the experimental results.
Three validated cases were investigated for different air side flow conditions, with less than 2% total error achieved for all the cases. It was determined that air leakage plays a significant role when the system is operating underprovisioned, since hot air recirculation can occur. Eliminating the leakage decreased the IT inlet air temperature since mixing of hot and cold air is minimized, and it was more effective than a reduction in the coolant temperature. The results from the study show that the mixing of outlet and inlet air is highly undesirable in enclosed cabinets, just as it is in room level cooling.
Two cooling system component failure scenarios were investigated experimentally. In order to analyze the transient experimental results, the thermal mass of the servers was determined using a semi-empirical methodology, and the thermal inertia of the cabinet subsystems and the V-shaped heat exchanger was calculated based on the mass and heat capacities of the component materials. The effect of the water side and the air side failure of the heat exchanger on the performance of the cabinet was investigated. The water pump failure scenario resulted in increased air temperatures at the inlet of the servers and increased CPU temperatures. However, in the air side (blower) failure, the IT inlet temperatures remained within the ASHRAE recommended range while the servers' CPU temperatures exceeded their limit in approximately 250 s. An important conclusion from the study is that monitoring only the IT inlet temperatures in contained systems, especially during a blower failure, can be misleading.
- Cp = specific heat capacity
- CAC = cold aisle containment
- CDU = coolant distribution unit
- CFM = cubic feet per minute
- CRAC = computer room air conditioning
- h = heat transfer coefficient
- HAC = hot aisle containment
- LPM = liter per minute
- Pc = critical pressure, PSI
- PDU = power distribution unit
- Q = air flow rate, kg/s
- Q = air flow rate, m³/s
- T =
- Tai = air inlet temperature
- Tao = air outlet temperature
- Twi = water inlet temperature
- Two = water outlet temperature
- ε = heat exchanger effectiveness
- λ = location of power dissipation relative to mass