This article explores innovative ways technology companies are dealing with the enormous heat generated by their large data centers. Microsoft’s research and development team is designing and building an underwater data center; researchers believe underwater data centers may have power, construction, and performance advantages as well. Berkeley’s National Energy Research Scientific Computing Center (NERSC) recycles its chilled water and uses air to cool the data center and its cooler-running components, such as disk drives, routers, and network servers. With the growing efficiency of data centers, better cooling alone might not be enough to make underwater operations worthwhile, but cooling is only one of the advantages Microsoft sees in submerging its data centers. Submersibles also simplify deployment. Microsoft powered Leona Philpot, its first submersible data center, through an underwater cable connected to the electrical grid; in the future, it may cut costs by using renewable energy combined with on-site energy storage and backup power from the grid.
Computer scientist Ben Cutler had spent four years at DARPA, the Defense Advanced Research Projects Agency, piloting large programs to develop military battlefield software. Now he was ready to return home to Seattle.
The problem was, he needed a job. So he started reaching out to people he knew in the Seattle area, including Norm Whitaker, a former DARPA deputy director. Microsoft had recruited Whitaker from DARPA six months earlier to head Microsoft Research's Special Projects group, which was dedicated to high-risk, high-payoff projects.
Whitaker, it turned out, had a job for him. He asked Cutler if he wanted to design and build an underwater data center.
“My first reaction was, ‘I’m not going to Microsoft if this is what they’re doing,’” Cutler said. “I think I had the same reaction anybody does—it doesn’t make any sense.”
Data centers provide computing power for everything from corporate networks and websites to streaming video and smartphone apps. Submersing racks of electronics in water, where small problems would go unfixed and a single leak could take down the entire system, is not something most engineers would readily embrace.
Cutler rejected the underwater data center out of hand.
Yet he couldn’t get the proposal out of his mind. As he worked through the physics and science, he realized it was not as crazy as it sounded.
For all the downsides of locating a data center underwater, there was one decided advantage. Cold ocean water could provide a cheap source of cooling.
Heat is the enemy of large data centers, which draw and then must shed power by the megawatt. Most data centers spend a significant share of their electrical budget on refrigeration to keep their circuits from cooking. And increasingly, the companies that run data centers are looking for any advantage they can find to reduce cooling costs. Some are building data centers where electricity is cheap and the air is dry and cool. Others have replaced mechanical chillers with giant cooling towers.
The ocean, however, is an ideal heat sink. Nuclear submarines rely on ocean water to cool heat exchangers used to chill coolant for their nuclear reactors. And submarines have an excellent record of keeping sensitive instruments dry and happy.
In fact, one of the Microsoft employees who first proposed the concept to Whitaker had served aboard a submarine.
Underwater data centers might have power, construction, and performance advantages as well.
Cutler called Whitaker back and they talked some more. In July 2014, Cutler joined Microsoft. One week later, he was leading Project Natick, Microsoft's effort to build a data center that could operate under the sea.
By the next summer, his team was testing out the concept off the coast of California.
Running Full Tilt
Corporations have had data centers for decades, but the Internet and cloud computing sparked their rapid growth.
“Today, everything that we think would normally run on a PC or cellphone is no longer there,” said Brent Draney, group lead for networking and security at the National Energy Research Scientific Computing Center in Berkeley, Calif. “Those apps and streaming media are running in a data center somewhere else. PCs and phones are really just data movers.”
Data center companies of all types struggled to keep up with demand as business took off at the beginning of the decade. Every other priority took a back seat to adding capacity and running it full tilt, 24/7. Anything less and they might lose users—and profits—to faster, more nimble competitors.
That narrow focus could be seen in something as basic as cooling. Most data centers spend a small fortune on chillers, the ton-sized air conditioning units that produce cold air to cool rack after rack of stripped-down computers called servers. There are ways to economize on this—ducting the cold air directly to the racks, say, or venting the air after it has been heated—but instead, many operators let hot and cold air mix and ran their air conditioners overtime to cool the entire data center. Data center managers also kept rooms too cold, ran fans too fast, and failed to use their servers’ power management features. Nearly half of large companies did not benchmark power use.
In fact, they struggled even to measure electrical efficiency. The industry's metric of choice was Power Usage Effectiveness (PUE), total electrical consumption divided by the electricity used for computing. According to a 2012 survey by the Uptime Institute, an IT professional organization, the industry's average PUE was 1.8 to 1.9, meaning only about half of the power draw was used for running servers.
But even some of those servers were not being used productively. Most of the time, they were waiting for something to do. Data center managers also kept older, slower units running as backups, and other servers continued to run even though no one knew what they were being used for. For PUE calculations, power spent running those underused servers counted as computing. An accurate PUE that excluded underutilized servers might have been 3.0 or worse.
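The arithmetic behind those numbers is simple. The sketch below uses hypothetical figures (the power values and the "useful fraction" are assumptions for illustration, not survey data) to show how counting idle servers as computing flatters the reported PUE:

```python
def pue(total_power_kw, it_power_kw):
    """Power Usage Effectiveness: total facility power / IT power."""
    return total_power_kw / it_power_kw

total = 1800.0         # total facility draw, kW (assumed)
it_load = 1000.0       # power reaching the server racks, kW (assumed)
useful_fraction = 0.6  # share of IT power doing productive work (assumed)

# Reported PUE counts every watt reaching a server as "computing":
print(round(pue(total, it_load), 2))                    # 1.8

# Counting only productive servers tells a different story:
print(round(pue(total, it_load * useful_fraction), 2))  # 3.0
```

With these assumed numbers, the same facility that reports a PUE of 1.8 is effectively running at 3.0 once idle and orphaned servers are excluded, which is the point the paragraph above makes.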
No one really knew, and as long as the servers kept running, no one really cared.
Today, data center electricity use has been slashed. Servers at the best performers are running at full capacity, and there are no slow units to be found. Google's average PUE is a remarkably low 1.12 and its competitors in the data center business are in the same ballpark. Their costs are so low, they can sell cloud computing services to less efficient IT operations. How did they do it?
One place to study data center best practices is the new 149,000-square-foot facility at Berkeley's National Energy Research Scientific Computing Center. The facility has a lot in common with large data centers run by the best cloud providers. Both consume more than 5 MW of power and use similar technology and similar strategies to manage power and heat.
While most businesses consider data center technology a closely guarded secret, NERSC's Draney is happy to talk about what makes his testbed one of the world's most efficient data centers.
For instance, just like today's best conventional data centers, NERSC runs its large supercomputers as virtual machines. That means that in spite of having hundreds of thousands of processing cores, the supercomputers act as a single unit that can be subdivided into multiple smaller “virtual” computers to run multiple jobs simultaneously. This not only improves utilization, it eliminates the need to run idle “just in case” servers that act as vampire loads.
“Today, everything that we think would normally run on a PC or cellphone is no longer there. Those apps and streaming media are running in a data center somewhere else. PCs and phones are really just data movers.”
— Brent Draney, Group lead for networking and security, National Energy Research Scientific Computing Center, Berkeley, Calif.
To further improve performance, designers packed the processing cores much closer together than in a conventional server. The racks of NERSC's large Cori supercomputer jam two nodes, each with 16 cores, onto a single circuit board. Each rack holds 48 of those circuit boards and needs two 100-amp, 480-volt feeds for power, Draney said.
The configuration of Cori's second phase will be even denser, with 68 cores per node. Overall, each rack consumes between 65 and 75 kW. That's four to five times more power than a conventional server rack.
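A quick back-of-the-envelope check ties the figures above together: the core count per rack follows from the board layout, and the two quoted power feeds bound what the rack can draw. (The calculation is mine; only the input figures come from the article.)

```python
# Cori phase 1 rack figures, as quoted above.
boards_per_rack = 48
nodes_per_board = 2
cores_per_node = 16   # 68 per node in the denser second phase

cores_per_rack = boards_per_rack * nodes_per_board * cores_per_node
print(cores_per_rack)  # 1536 cores in a single rack

# Two 100-amp, 480-volt feeds set the ceiling on available power:
feed_capacity_kw = 2 * 100 * 480 / 1000
print(feed_capacity_kw)  # 96.0 kW available vs. the 65-75 kW actually drawn
```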
That density improves performance—the shorter the path between the cores, the faster they can communicate with one another—but it also increases the amount of waste heat to be bled off. In fact, it's too much heat for air cooling, so NERSC turned to liquids. Water removes heat 1,000 times better than air. Cori uses what Draney calls “nearly liquid cooling.” Today, water-cooled heat exchangers chill air before blowing it across the rack. The next phase will use direct liquid cooling, running water through a cold plate on top of the cores, Draney said.
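To see why liquid cooling becomes necessary at these densities, consider the coolant flow needed to carry away one rack's heat using the standard heat-transfer relation Q = m·c·ΔT. The rack heat load is the midpoint of the range quoted above; the allowed temperature rise is an assumption for illustration, not a NERSC specification.

```python
# Rough sketch: water flow needed to remove one rack's waste heat.
heat_kw = 70.0   # waste heat per rack, kW (midpoint of the 65-75 kW range)
c_water = 4186.0 # specific heat of water, J/(kg*K)
delta_t = 10.0   # allowed coolant temperature rise, K (assumed)

flow_kg_per_s = heat_kw * 1000 / (c_water * delta_t)
print(round(flow_kg_per_s, 2))  # ~1.67 kg/s, i.e. under two liters per second
```

A modest stream of water, less than two liters per second, handles a heat load that would take an enormous volume of moving air, which is the advantage the paragraph above describes.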
NERSC recycles its chilled water. Most data centers would do that by running the water through a heat exchanger cooled by a chiller, but NERSC does it by tapping one of Berkeley's great natural resources—the cool air coming off San Francisco Bay. “We have the world's greatest cooling system,” Draney said.
His data center uses that air to naturally chill a cooling tower to remove heat prior to recycling. NERSC also uses air to chill the data center and cooler-running components, such as disk drives, routers, and network servers. By using a combination of cooling towers, inlets, fans, and baffles, Draney can keep building temperature and humidity within the narrow range of conditions best suited for electronics.
Cooling towers are less expensive and far less prone to breakdown than chillers, and they cost less to operate. Water is also easier to pipe in and out than air. Those savings make the building less expensive than a similarly sized conventional data center.
Most commercial data centers do not have NERSC's enormous heat loads, but they have begun to use similar strategies. Some open their windows to cold desert air at night. Several have relocated to cooler Scandinavian countries, where they draw cooling water from fjords. One sits deep underground, in an abandoned mine. In Belgium, Google went the other way, and operates at temperatures so high, technicians cannot approach the servers for prolonged periods of time.
And everyone is looking at advanced water cooling.
One driver for Project Natick's underwater data centers is access to a great heat sink. “The ocean,” as Ben Cutler noted, “is a very cold place.”
Cutler's team deployed Leona Philpot, Project Natick's first submersible data center, in the cold waters about a mile off the coast of California in 2015. The 10-foot-long by 7-foot-wide capsule weighed in at 38,000 pounds. It carried 300 servers on racks designed to rock gently in the current.
For 105 days, the Leona Philpot sat 30 feet below California's ocean surface collecting data from dozens of sensors that measured pressure, humidity, motion, and temperature. Microsoft also installed cameras so it could observe the interior.
Leona Philpot used simple heat exchangers and closed loop air cooling to bleed off heat. Operating on the sea floor makes it cheap and easy to run data centers at very low temperatures, which lowers the failure rate of its electronics.
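The link between temperature and reliability is often summarized by a rule of thumb derived from the Arrhenius equation: electronics failure rates roughly halve for every 10 °C drop in operating temperature. The sketch below illustrates that rule; it is a generic engineering heuristic, not Microsoft's reliability data.

```python
def relative_failure_rate(delta_t_c):
    """Failure rate relative to baseline, using the 10-degC doubling rule."""
    return 2 ** (delta_t_c / 10)

# Running 20 degC cooler than a typical data center hall:
print(relative_failure_rate(-20))  # 0.25 -- a quarter of the baseline rate
```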
With the growing efficiency of data centers, better cooling might not be enough to make underwater operations worthwhile. But cooling is only one of the advantages that Microsoft sees in submerging its data centers. Submersibles also simplify deployment. Today, a data center project takes two years to complete, Cutler explained: In addition to buying land and building and commissioning a facility, companies must line up electrical power, file environmental permits, and work through tax laws. Even finding a workforce is sometimes difficult in the remote locations where land and power are cheapest.
Submersibles, on the other hand, are prefabricated. Before shipping, technicians have checked every server, wire, and connection. Not only is the ocean a very consistent environment, but once the container is sealed, it is impervious to airborne contaminants and fluctuations in temperature and humidity.
Of course, no one will make house calls to fix problems either. Microsoft reduces the risk of failure or fire by running cold and substituting nitrogen for oxygen inside the container. Still, equipment will fail.
“Ordinarily, we’d have people running around and fixing things,” Cutler said. “Here, we won’t do that. We’ll let it fail. We accept that capacity will decline over time. And in return, we won’t have to pay for people, repairs, parking spaces, or lighting. If you analyze maintenance costs at a granular level, it pays to let some servers fail in place. We can pull the capsule back up to upgrade to next-generation servers every five years.”
“Half the world's population lives within 120 miles of the ocean. Putting a data center offshore near population centers reduces the latency and improves the customer experience.”
— Ben Cutler, Microsoft
An offshore location also opens up some interesting possibilities for power. Microsoft used an underwater cable connected to the electrical grid to power Leona Philpot. In the future, it may reduce costs through renewable energy, combined with on-site energy storage and backup power from the grid.
“We could put propellers in the water and tap tides or currents such as the Florida Gulf Stream, which is 100 kilometers across and moves about 1 meter per second,” Cutler mused. “Or we could take advantage of motion from waves, though that is more difficult.”
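The current speed Cutler cites is enough for a rough resource estimate using the standard kinetic power density formula for a fluid flow, P = ½ρv³ per square meter of swept area. (The calculation and the seawater density are mine; only the 1 m/s figure comes from the quote.)

```python
# Kinetic power available per square meter of turbine facing the current.
rho_seawater = 1025.0  # density of seawater, kg/m^3
v = 1.0                # current speed, m/s (the Gulf Stream figure quoted above)

power_density = 0.5 * rho_seawater * v ** 3
print(power_density)  # 512.5 W per m^2 of swept area, before turbine losses
```

Even before turbine efficiency losses, a modest rotor in such a current could contribute meaningfully to a submersible's power budget, which is why the idea is worth musing about.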
There is one more reason to look offshore, and it is perhaps the most important of all. It is the same reason that has driven data center development since the earliest days of the Internet—performance.
“Ordinarily, we’d have people running around and fixing things. Here, we won’t do that. We’ll let it fail. ... We can pull the capsule back up to upgrade to next-generation servers every five years.”
— Ben Cutler, Microsoft
When companies site data centers where land and power are cheap, they are often far from their customers, who more likely than not live in cities and suburbs. That distance creates what is known as latency, a time gap between requests and their fulfillment. Much the way bunching processing cores together improves computer performance, bringing data centers closer to customers reduces latency.
“Half the world's population lives within 120 miles of the ocean,” Cutler said. “Putting a data center offshore near population centers reduces the latency and improves the customer experience, whether it's playing a game, watching a video, or extracting files for Microsoft Office.”
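The latency advantage is easy to quantify from propagation delay alone: light travels through optical fiber at roughly two-thirds its vacuum speed, about 200 km per millisecond. The sketch below is an illustrative lower bound (the inland distance is an assumption); real latency adds routing and processing overhead on top.

```python
def rtt_ms(distance_km):
    """Round-trip time from fiber propagation delay alone, in milliseconds."""
    speed_km_per_ms = 200.0  # ~2/3 the speed of light in vacuum
    return 2 * distance_km / speed_km_per_ms

print(round(rtt_ms(193), 2))   # ~1.93 ms for 120 miles (193 km) offshore
print(round(rtt_ms(2000), 1))  # 20.0 ms from a hypothetical distant inland site
```

Shaving tens of milliseconds off every request is imperceptible for e-mail but decisive for gaming and interactive applications, which is the customer experience Cutler refers to.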
Leona Philpot was a small step in seeing whether the submersible reality lived up to Cutler's calculations. And Microsoft got some experience working with the marine industry, something new for a land-locked company that has solid ties to academic researchers but not old-line shipbuilders.
“It was the marrying of two mature industries, IT and marine,” Cutler said. “It is flip to say that there was nothing really hard about it, nothing that hadn’t been done before. But it's a big ocean, and we had some big concerns—could we keep stuff dry? Operate remotely? Keep it cool?”
Rather than try to optimize a design, Microsoft simply wanted to prove that the underwater concept would work, and that it would not run into unanticipated problems.
The most surprising element of the Leona Philpot experiment was that there were no real surprises. Everything seemed to work. As a result, Microsoft is moving forward on a larger deployment.
Someday, perhaps, instead of thinking about putting data in the cloud, we’ll talk about it living in the ocean. And it won’t be just a metaphor.