A guest post by Bertrand Sirodot
As I have discussed in my previous blog post on liquid cooling, cooling is going to be a significant topic over the next few years. I found the diagram below to be a great way to visualize why that is:
This shows the Thermal Design Power (TDP) for the next generations of processors from Intel and AMD. As shown by the graph, over the next 3 years, most of the latest-generation processors will have TDPs over 400W.
Taken in isolation, this value is impressive, but, when combined with the ever-increasing power requirements of GPUs, the cooling requirements of NVMe drives and the advent of Data Processing Units (DPUs), it explains why we are at an inflection point from a cooling perspective.
The diagram below puts those TDP values against the cooling capabilities of existing technologies (air cooling and liquid cooling):
One thing to note on the diagram is that the tipping point between air cooling and liquid cooling depends on the server form factor (such as 1U vs 2U and multinode vs monolithic) and its configuration, but generally speaking, air cooling can be stretched to about 350W but not much further.
Server cooling technologies can be grouped into 3 separate categories:
- Air cooling, which is called Multi-Vector Cooling on Dell PowerEdge servers
- Liquid cooling, which is called Direct Liquid Cooling on Dell PowerEdge servers
- Immersion cooling
Let’s quickly dive into each category:
- Multi-Vector Cooling: this is the traditional cooling method, where very powerful fans draw cold air in through the front of the server and exhaust hot air through the back. This method works in combination with proper datacenter design: cold aisles, where cold air coming from the chillers is delivered so it can be drawn in by the servers, and hot aisles, which gather the hot air coming out of the servers and send it back to the chillers. Lots of innovation has gone into maximizing the airflow within the server itself, from the shape of the openings in the front bezel to shrouds within the server that direct the cold air where it is needed the most. I am also seeing new types of racks coming out with in-row coolers and/or with heat exchangers, i.e. chillers, built directly into their rear doors. Those innovations have pushed the envelope of air cooling.
- Direct Liquid Cooling: as described in my previous post, this method has been around for a while, mostly in the gaming industry, but is just now starting to trickle into the datacenter. At its heart, it uses liquid to move heat away from a hot component, e.g. the CPU, and into a heat exchanger where the liquid is cooled again. This can cool hotter components because the liquid used absorbs significantly more heat than air does. Currently, liquid cooling is used to cool specific targeted components, such as a CPU or GPU, and still relies on fans to cool the rest of the server. Current Direct Liquid Cooling technology can support components with TDPs up to 900W.
- Immersion cooling: immersion cooling is a relatively new technology and means submerging a server in a large container full of fluid. The idea behind immersion cooling is that the liquid absorbs the heat from the server and is then cycled through a heat exchanger, where it is cooled again before being pumped back into the container. There is active research currently on the best liquid to achieve the largest heat absorption.
As shown in the graph below, Direct Liquid Cooling offers the highest cooling capabilities of all 3:
Immersion cooling, despite its promises, is today comparable with air cooling, as it supports components with TDPs up to 400W for single-phase immersion cooling and up to 650W for dual-phase immersion cooling. The other downside of immersion cooling is the serviceability of the servers, as they are installed in a container full of fluid.
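To make those ceilings easier to compare, here is a minimal Python sketch that checks which cooling methods could cover a component of a given TDP. The numbers are simply the approximate figures quoted in this post (about 350W for air, 400W and 650W for single- and dual-phase immersion, 900W for Direct Liquid Cooling); the names `MAX_TDP_WATTS` and `viable_cooling_methods` are made up for illustration, and real limits depend on the server form factor and configuration.

```python
# Approximate per-component TDP ceilings (in watts) as quoted in this post.
# These are rough, illustrative values, not vendor specifications.
MAX_TDP_WATTS = {
    "air cooling": 350,
    "single-phase immersion cooling": 400,
    "dual-phase immersion cooling": 650,
    "direct liquid cooling": 900,
}


def viable_cooling_methods(tdp_watts: float) -> list[str]:
    """Return the methods whose quoted ceiling is at or above the given TDP."""
    return [name for name, limit in MAX_TDP_WATTS.items() if tdp_watts <= limit]


if __name__ == "__main__":
    for tdp in (280, 400, 700):
        print(f"{tdp}W -> {viable_cooling_methods(tdp)}")
```

With these figures, a 400W processor is already past what air cooling can comfortably handle, and anything above 650W leaves Direct Liquid Cooling as the only option.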
Since my previous post, Dell has come out with a technology called Smart Cooling for its PowerEdge servers. I can already hear you wondering about the lengthy preamble, as you probably came here to learn about Smart Cooling, right? Let’s dive into that then.
At its heart, Smart Cooling is very simple: it is the ability for all future (and some current-generation) PowerEdge servers to support all 3 cooling methods, thus giving customers the choice of the appropriate method based on their workload and their datacenter capabilities. Dell feels that, for the foreseeable future, customers will have workloads that are perfectly fine being air cooled and workloads that will require the cooling power of Direct Liquid Cooling, hence the support for all 3 cooling technologies.
But Smart Cooling doesn’t stop at the support of cooling technologies. It also includes a management component called Power Manager, which helps customers gain visibility into their power usage and into who is consuming what, and avoid power- or thermal-related downtime. Power Manager also helps datacenter managers with their carbon-footprint reduction initiatives by showing under-utilized servers and by decreasing overall power consumption.
I am convinced that, within the next 5 years, most datacenters will leverage at least 2 different cooling technologies and, as I discussed in my post on Liquid Cooling, that transition will not necessarily be easy. Knowing that the hardware they buy today will be able to transition to new cooling technologies should be at the forefront of any hardware acquisition decision, as this is happening, and it is going to be chilling! 😀