A guest post by Bertrand Sirodot
Traditionally, servers have been cooled by drawing cold air in at one end of the server and expelling it at the other. Datacenters have been designed to optimize this method of cooling by creating cold aisles and hot aisles, ensuring that the coldest air reaches the server intake while the hot air at the other end is properly recycled.
Unfortunately, this method of cooling servers is fundamentally flawed, even though a lot of R&D dollars are spent optimizing air flow and the amount of air being moved. Why is it flawed? Because the thermal conductivity of air is low, meaning that, as a medium, air isn’t good at removing heat. As human beings, we know this intuitively. When the weather gets hot, we use a fan to help cool us down, but we also know that there is a point where fans aren’t enough, and we usually turn to something far more effective: water. We throw water on ourselves, which cools us a lot more than fans alone. Why? Because the thermal conductivity of water is far higher than that of air, meaning that water carries heat away much more readily. For instance, according to the Engineering Toolbox, at 25° Celsius (77°F), air has a thermal conductivity of 0.0262 W/(m·K), whereas the thermal conductivity of water is 0.606 W/(m·K). This means that, at 25° Celsius, water conducts heat roughly 23x better than air.
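As a quick sanity check on the figures quoted above, the ratio of the two conductivities can be computed directly. This is only a minimal sketch: the two values are the ones cited from the Engineering Toolbox, the unit W/(m·K) is assumed, and the function name is made up for illustration.

```python
# Thermal conductivities at 25 °C as quoted in the text (assumed unit: W/(m·K)).
K_AIR = 0.0262
K_WATER = 0.606


def conductivity_ratio(k_a: float, k_b: float) -> float:
    """Return how many times more thermally conductive material A is than material B."""
    if k_b <= 0:
        raise ValueError("thermal conductivity must be positive")
    return k_a / k_b


ratio = conductivity_ratio(K_WATER, K_AIR)
print(f"Water conducts heat about {ratio:.1f}x better than air at 25 °C")
```

With these inputs the ratio comes out at a little over 23.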
Moving air to cool servers has worked well so far, because the amount of heat to be dissipated was within the realm of what air could do, but servers are getting hotter and hotter. The gaming industry has known this for years: most high-end gaming computers have been liquid cooled for at least the last 5 years. The components used in high-end gaming computers are not that different from those used in PowerEdge servers: similar Intel CPUs and NVIDIA GPUs. If anything, the issue is greater in servers, because they can house multiple CPUs and multiple GPUs. This is why, with the new generation of Intel-powered PowerEdge servers, Dell now supports liquid cooling: it is the most effective way of removing heat from those servers.
“But wait, how do I retrofit my datacenter with liquid cooling?” I can hear you ask. Great question, which we will dive into for the rest of this blog post.
One thing worth noting is that having liquid cooling in a datacenter doesn’t remove the need for air cooling, as the heat gathered through the liquid cooling process still needs to be removed from the datacenter. So why even bother with implementing liquid cooling? Because of what I mentioned above: liquid removes heat from the components far more effectively than air.
The components required to implement liquid cooling play roles fairly similar to their counterparts in an air-cooled datacenter, i.e.:
- The coolant distribution unit (CDU),
- The manifold,
- The cold plate.
Let’s dive into what each does:
- The CDU: like the air conditioners in an air-cooled datacenter, which take hot air and cool it down, the CDU moves the coolant through the system so it can capture the heat from the CPU and carry it away. The CDU has 2 loops: a cold loop sending cold coolant liquid (water) to the servers and a hot loop moving hot coolant back to the distribution unit.
- The manifold: The manifolds are part of the rack and there are typically 2 manifolds per rack, one for the cold loop and one for the hot loop. Manifolds are responsible for the transfer of coolant to and from the servers and are connected directly to the pipes exiting the servers. Below is a picture of a rack manifold:
- The cold plate: The cold plate replaces the traditional heat sink on the component being cooled. The cold plate is responsible for the heat exchange between the component and the coolant. It takes heat from the component and transfers it to the coolant. Depending on the need, cold plates can be attached to processors, memory DIMMs or in some cases, GPUs.
Below are some examples of cold plates:
- The cold plate for the Intel Ice Lake Xeon Processor:
- The cold plate for memory DIMMs:
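To get a feel for how much heat the cold and hot loops described above can carry, the standard relation Q = ṁ · c_p · ΔT (mass flow times specific heat times temperature rise) is a useful back-of-the-envelope tool. The sketch below assumes plain water as the coolant, with a specific heat of about 4186 J/(kg·K); the flow rate and temperature rise are hypothetical illustration values, not Dell specifications.

```python
# Approximate specific heat of liquid water, in J/(kg·K).
WATER_SPECIFIC_HEAT = 4186.0


def heat_removed_watts(flow_kg_per_s: float, delta_t_kelvin: float,
                       specific_heat: float = WATER_SPECIFIC_HEAT) -> float:
    """Heat carried away by a coolant stream: Q = m_dot * c_p * delta_T."""
    if flow_kg_per_s < 0:
        raise ValueError("flow rate must be non-negative")
    return flow_kg_per_s * specific_heat * delta_t_kelvin


# Hypothetical example: 1 litre of water per minute (about 1/60 kg/s)
# warming by 10 K between the cold loop and the hot loop.
example_flow = 1.0 / 60.0
example_heat = heat_removed_watts(example_flow, 10.0)
print(f"Heat removed: {example_heat:.0f} W")
```

In this hypothetical case, a modest flow of one litre per minute carries away roughly 700 W, which is in the range of what a couple of server processors dissipate.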
Dell recognizes that most customers won’t implement liquid cooling throughout their datacenter, but will instead target specific workloads to ensure the best possible performance for them. To that end, Dell has built a modular approach to liquid cooling, centered around the concept of a pod. A pod is made of 3 racks: two 750mm-wide compute racks (42U or 48U) for servers and one 600mm cooling rack. The cooling rack is connected to the cold and hot loops coming out of the CDU, as per below:
A single CDU can drive up to 5 pods, i.e. 10 compute racks, as shown in the diagram below:
This modular design allows customers to focus liquid cooling where it matters most and will have the biggest impact, and to introduce this technology into their datacenter gradually.
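The pod arithmetic described above (2 compute racks plus 1 cooling rack per pod, and up to 5 pods per CDU) can be turned into a rough planning helper. This is a minimal sketch under those stated figures only; the names `PodPlan` and `plan_pods` are hypothetical, and a real deployment would also need to account for heat load, flow capacity, and floor layout.

```python
import math
from dataclasses import dataclass

# Figures taken from the text.
COMPUTE_RACKS_PER_POD = 2
COOLING_RACKS_PER_POD = 1
MAX_PODS_PER_CDU = 5


@dataclass(frozen=True)
class PodPlan:
    pods: int            # pods needed to host the compute racks
    cooling_racks: int   # one cooling rack per pod
    cdus: int            # CDUs needed, at up to 5 pods each
    rack_positions: int  # total rack positions (compute + cooling) across all pods


def plan_pods(compute_racks: int) -> PodPlan:
    """Estimate pods, cooling racks and CDUs for a number of liquid-cooled compute racks."""
    if compute_racks < 0:
        raise ValueError("compute_racks must be non-negative")
    pods = math.ceil(compute_racks / COMPUTE_RACKS_PER_POD)
    cdus = math.ceil(pods / MAX_PODS_PER_CDU)
    cooling_racks = pods * COOLING_RACKS_PER_POD
    rack_positions = pods * (COMPUTE_RACKS_PER_POD + COOLING_RACKS_PER_POD)
    return PodPlan(pods, cooling_racks, cdus, rack_positions)


print(plan_pods(10))  # fits on a single CDU
print(plan_pods(11))  # an 11th compute rack spills into a 6th pod and a 2nd CDU
```

An odd number of compute racks leaves one pod half filled, which is why the helper rounds the pod count up.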