Liquid Cooled Servers - IT Meets Facilities

With the continued march of IT computing demands to higher and higher densities, chip manufacturers have been walking the tightrope of improving performance while maintaining the ability to air-cool their chips. Why? Because so many end users are not ready to take the leap into liquid-cooled IT. However, the push for more compute has resulted in an ever-increasing TDP (thermal design power), which has pushed us as an industry past the point of being able to get best-in-class computing performance while staying air cooled. For more information on this, I recommend reading up on the Intel Cascade Lake processors released earlier this year.

But whether or not to adopt liquid cooling is NOT the point of this article. Once you accept today's reality that we are heading to liquid cooling for best-in-class IT performance, there is another Pandora's box to consider: IT manufacturers have thermal design teams working to design systems that can cool the IT hardware, and those systems then must interface with the facility systems. Sounds easy, right? Well, not so much. Let us dig into it a bit:

The chipsets in the servers have a critical case temperature. As long as the case temperature is at or below the critical value, life is good. With air-cooled IT, that case temperature is a function of the server entering air temperature, the layout of the hardware on the circuit boards, how ancillary components affect the airflow through the case, and how much the air temperature rises as it travels from the inlet of the server to the chips. By the time it GETS to the heatsink on the chips, the air is hotter than it was when it entered the server. Additionally, there is a temperature gradient across the heatsink, and the heatsink's average temperature is much higher than that of the air passing over it, but lower than the chip's case temperature. It is not uncommon to see case temperatures in high-performance air-cooled applications up around 80-90C (176-194F).
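To make that temperature stack-up concrete, here is a minimal Python sketch of the air-cooled case-temperature budget described above. Every number in it (inlet temperature, internal preheat, heatsink thermal resistance, chip power) is an illustrative assumption, not vendor data.

```python
# Illustrative air-cooled case-temperature stack-up (all numbers are assumptions)
t_inlet_c = 27.0        # server inlet air temperature, C (hypothetical)
preheat_c = 8.0         # air temperature rise before reaching the heatsink, C (hypothetical)
r_sink_c_per_w = 0.18   # effective heatsink-to-air thermal resistance, C/W (hypothetical)
chip_power_w = 280.0    # chip heat load, W (hypothetical)

t_air_at_sink_c = t_inlet_c + preheat_c
t_case_c = t_air_at_sink_c + r_sink_c_per_w * chip_power_w

print(f"Air temperature at heatsink: {t_air_at_sink_c:.1f} C")
print(f"Estimated chip case temperature: {t_case_c:.1f} C")  # ~85 C with these numbers
```

With these assumed values the case temperature lands right in the 80-90C range mentioned above, which is the point: every degree of air preheat and heatsink resistance stacks up against the critical case temperature.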

An important aside here: ASHRAE considers IT racks with rear door heat exchangers (RDHX) to be liquid cooled, even though the IT hardware itself is still cooled by air. The RDHX really serves to bring the rack's exhaust air to room ambient. The majority of what follows focuses on Cold Plate or Direct-To-Chip liquid cooling, where a liquid cooled heat sink replaces the air-cooled heatsink on the chips within the server.

Because of the difference in heat capacity and thermal conductivity between air and liquids (I am using "liquids" here instead of water because the principle applies to water, aqueous solutions, mineral oils, two-phase refrigerant immersion, etc.), the chip's case temperature is MUCH closer to that of the LIQUID than it would be to the air. Great, the chips can run cooler, and they typically do. There are pros and cons down this rabbit hole (operating case temperature) as well, but I'm trying not to digress too far.

BUT, because the fluid passages (channels) in the liquid-cooled heatsinks (I am ignoring immersion here - on purpose) have to be so small to provide sufficient heat exchanger surface area to remove the heat flux generated at the chip, the fluid velocity has to be lowered to avoid creating differential pressure AND working pressure problems. What did we learn in thermo? P = Q*dT*(pick your unit conversion). For a fixed heat load P, as flow rate (Q) drops, dT goes UP.
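As a concrete illustration of that relationship, here is a short Python sketch using q = m_dot * cp * dT for water. The 10 kW load and the flow rates are hypothetical, chosen only to show how dT climbs as flow drops.

```python
# Coolant temperature rise for a fixed heat load: q = m_dot * cp * dT
# (water properties approximated; load and flow rates are hypothetical)
CP_WATER = 4186.0        # J/(kg*K)
RHO_WATER = 997.0        # kg/m^3
LPM_TO_M3S = 1.0 / 60000.0

def delta_t_c(heat_w: float, flow_lpm: float) -> float:
    """Temperature rise (C) of water absorbing heat_w watts at flow_lpm liters/minute."""
    m_dot = flow_lpm * LPM_TO_M3S * RHO_WATER   # kg/s
    return heat_w / (m_dot * CP_WATER)

heat_w = 10_000.0  # 10 kW of cold-plate load on one manifold (hypothetical)
for flow_lpm in (12.0, 8.0, 4.0):
    print(f"{flow_lpm:4.0f} L/min -> dT = {delta_t_c(heat_w, flow_lpm):5.1f} C")
# Halving the flow doubles the dT for the same load.
```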

This is where things start to get interesting for the HVAC design engineers and facilities folks. Going forward, for simplicity, I'm going to reference water as the working fluid, but the premise is the same regardless. Right now, the trend in water-cooled IT is to continually increase the dT of the working fluid within the server and its factory-engineered, close-coupled environment. This is both for energy efficiency and to keep pressures low. The walls in the chip heat exchangers are, by necessity, VERY thin and cannot handle much gauge pressure - think 15 psi maximum working pressure at the heat sink, and you start to understand the challenge. These dTs can start as low as 10C (18F) and go higher than 20C (36F). The CDUs (coolant distribution units) that separate the TCS loop (Technology Cooling System - the close-coupled, high-purity coolant loop in the IT) from the FWS loop (Facility Water System) tend to have similar dTs on both the primary and secondary sides. Why? One reason is that it is easier and more cost-effective to make heat exchangers with comparable dTs on each side. If this is done at a small fraction of racks, or as part of a much larger chilled water system, this is not a problem. However, for dedicated facilities with high ratios of water-cooled IT, this can be very problematic. Chillers are simply not designed to operate at dTs greater than 10C (18F). There are a number of possible solutions, including isolation heat exchangers, primary-secondary systems, etc. But variable-primary-flow systems cannot do it.
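To see why the plant-side dT matters, here is a rough Python comparison of the facility-water flow required for the same IT load at different dTs. The 1 MW load is purely illustrative; the point is how far apart the high-dT FWS selection and a conventional chiller dT end up.

```python
# Facility-water flow required for the same load at different dTs (illustrative)
CP_WATER = 4186.0   # J/(kg*K)
RHO_WATER = 997.0   # kg/m^3

def flow_lps(load_w: float, dt_c: float) -> float:
    """Facility water flow (L/s) to absorb load_w watts with a dt_c temperature rise."""
    m_dot = load_w / (CP_WATER * dt_c)   # kg/s
    return m_dot / RHO_WATER * 1000.0    # L/s

it_load_w = 1_000_000.0   # 1 MW of cold-plate IT load (hypothetical)
for dt_c in (6.0, 10.0, 20.0):
    print(f"dT = {dt_c:4.1f} C -> {flow_lps(it_load_w, dt_c):6.1f} L/s")
# A 20 C FWS dT needs half the flow of a 10 C chiller dT (and less than a third
# of a 6 C comfort-cooling dT), so the chiller evaporator and the CDU primary
# want very different flow/temperature pairs for the same heat rejection.
```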

Once you solve the dT issue, there is also the trick of the design temperatures. The design temperatures pose both opportunities and challenges. Liquid-cooled IT designed to the W2 class requirements can operate at full design power with entering water temperatures of 25C (77F). W3 equipment can accept entering water temperatures of 32C (90F). Not exactly comfort-cooling chilled water temperatures. In fact, liquid-cooled IT systems typically have stringent requirements for ambient air dewpoint temperatures relative to the supply water temperatures. This is because the internal plumbing in most liquid-cooled IT hardware IS NOT INSULATED. Different IT manufacturers have different means of addressing dewpoint and water temperatures, which further complicates matters. But bringing 6C (43F) chilled water to a rack is not a recipe for success.
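As a sketch of that dewpoint constraint, the Python below uses the Magnus approximation to estimate the room dewpoint and checks it against a supply-water setpoint plus a safety margin. The room conditions and the 2C margin are my assumptions for illustration, not any manufacturer's stated requirement.

```python
import math

def dewpoint_c(dry_bulb_c: float, rh_pct: float) -> float:
    """Dewpoint (C) from dry-bulb temperature and relative humidity (Magnus approximation)."""
    a, b = 17.62, 243.12
    gamma = math.log(rh_pct / 100.0) + a * dry_bulb_c / (b + dry_bulb_c)
    return b * gamma / (a - gamma)

room_db_c, room_rh_pct = 24.0, 55.0    # assumed white-space air conditions
supply_water_c = 25.0                  # W2-style entering water temperature
margin_c = 2.0                         # assumed safety margin above dewpoint

dp = dewpoint_c(room_db_c, room_rh_pct)
print(f"Room dewpoint ~ {dp:.1f} C")
print("Condensation risk on uninsulated plumbing:",
      "OK" if supply_water_c >= dp + margin_c else "REVIEW")
```

With warm W2/W3 water the check passes easily; send 6C chilled water through that same uninsulated plumbing and it fails at almost any realistic room condition.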

The upside to higher water temperatures is that you have greatly increased opportunities for water-side economization! When selecting chillers for W2-class system deployments, it is not uncommon to see full-load efficiencies under 0.375 kW/ton and IPLVs under 0.200 kW/ton, and with fully integrated economizers, realizing over 4,000 hours per year of economization is not hard, even in hot, humid climates. Additionally, it is absolutely realistic to expect significant run hours with no compressorized cooling online. However, that too presents challenges. How do you design the economizer, where is it integrated into the system, and how do you manage ultra-low lifts (and sometimes NEGATIVE lifts) on the chillers?
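For context on those efficiency figures, here is a small Python sketch that converts kW/ton to COP and estimates water-side economizer availability from wet-bulb bin data. The bin hours, tower approach, and heat-exchanger approach are made-up placeholders, not a climate analysis for any particular site.

```python
# kW/ton to COP, plus a crude water-side economizer availability estimate
# (bin hours and approach temperatures are hypothetical placeholders)
def kw_per_ton_to_cop(kw_per_ton: float) -> float:
    return 3.517 / kw_per_ton   # 1 ton of refrigeration = 3.517 kW thermal

print(f"0.375 kW/ton -> COP ~ {kw_per_ton_to_cop(0.375):.1f}")

fws_supply_c = 25.0      # W2-style facility water supply setpoint
tower_approach_c = 3.0   # cooling-tower approach to wet bulb (assumed)
hx_approach_c = 1.5      # plate heat-exchanger approach (assumed)
max_wb_c = fws_supply_c - tower_approach_c - hx_approach_c

# Hypothetical annual wet-bulb bins: {upper-bound wet bulb C: hours/year}
wb_bins = {5: 1200, 10: 1500, 15: 1800, 20: 2000, 25: 1700, 30: 560}
econo_hours = sum(h for wb, h in wb_bins.items() if wb <= max_wb_c)
print(f"Full economization possible below ~{max_wb_c:.1f} C wet bulb: "
      f"~{econo_hours} h/yr with these (made-up) bins")
```

The higher the FWS supply setpoint, the higher the wet-bulb ceiling for economization, which is exactly why W2/W3 temperatures unlock so many compressor-free hours - and why the chillers spend so much time at very low lift.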

Lastly, and honestly this was the driver to write this article - at present, though there is an industry standard for design temperature classes for liquid-cooled IT (refer to the ASHRAE Liquid Cooled Thermal Guidelines, W1-W5 classes), HOW the thermal engineers at the different IT manufacturers are approaching the thermodynamics AND the interface to the facility cooling plants varies greatly. While I continue to work with my colleagues within ASHRAE TC9.9 on white papers and design guidelines for these systems, I believe it is fair to say that, today, integrating multiple IT manufacturers, and even integrating conventional air-cooled IT or RDHX with cold-plate liquid-cooled IT on the same cooling plant, poses a lot of challenges. These challenges are by no means insurmountable, and as an engineering geek I enjoy the challenge of figuring out how best to integrate these dissimilar systems. But the current reality is that our industry is embarking on a new era of identifying, understanding, and working around liquid cooling design compromises that must be made while we concurrently work to better define best practices for BOTH Designers AND Manufacturers.

NOTE: The comments above are my own and do not necessarily reflect the thoughts or positions of ASHRAE TC9.9 or its members.


Don Mitchell

Mission Critical Division Leader, Victaulic

John - Great insights, aligns well with the discussions at TC9.9 yesterday in Orlando

John Weems

Engi-nerd, Recovering Chemist, Inventor

If I read this correctly, a datacenter built around a W3 cooling requirement would work best with two cooling systems.  The first would be fairly low capacity providing dehumidification and comfort cooling.  The second would be much higher capacity serving the IT equipment.  However, it would only need cooling towers, not chillers.  During winter months the cooling system could provide heating for a facility.
