Data Centre Liquid Cooling: AI’s New Heat Bottleneck

Published on: 2026-05-28
Updated on: 2026-05-28

If you have spent any time looking at AI infrastructure seriously, you already know the compute story. More GPUs, bigger clusters, faster chips, more capital. But the part that does not get nearly enough attention, and frankly the part that keeps data centre operators up at night, is heat.

Raw, relentless, physics-is-non-negotiable heat. Because here is the thing: you can throw a trillion dollars at AI buildout, but if you cannot cool the hardware, none of it runs. Data centre liquid cooling has quietly become the chokepoint that determines whether this entire infrastructure supercycle actually delivers.

And in 2026, after years of the industry hedging and half-stepping, the direction is clear. Liquid cooling is not a premium option for specialist workloads anymore. For the densest AI clusters, it is becoming part of the baseline infrastructure requirement.


Key Takeaways

Data centre liquid cooling is becoming critical because AI rack densities are moving above what air cooling can handle efficiently.
GB200 and GB300 rack-scale systems show how AI infrastructure is shifting toward 100 kW-plus thermal loads.
Liquid cooling improves heat removal by bringing coolant closer to the chip, reducing the burden on air systems.
The transition is not frictionless: operators still need coolant distribution units, leak detection, retrofits and hybrid thermal designs.
The real issue is not only efficiency. It is whether data centres can run high-density AI hardware at full capacity.

Air Cooling Has Hit a Wall, and the Wall Is Not Moving

To understand why air cooling has reached its limit, look at what the latest chip generations are actually demanding from a thermal standpoint. The NVIDIA GB200 NVL72 system shows how far rack-level power density has moved.

Published configurations list roughly 132 kW per rack, with most of that thermal load handled through liquid cooling rather than air. A single rack-scale system can combine 72 Blackwell GPUs and 36 Grace CPUs, creating heat density that would have looked extreme by conventional data centre standards only a few years ago.

The challenge is not abstract. At these densities, operators are dealing with several hard constraints at once:

Airflow demand rises sharply: Fans must move far more air through tighter server layouts, raising power consumption and mechanical stress.
Lower intake temperatures become impractical: Chilling air aggressively enough to cool 100 kW-plus racks creates its own energy and facility burden.
Hot spots become harder to control: Dense GPU clusters concentrate heat around processors, memory, networking and power components.
Floor space becomes less forgiving: Spreading hardware out to make air cooling work reduces compute density and weakens the economics of AI deployment.
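
To put the airflow constraint in numbers, here is a minimal sketch of the steady-state heat balance Q = rho x flow x cp x delta-T applied to a rack at roughly 132 kW. The fluid properties and the temperature rises (15 K for air, 10 K for water) are illustrative assumptions rather than figures from any vendor specification, and the case where air carries the entire load is hypothetical.

```python
# Rough comparison of the coolant flow needed to carry away a rack's heat load.
# Steady-state heat balance: Q = rho * V_dot * cp * delta_T
# All fluid properties and temperature rises below are illustrative assumptions.

RACK_HEAT_W = 132_000  # roughly 132 kW per rack, the figure cited in the article

# Assumed properties near room temperature
AIR = {"rho": 1.2, "cp": 1005.0}      # kg/m^3, J/(kg*K)
WATER = {"rho": 997.0, "cp": 4180.0}  # kg/m^3, J/(kg*K)


def volumetric_flow_m3_per_s(heat_w, fluid, delta_t_k):
    """Volume flow needed to absorb heat_w with a coolant temperature rise of delta_t_k."""
    return heat_w / (fluid["rho"] * fluid["cp"] * delta_t_k)


# Hypothetical case: the whole load removed by air with a 15 K rise
air_flow = volumetric_flow_m3_per_s(RACK_HEAT_W, AIR, delta_t_k=15.0)
# Hypothetical case: the whole load removed by water with a 10 K rise
water_flow = volumetric_flow_m3_per_s(RACK_HEAT_W, WATER, delta_t_k=10.0)

air_cfm = air_flow * 2118.88            # m^3/s to cubic feet per minute
water_l_per_min = water_flow * 60_000   # m^3/s to litres per minute

print(f"Air:   {air_flow:.2f} m^3/s (about {air_cfm:,.0f} CFM)")
print(f"Water: {water_l_per_min:.0f} L/min")
```

Under these assumptions, air would need to move at more than 7 cubic metres per second through a single rack, while water could carry the same heat at under 200 litres per minute. The exact numbers shift with the chosen temperature rise, but the gap of several orders of magnitude in volume flow does not.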

That is why cooling a rack at this density with air is not a realistic long-term solution at scale. And it only gets more demanding from here. The GB300 NVL72, the next Blackwell iteration, supports up to 142 kW per rack in reference designs co-developed for high-density AI infrastructure.

High-end AI accelerators are moving toward a power envelope of 1,000 W per chip, and air cooling has simply run out of runway for the most power-dense AI deployments. Rack densities breaching 100 kW are making immersion and direct-to-chip cooling the practical architecture for high-density AI clusters. This is not a distant projection. It is already the operating reality for anyone deploying serious AI infrastructure today.

Many technology transitions are framed as “the future is coming.” This one is different. Data centre operators who defer liquid cooling infrastructure upgrades are not only falling behind on efficiency. They are taking on capacity risk. You cannot run the hardware at full potential without the cooling. It is that binary.

The Economics Are Genuinely Compelling, Not Just the Engineering

Here is where it gets interesting from an investment and operational standpoint, because the efficiency argument is stronger than most people appreciate.

Water can absorb roughly 3,000 times more heat than the same volume of air for the same temperature rise. That is not a marginal improvement. It is a different category of solution.
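
One common reading of the "roughly 3,000 times" figure is a comparison of volumetric heat capacity, meaning how much heat a cubic metre of fluid absorbs per kelvin of temperature rise. That interpretation is an assumption here, as are the approximate room-temperature property values used in this short check.

```python
# Volumetric heat capacity = density * specific heat, in J/(m^3*K).
# Property values are approximate room-temperature figures (assumed).
air_vhc = 1.2 * 1005.0       # air: ~1.2 kg/m^3, ~1005 J/(kg*K)
water_vhc = 997.0 * 4180.0   # water: ~997 kg/m^3, ~4180 J/(kg*K)

ratio = water_vhc / air_vhc
print(f"Water holds about {ratio:,.0f}x more heat per unit volume per kelvin than air")
```

With these values the ratio comes out at a few thousand, the same order of magnitude as the headline figure. Dielectric fluids used in immersion systems have different properties, so the ratio for those would differ.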

Historically, cooling has accounted for a large share of data centre electricity consumption, often cited at up to 40% in conventional environments. That makes thermal efficiency one of the most significant areas where operators can reduce both operating expenses and energy demand.
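
A quick sketch shows why that share matters. It treats the "up to 40%" figure as an upper-end input, ignores every other facility overhead, and uses a hypothetical 30% cut in cooling energy that is not taken from any measured deployment.

```python
def cooling_overhead_per_watt(cooling_share):
    """Cooling power drawn for every watt of non-cooling load,
    given cooling's share of total facility electricity."""
    return cooling_share / (1.0 - cooling_share)


def facility_saving(cooling_share, cooling_reduction):
    """Fraction of total facility electricity saved if cooling
    energy falls by cooling_reduction (both expressed as 0..1)."""
    return cooling_share * cooling_reduction


COOLING_SHARE = 0.40       # upper-end figure cited in the article
ASSUMED_REDUCTION = 0.30   # hypothetical, for illustration only

overhead = cooling_overhead_per_watt(COOLING_SHARE)
saving = facility_saving(COOLING_SHARE, ASSUMED_REDUCTION)

print(f"Cooling overhead: {overhead:.2f} W per W of other load")
print(f"Facility-wide saving: {saving:.0%}")
```

At a 40% share, every watt of other load carries about two thirds of a watt of cooling on top of it, so even a moderate reduction in cooling energy shows up as a double-digit percentage of the total bill in this simplified model.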

Economic lever	Why liquid cooling changes the equation
Cooling energy	Liquid cooling can materially reduce the energy needed to remove heat, although savings depend on density, climate, chiller design and water strategy.
Rack density	Removing heat closer to the chip allows operators to place more compute into the same physical footprint.
Hardware utilisation	Better thermal control reduces the risk of chips slowing down under sustained workloads.
Facility economics	Higher density can improve the return on scarce land, power capacity and fibre connectivity.
Operating resilience	More stable temperatures can reduce thermal stress, though the benefit depends on design and maintenance quality.

At hyperscaler scale, where power bills run into the billions annually, that efficiency delta is not a footnote. It is a material input to unit economics.

The NVIDIA GB200 NVL72 rack-scale liquid-cooled system reflects the same point. When coolant is routed directly to the chip instead of relying on air to carry heat away fast enough, operators stop fighting physics and start working with it. Higher density becomes possible because heat is removed closer to the source.


There is also the thermal throttling issue, which gets underestimated. In air-cooled environments running near thermal limits, chips automatically reduce their clock speeds to avoid overheating. That is a silent, chronic drag on the exact workloads these data centres exist to run.

Liquid systems provide tighter thermal control than air-cooled designs, helping high-performance chips sustain heavier workloads with less temperature volatility. For AI training jobs where completion time directly affects infrastructure costs, sustained peak throughput versus a burst-and-recover cycle is a meaningful operational difference.
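
The cost of throttling can be sketched with a deliberately simple model. It assumes throughput scales linearly with clock speed, which real workloads only approximate, and every number below is hypothetical.

```python
def effective_throughput(throttled_time_fraction, clock_reduction):
    """Average throughput relative to an unthrottled run.
    Assumes throughput scales linearly with clock speed (a simplification)."""
    return 1.0 - throttled_time_fraction * clock_reduction


def job_duration_hours(baseline_hours, throttled_time_fraction, clock_reduction):
    """Wall-clock time for a job that would take baseline_hours unthrottled."""
    return baseline_hours / effective_throughput(
        throttled_time_fraction, clock_reduction
    )


# Hypothetical scenario: chips spend 30% of the run throttled,
# and clocks drop by 20% while throttled.
hours = job_duration_hours(100.0, throttled_time_fraction=0.30, clock_reduction=0.20)
print(f"A 100-hour job stretches to about {hours:.1f} hours")
```

In this scenario a 100-hour training run takes roughly six hours longer, with the whole cluster occupied for the extra time. Plugging in measured throttling data from an actual deployment would give a more meaningful estimate.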

The Compounding Operational Case

Beyond the power bill, there are reliability and density arguments that stack up quickly. More stable temperatures can reduce thermal cycling stress, which may lower failure risk and extend component life, although the actual benefit depends on workload intensity, coolant design and maintenance discipline.

When a GPU cluster represents hundreds of millions in capital, that reliability improvement is not trivial. It changes depreciation assumptions, maintenance budgets and refresh cycle planning.

On density: because liquid systems are far more compact than the air handling infrastructure needed to cool equivalent loads, some operators are already seeing substantial increases in computational density per rack following the transition. In a market where land, power capacity and fibre connectivity are increasingly scarce and expensive, fitting more compute into the same footprint is a structural advantage that compounds over time.

Direct-to-chip cooling remains the most mature and widely deployed architecture for many AI rack designs. Immersion cooling is scaling alongside it for the most extreme density deployments.

Both single-phase and two-phase dielectric fluid systems are ramping, with two-phase systems commanding a premium for extreme-density builds. These are not niche research projects. They are production infrastructure that the biggest operators in the world are betting real capital on.

Microsoft has already deployed its “Sidekick” liquid cooling systems with direct-to-chip cold plates for its Azure Maia AI Accelerator chips, and is simultaneously exploring microfluidics to push efficiency further. When major cloud operators are retrofitting existing data centres rather than waiting only for greenfield builds, that tells you something about the urgency of the transition.

The Transition Is Not Frictionless

Liquid cooling is not a magic switch. It brings its own operational burden.

Operators need coolant distribution units, leak detection, pressure management, fluid quality control, maintenance protocols, staff training and tighter coordination between the IT stack and facility infrastructure. Existing data centres may not have the pipe routing, floor loading, heat rejection systems or power distribution needed to support the densest AI racks without major retrofits.

That is why hybrid systems will remain common. Air cooling is not disappearing. It will continue to cool lower-density racks, storage, networking equipment and secondary components inside high-density systems. The shift is not from air to liquid overnight. It is from air-dominant cooling to liquid-led thermal architecture.

The strongest operators will not simply buy liquid cooling equipment. They will redesign the facility around heat, power and compute as one integrated system.

Follow the Capital

The market data at this point is doing a lot of the talking. One market estimate projects the data centre liquid cooling market rising from roughly USD 5.1 billion in 2025 to USD 6.41 billion in 2026, with the market on track to reach more than USD 16 billion by 2030. That is a genuine structural growth story, not a hype cycle with fuzzy demand.
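
The growth rates implied by that estimate can be checked with a compound-growth calculation. The sketch below uses only the three figures quoted above and treats USD 16 billion as a floor, since the estimate says "more than".

```python
def annual_growth(start, end, years):
    """Compound annual growth rate between two values."""
    return (end / start) ** (1.0 / years) - 1.0


# Market size in USD billions, as quoted in the article
growth_2025_2026 = annual_growth(5.1, 6.41, years=1)
cagr_2026_2030 = annual_growth(6.41, 16.0, years=4)  # 16.0 treated as a lower bound

print(f"2025 to 2026: {growth_2025_2026:.1%}")
print(f"2026 to 2030: at least {cagr_2026_2030:.1%} per year")
```

Both periods work out to roughly a quarter per year, so the 2030 figure amounts to holding the 2025 to 2026 pace steady rather than assuming an acceleration.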


Supplier order books are one of the clearest signs that liquid cooling has moved beyond pilot projects. Demand is now showing up in orders, delivery timelines and capacity planning.

Backlogs in this part of the supply chain point to constrained supply meeting demand that is structural and still accelerating, not merely experimental.

The top cloud providers are expected to spend hundreds of billions of dollars on infrastructure in 2026, with a large share directed at physical AI assets. Each of those dollars in GPU capex creates downstream demand for the cooling systems that keep those GPUs alive and performing. At rack densities above 100 kW, cooling infrastructure is no longer a secondary line item. It is part of the AI compute budget.

Regulatory pressure is accelerating adoption further, particularly in Europe and Japan, where governments are tightening the conditions under which large-scale data centres can operate. Sustainability mandates are no longer soft future commitments. They are reshaping procurement timelines right now.

Final Thoughts: The Transition Window Is Narrowing

In 2026, the ability to deploy and scale advanced cooling infrastructure is a defining competitive advantage. Liquid cooling can no longer be considered an emerging technology or a discretionary add-on for high-density AI.

Operators still hedging on the transition are not making a conservative capital allocation decision. They are taking on a different kind of risk: thermal bottlenecks that cap compute density, power costs structurally higher than peers, lower rack utilisation and a ceiling on AI expansion capacity precisely when demand is at its most aggressive. The gap between facilities that have made the transition and those that have not is already measurable, and every new GPU generation widens it.

The AI revolution runs on chips. The chips run on liquid cooling. And at this point, the operators who understood that first are not just ahead on infrastructure. They are ahead on everything that infrastructure enables.


Disclaimer: This material is for general information purposes only and is not intended as (and should not be considered to be) financial, investment or other advice on which reliance should be placed. No opinion given in the material constitutes a recommendation by EBC or the author that any particular investment, security, transaction or investment strategy is suitable for any specific person.