Data Center Power & Cooling for AI
Modern AI infrastructure has transformed power and cooling from afterthoughts into primary design constraints. A single rack of NVIDIA GB200 NVL72 systems demands ~120 kW — enough to power a small building. Understanding these fundamentals is essential for every NCP-AII certified professional.
Why This Topic Matters: GPU TDPs have risen steeply with each generation. The H100 SXM5 at 700 W (vs. the A100 at 400 W) changed facility design requirements, and the GB200 NVL72 at ~120 kW per rack makes traditional air cooling obsolete. Power and cooling are now the binding constraints for AI cluster scale-out.
| GPU / System | TDP | Notes |
|---|---|---|
| A100 SXM4 | 400 W | Air viable |
| H100 PCIe | 350 W | Air viable |
| H100 SXM5 | 700 W | DLC preferred |
| B200 SXM | ~1,000 W | DLC required |
| DGX H100 | 10.2 kW | 8× H100 SXM5 |
| GB200 NVL72 | ~120 kW | Liquid / direct |
| Concept | Value / Rule |
|---|---|
| PUE (perfect) | 1.0 — impossible in practice |
| PUE (hyperscale) | 1.1 – 1.2 |
| PUE (average DC) | 1.4 – 1.6 |
| Air cooling max | ~25–30 kW/rack |
| DLC capacity | 50–120 kW/rack |
| Immersion capacity | 100+ kW/tank |
| kW vs kVA | kW = real; kVA = apparent (UPS) |
Every watt flowing into a GPU travels through this chain, and each stage adds loss to power delivery: Grid → Step-Down Transformer → ATS (utility/generator) → UPS → Main PDU → Rack PDU → Server PSU → VRM → GPU (detailed stage by stage in the table below).
ATS = Automatic Transfer Switch (utility ↔ generator). The UPS bridges the ~10–30 second gap until the generator starts. Each conversion step reduces efficiency; this is why 80 Plus certification matters for PSUs.
A PUE of 1.2 means 20% overhead — for every 100 kW of IT load, 20 kW is lost to cooling, lighting, UPS, and distribution losses.
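The relationship is simple multiplication, which makes it easy to sanity-check. A minimal sketch (the load value is illustrative):

```python
def facility_power_kw(it_load_kw: float, pue: float) -> float:
    """Grid power required for a given IT load at a given PUE."""
    return it_load_kw * pue

it = 100.0                            # kW of IT load (illustrative)
total = facility_power_kw(it, 1.2)    # PUE 1.2
print(f"Facility: {total:.0f} kW, overhead: {total - it:.0f} kW")
```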
PSU efficiency ratings at 50% load:
| Tier | Efficiency |
|---|---|
| Titanium | 96% |
| Platinum | 94% |
| Gold | 90% |
| Silver | 88% |
| Bronze | 82% |
| Class | Inlet Temp Range |
|---|---|
| A1 | 15–32°C (most stringent) |
| A2 | 10–35°C |
| A3 | 5–40°C |
| A4 | 5–45°C (most relaxed) |
Power Fundamentals
Understanding electrical concepts — PUE, kW vs kVA, power chains, and PSU efficiency — is foundational for AI data center planning.
PUE measures how efficiently a data center uses power. An IT load of 1,000 kW with PUE 1.5 requires 1,500 kW from the grid — 500 kW lost to cooling, UPS losses, lighting, and power distribution.
Strategies to reduce PUE: hot/cold aisle containment, outside air economization, liquid cooling (removes heat at source, no chiller needed), higher ASHRAE inlet temp setpoints (reduces chiller work), and on-site renewable generation.
kW (kilowatts) = Real Power — the actual power consumed and converted to work (heat, computation). This is what your electricity bill measures and what GPU TDP specifications use.
kVA (kilovolt-amperes) = Apparent Power — the product of RMS voltage × RMS current. Always ≥ kW. UPS units, PDUs, and generators are rated in kVA because they must handle the full current draw regardless of power factor.
Power Factor (PF) = kW / kVA. Modern server PSUs achieve PF ≈ 0.99. Older equipment may have PF 0.6–0.8, requiring oversized UPS capacity.
Exam Trap: GPU TDPs and server power consumption are quoted in kW (real power). UPS and PDU ratings are in kVA. Never compare them directly without accounting for power factor.
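To make the kW-vs-kVA trap concrete, here is a minimal sizing sketch; the load and power-factor values are illustrative:

```python
def required_kva(real_kw: float, power_factor: float) -> float:
    """Apparent power a UPS or PDU must be rated to carry."""
    return real_kw / power_factor

# A 100 kW real load: modern PSU vs. legacy equipment power factor
for pf in (0.99, 0.70):
    print(f"PF {pf:.2f}: 100 kW real -> {required_kva(100, pf):.0f} kVA rating")
```

The legacy case needs a ~43% larger kVA rating for the same real load, which is exactly the oversizing the exam trap warns about.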
Online Double-Conversion (most common in AI DCs): Always running through inverter/rectifier — zero transfer time, best isolation from grid disturbances. Required for sensitive GPU compute clusters.
Line-Interactive: Uses tap-changing transformer; 2–10 ms transfer time. Acceptable for non-critical loads.
Standby (offline): Switches on failure; 4–25 ms transfer. Not suitable for AI infrastructure.
UPS provides power during the generator start sequence — typically 10–30 seconds. Battery runtime is sized accordingly (not for extended outages — that's what generators are for).
N+1 / 2N redundancy: N+1 = one extra UPS module; 2N = full second UPS system (required for Tier III/IV facilities).
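Sizing the battery for the bridge window described above is a simple energy calculation. A sketch, assuming a 2× margin to cover battery aging and a failed first generator start (the margin is an assumption, not a standard):

```python
def ups_bridge_kwh(load_kw: float, bridge_s: float, margin: float = 2.0) -> float:
    """Usable battery energy to carry the load until the generator is online."""
    return load_kw * (bridge_s / 3600.0) * margin

# 391 kW facility load (illustrative), 30 s generator start window
print(f"{ups_bridge_kwh(391, 30):.1f} kWh of usable battery")  # ~6.5 kWh
```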
| Stage | Component | Function | Typical Loss |
|---|---|---|---|
| 1 | Utility Grid | High-voltage AC supply (typically 11–33 kV) | — |
| 2 | Step-Down Transformer | Reduces to 480V or 208V for facility distribution | 1–2% |
| 3 | ATS (Automatic Transfer Switch) | Switches between utility and generator; <100 ms transfer | <0.5% |
| 4 | Generator | Diesel/gas backup; provides power during utility outage | — |
| 5 | UPS | Bridges generator start time; double-conversion = ~95% efficient | 4–6% |
| 6 | Main PDU | Distributes and monitors power to floor rows | 1–2% |
| 7 | Rack PDU | Per-outlet metering; often A+B (redundant) feed | <1% |
| 8 | Server PSU | AC→DC conversion; 80 Plus Titanium = 96% efficient at 50% load | 4–18% |
| 9 | VRM (Voltage Regulator Module) | Final DC regulation to GPU cores (~1V) | 3–5% |
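End-to-end delivery efficiency is the product of the per-stage efficiencies. A minimal sketch using midpoint values from the table above (exact figures vary by facility):

```python
# Per-stage efficiencies, midpoints of the table's loss ranges (assumptions)
stages = {
    "transformer": 0.985, "ats": 0.997, "ups": 0.95,
    "main_pdu": 0.985, "rack_pdu": 0.995, "psu": 0.96, "vrm": 0.96,
}

delivered = 1.0
for name, eff in stages.items():
    delivered *= eff
print(f"End-to-end delivery efficiency: {delivered:.1%}")
# Roughly 84%: for every 100 kW drawn from the grid,
# only ~84 kW reaches the GPU cores as regulated DC.
```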
At scale, PSU efficiency is a significant operational cost. Consider a DGX SuperPOD with 32 × DGX H100 nodes at 10.2 kW each (326.4 kW of IT load): Titanium PSUs (96%) dissipate ~13.6 kW as conversion loss, while Gold PSUs (90%) dissipate ~36.3 kW, a difference of roughly 22.7 kW. The sketch below works through the arithmetic.
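A sketch of the Titanium-vs-Gold comparison (the electricity price is an assumption):

```python
IT_LOAD_KW = 32 * 10.2  # 326.4 kW of DGX H100 nodes

def psu_loss_kw(load_kw: float, efficiency: float) -> float:
    """AC input power lost as heat inside the PSUs at a given efficiency."""
    return load_kw / efficiency - load_kw

titanium = psu_loss_kw(IT_LOAD_KW, 0.96)  # ~13.6 kW
gold     = psu_loss_kw(IT_LOAD_KW, 0.90)  # ~36.3 kW
saved_kw = gold - titanium                # ~22.7 kW
dollars  = saved_kw * 8760 * 0.10         # at an assumed $0.10/kWh
print(f"Titanium saves {saved_kw:.1f} kW -> ${dollars:,.0f}/year")
```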
| GPU | Form Factor | TDP | Cooling Implication |
|---|---|---|---|
| A100 | SXM4 | 400 W | Air cooling viable with high-performance CRAH |
| A100 | PCIe | 300 W | Standard server air cooling |
| H100 | SXM5 | 700 W | DLC strongly preferred; air at limit |
| H100 | PCIe | 350 W | Air cooling viable |
| H200 | SXM5 | 700 W | Same die as H100; HBM3e difference |
| B200 | SXM | ~1,000 W | Direct liquid cooling required |
| DGX H100 | Full system | 10.2 kW | 8× H100 SXM5; specialized rack cooling |
| GB200 NVL72 | Full rack | ~120 kW | Factory-integrated direct liquid cooling |
Cooling Technologies
Three primary paradigms exist for cooling AI infrastructure: air, direct liquid cooling (DLC), and immersion. Each has distinct capacity limits, costs, and deployment tradeoffs.
Air cooling: traditional CRAC/CRAH units with hot/cold aisle containment. Viable for A100 and H100 PCIe at 300–400 W TDP.
- CRAC: Computer Room AC — DX refrigerant cooling
- CRAH: Computer Room Air Handler — uses chilled water from central chiller plant
- Hot/cold aisle containment: segregates airflows, improves delta-T efficiency
- Economizer mode: uses outside air when ambient temp is low enough
- PUE: typically 1.3–1.6 with chilled water CRAH
- CapEx: lowest of all cooling methods
- Limit: ~25–30 kW/rack before hot-spot formation
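The ~25–30 kW/rack air limit falls out of basic thermodynamics: removing more heat at a fixed delta-T demands impractical airflow volumes. A sketch using standard air properties:

```python
RHO_AIR = 1.2     # kg/m^3 at ~20 C
CP_AIR  = 1005.0  # J/(kg*K)

def airflow_m3s(heat_kw: float, delta_t_c: float) -> float:
    """Volumetric airflow needed to remove heat_kw at a given delta-T."""
    return heat_kw * 1000.0 / (RHO_AIR * CP_AIR * delta_t_c)

# A 30 kW rack at a 15 C cold-to-hot aisle delta-T
q = airflow_m3s(30, 15)
print(f"{q:.2f} m^3/s (~{q * 2118.88:.0f} CFM)")  # ~1.66 m^3/s, ~3,500 CFM
```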
Direct liquid cooling (DLC): cold plates on CPU/GPU plus liquid manifolds in the rack. H100 SXM5 and B200 require DLC for sustained workloads.
- Cold plates: metal plates with internal channels; attach directly to GPU/CPU die
- Liquid: typically 30–45°C supply water (warm water cooling possible)
- Rear-door heat exchanger (RDHx): attaches to back of rack, captures hot exhaust air
- CDU (Coolant Distribution Unit): manages coolant flow, pressure, temperature per rack
- PUE: 1.05–1.15 with warm water cooling (no chiller at mild ambient temps)
- GB200 NVL72: factory-integrated DLC, shipped as complete rack unit
- Requires building liquid infrastructure: manifolds, piping, leak detection
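Water's heat capacity per unit volume is roughly 3,500× that of air, which is why cold plates reach 120 kW/rack with modest flow rates. A sketch, assuming a water-like coolant:

```python
CP_WATER = 4186.0  # J/(kg*K)

def coolant_flow_lpm(heat_kw: float, delta_t_c: float) -> float:
    """Water flow (liters/minute) to absorb heat_kw at a given delta-T.
    Assumes a water-like coolant with density ~1 kg/L."""
    kg_per_s = heat_kw * 1000.0 / (CP_WATER * delta_t_c)
    return kg_per_s * 60.0

# A ~120 kW rack at a 10 C supply/return delta-T
print(f"{coolant_flow_lpm(120, 10):.0f} L/min per rack")  # ~172 L/min
```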
Immersion cooling: servers submerged in dielectric fluid. Highest density, near-perfect heat transfer. Two variants: single-phase and two-phase.
- Single-phase: fluid stays liquid; circulated through external heat exchanger
- Two-phase: fluid boils on hot components; vapor condenses on coils (higher efficiency)
- Fluid: engineered dielectric (non-conductive), e.g., mineral oil or fluorocarbon-based engineered fluids (Fluorinert-type)
- PUE: as low as 1.02–1.05 (near-perfect heat capture)
- NVIDIA validation: specific fluids and immersion durations approved per GPU model
- Drawbacks: upfront cost, fluid management complexity, limited tooling access
- Enables overclocking / sustained boost clocks not possible in air
Without containment, supply air mixes with exhaust air before reaching equipment intakes — causing cooling inefficiency and hot spots.
Cold aisle containment: Encloses the cold aisle with doors and ceiling panels. Server intakes pull cold air exclusively from the contained space. More common.
Hot aisle containment: Encloses the hot exhaust aisle, channeling hot air directly to CRAH return plenum. Reduces risk of hot air recirculation into adjacent aisles.
Effective containment can improve cooling efficiency by 20–30%, allowing higher rack densities with existing infrastructure.
Recommended airflow design:
- Raised floor: cold air delivered via perforated tiles beneath racks
- Blanking panels: fill empty rack spaces to prevent air bypass
- Supply temperature: 18–22°C cold aisle target
- Return temperature: 35–45°C hot aisle (higher = more efficient chiller operation)
- Variable speed fans: match airflow to actual heat load
ASHRAE A1 limit: 15–32°C inlet temperature at equipment intake. Most NVIDIA GPUs require A1 compliance for full-speed sustained operation.
| Method | Rack Density | Typical PUE | CapEx | Water Use | Best For |
|---|---|---|---|---|---|
| Air (CRAC/CRAH) | ≤30 kW | 1.3–1.6 | Low | Minimal | A100, H100 PCIe |
| Air + Economizer | ≤30 kW | 1.1–1.3 | Medium | Low | Moderate climates |
| Rear-door HX (RDHx) | 30–60 kW | 1.1–1.2 | Medium | Low | Retrofit/hybrid |
| Direct Liquid (cold plate) | 50–120 kW | 1.05–1.15 | High | Low–moderate | H100/B200 SXM, DGX |
| Immersion (single-phase) | 100+ kW | 1.03–1.08 | Very high | None | Extreme density |
| Immersion (two-phase) | 100+ kW | 1.02–1.05 | Very high | None | Maximum efficiency |
Thermal Management
ASHRAE thermal guidelines define safe operating envelopes for IT equipment. Understanding these classes and GPU thermal throttling behaviors is critical for sustained AI workload performance.
ASHRAE (American Society of Heating, Refrigerating and Air-Conditioning Engineers) defines standardized inlet temperature and humidity ranges that IT equipment must tolerate. Classes apply to temperature measured at the equipment intake, not the room ambient.
| Class | Inlet Temp (°C) | Max Humidity | Typical Equipment | Stringency |
|---|---|---|---|---|
| A1 | 15 – 32°C | 80% RH, 17°C dew point | Mission-critical, enterprise servers | Most Stringent |
| A2 | 10 – 35°C | 80% RH, 21°C dew point | Standard servers, networking | Moderate |
| A3 | 5 – 40°C | 85% RH, 24°C dew point | Ruggedized/industrial servers | Relaxed |
| A4 | 5 – 45°C | 90% RH, 24°C dew point | Hardened/outdoor deployments | Most Relaxed |
Key Exam Fact: A1 is the most stringent (narrowest, coolest range). A4 is the most relaxed (widest, warmest range). Most NVIDIA enterprise GPUs require A1 compliance for sustained full performance.
NVIDIA GPUs implement hardware thermal protection through two mechanisms:
Power Throttling (SW thermal threshold): GPU reduces power consumption before hitting thermal limit. Maintains operation at reduced performance. Triggered ~83–85°C for most data center GPUs.
Hardware Thermal Shutdown: GPU halts if junction temperature exceeds safe limit (~90–95°C for H100). Requires system reboot.
nvidia-smi monitoring:
```bash
nvidia-smi dmon -s pucvmet -d 5
# Shows: power draw, utilization, clock speeds, temp
# Throttle reason codes appear in the violations column

nvidia-smi -q -d PERFORMANCE
# Detailed throttle reasons: thermal, power, sync boost, etc.
```
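For scripted monitoring, the same data can be pulled through nvidia-smi's query interface. A minimal polling sketch; the 85°C alert threshold mirrors the throttle onset described above:

```python
import subprocess

ALERT_TEMP_C = 85  # approximate throttle onset for data center GPUs

# Query per-GPU index, temperature, and power draw as bare CSV
out = subprocess.check_output(
    ["nvidia-smi",
     "--query-gpu=index,temperature.gpu,power.draw",
     "--format=csv,noheader,nounits"],
    text=True,
)
for line in out.strip().splitlines():
    idx, temp, power = (f.strip() for f in line.split(","))
    flag = "  <-- check cooling/airflow" if float(temp) >= ALERT_TEMP_C else ""
    print(f"GPU {idx}: {temp} C, {power} W{flag}")
```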
Front-to-back airflow: Industry standard. Cold air enters front bezel, hot air exhausts rear. Critical for hot/cold aisle containment compatibility.
CFD (Computational Fluid Dynamics): Used in facility planning to model airflow, identify hot spots, and optimize CRAH placement before physical deployment.
Delta-T target: The temperature rise from cold aisle to hot aisle should be 10–20°C. Too small = overcooling (wasted energy). Too large = potential hot spots.
Airflow matching: NVIDIA's high-density GPU systems require server fans rated to overcome high static pressure, since densely packed GPU cards restrict internal airflow.
| Metric | Tool | Normal Range | Action Required If |
|---|---|---|---|
| GPU Temperature (°C) | nvidia-smi -q -d TEMPERATURE | <80°C sustained | >85°C — check cooling, airflow |
| GPU Power Draw (W) | nvidia-smi -q -d POWER | Near TDP at load | <TDP under full load = throttling |
| Memory Temperature | nvidia-smi -q -d TEMPERATURE | <85°C (HBM) | >90°C — HBM thermal design issue |
| Fan Speed | nvidia-smi -q -d FAN | Auto-managed | 100% sustained = cooling deficiency |
| Throttle Reasons | nvidia-smi -q -d PERFORMANCE | None active | Any thermal throttle = escalate |
| DCGM Health | DCGM health-check | Pass all | Failures → GPU replacement ticket |
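The "power draw below TDP under full load" signal from the table can be automated. A hypothetical watchdog sketch built on the pynvml bindings (the 90% utilization and 0.8 × limit thresholds are assumptions, not NVIDIA guidance):

```python
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        h = pynvml.nvmlDeviceGetHandleByIndex(i)
        util = pynvml.nvmlDeviceGetUtilizationRates(h).gpu        # percent
        draw = pynvml.nvmlDeviceGetPowerUsage(h) / 1000.0         # W
        limit = pynvml.nvmlDeviceGetEnforcedPowerLimit(h) / 1000.0
        # Busy GPU drawing well under its power limit hints at throttling
        if util > 90 and draw < 0.8 * limit:
            print(f"GPU {i}: busy ({util}%) but only {draw:.0f}/{limit:.0f} W "
                  f"-> investigate thermal or clock throttling")
finally:
    pynvml.nvmlShutdown()
```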
Facility Level:
- Maintain cold aisle inlet at 18–27°C (within A1 range)
- Use DCIM (Data Center Infrastructure Management) tools for real-time thermal mapping
- Hot aisle temperature ≤45°C to protect CRAH efficiency
- Install temperature sensors at intake, mid-rack, and exhaust
- All blanking panels installed; no empty rack positions uncovered
Server/GPU Level:
- Verify TIM (Thermal Interface Material) integrity at GPU installation
- For DLC: verify cold plate seating torque per NVIDIA spec
- Monitor coolant flow rate and supply/return temperature differential
- Set appropriate power limits with `nvidia-smi -pl <watts>`
- Enable DCGM health monitoring for continuous GPU thermal telemetry
AI Data Center Design
Designing power and cooling infrastructure for AI clusters requires bottom-up power budgeting, PUE-adjusted facility planning, and redundancy architecture matched to the scale of deployment.
A DGX H100 SuperPOD is the reference AI cluster design. Building the complete power budget from the bottom up:
DGX H100 SuperPOD — Full Power Budget: 32 nodes × 10.2 kW = 326.4 kW of compute, plus ~50 kW for networking, storage, and management, gives ~376 kW of IT load; at PUE 1.2 that is ~451 kW of facility power (reproduced in the sketch below).
GPU count: 32 × 8 = 256 H100 GPUs. Theoretical FP8 throughput (with sparsity): 256 × 3,958 TFLOPS ≈ 1,013 PFLOPS ≈ ~1 ExaFLOP FP8.
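The same budget as a script, so node counts and margins can be varied; the networking/storage figure and the 25% headroom are planning assumptions:

```python
NODES, NODE_KW = 32, 10.2
NET_STORAGE_KW = 50.0      # assumption: fabric switches, storage, management
PUE, HEADROOM = 1.2, 1.25  # headroom: mid-range of the 20-30% growth margin

it_load = NODES * NODE_KW + NET_STORAGE_KW  # ~376 kW
facility = it_load * PUE                    # ~452 kW
provisioned = facility * HEADROOM           # ~565 kW to size UPS/generators
print(f"IT {it_load:.0f} kW -> facility {facility:.0f} kW "
      f"-> provision {provisioned:.0f} kW")
```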
The GB200 NVL72 is a single rack unit containing 72 B200 GPUs + 36 Grace CPUs. It represents the extreme of current AI infrastructure density.
Single GB200 NVL72 Rack: 72 × B200 (~1,000 W each) plus 36 Grace CPUs, NVLink switch trays, fans, and power conversion losses bring the total to ~120 kW.
Compare to H100 SXM5: 120 kW ÷ 700 W ≈ 171, i.e., one GB200 NVL72 rack draws as much power as roughly 171 discrete H100 SXM5 GPUs.
Infrastructure implications:
- Power circuits: requires dedicated high-amperage feeds (often 480 V 3-phase); see the sizing sketch after this list
- Floor loading: DLC fluid adds weight; verify structural capacity
- Chilled water supply: CDUs require facility chilled water or process cooling water loop
- Leak detection: mandatory with in-rack liquid; sensors at every manifold
- UPS: must handle 120 kW/rack × rack count — typically dedicated UPS modules per cluster
- Generator sizing: AI clusters require N+1 or 2N generator capacity at full load
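Circuit sizing for the first bullet is straightforward three-phase arithmetic; a sketch (power factor assumed ~0.99):

```python
import math

def three_phase_amps(load_kw: float, volts: float, pf: float = 0.99) -> float:
    """Line current for a balanced three-phase load."""
    return load_kw * 1000.0 / (math.sqrt(3) * volts * pf)

amps = three_phase_amps(120, 480)   # GB200 NVL72 rack on a 480 V feed
print(f"~{amps:.0f} A continuous")  # ~146 A
# Many electrical codes size breakers at 125% of continuous load -> ~182 A
```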
| Tier | Architecture | Availability | Use Case |
|---|---|---|---|
| Tier I | Single path, no redundancy | 99.671% | Dev/test environments |
| Tier II | N+1 redundant components | 99.741% | Internal enterprise |
| Tier III | Concurrently maintainable (N+1 paths) | 99.982% | Commercial AI DCs |
| Tier IV | Fault tolerant (2N paths) | 99.995% | Mission-critical AI |
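The availability percentages translate directly into allowed downtime per year, which is often how exams probe them:

```python
HOURS_PER_YEAR = 8760

for tier, avail in [("I", 0.99671), ("II", 0.99741),
                    ("III", 0.99982), ("IV", 0.99995)]:
    downtime_h = (1 - avail) * HOURS_PER_YEAR
    print(f"Tier {tier:>3}: ~{downtime_h:.1f} h/yr allowed downtime")
# Tier I ~28.8 h, Tier II ~22.7 h, Tier III ~1.6 h, Tier IV ~0.4 h
```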
DGX H100 server PSU redundancy: Each DGX H100 ships with redundant PSUs. Best practice: connect A-feed and B-feed from separate PDUs on separate UPS/generator chains.
WUE (Water Usage Effectiveness): Liters of water per kWh of IT load. Chiller-based cooling consumes significant water for evaporative towers. DLC with dry coolers eliminates most water consumption.
CUE (Carbon Usage Effectiveness): kg CO₂ per kWh of IT load. Driven by local grid carbon intensity. On-site solar/wind reduces CUE.
ERE (Energy Reuse Effectiveness): measures how much waste heat is recovered and reused (e.g., warming buildings); defined as (Total Facility Energy − Reused Energy) ÷ IT Energy, so lower is better. DLC enables waste heat recovery at useful temperatures (40–60°C supply water).
NVIDIA's Sustainability Focus:
- GB200 NVL72: DLC from factory minimizes PUE overhead
- NVLink switching reduces inter-GPU traffic (vs PCIe) saving switch power
- MIG on A100/H100 improves GPU utilization (less idle power)
- FP8/FP4 precision: more work per watt vs FP32
- DCGM power capping: enforce cluster-wide power limits without sacrificing SLA
Memory Hooks & Advisor
Mnemonics, patterns, and quick-reference guidance to lock in the most exam-critical power and cooling concepts.
PUE Formula
PUE = Total Facility Power ÷ IT Equipment Power
Perfect = 1.0 | Hyperscale = 1.1–1.2
H100 SXM5 TDP
700 W (DLC preferred)
vs H100 PCIe = 350 W, B200 ≈ 1,000 W
ASHRAE A1 Inlet Range
15 – 32°C
Most stringent class (narrowest, coolest)
Air Cooling Max Rack Density
~25–30 kW/rack
Above this → hot spots form; DLC required
GB200 NVL72 Power Draw
~120 kW per rack
72× B200 + 36× Grace CPU
Factory-integrated DLC
DGX H100 System TDP
10.2 kW total system
8× H100 SXM5 (700 W each) + CPU/mem/NVMe/net
80 Plus Titanium % at 50% Load
96% efficient
Gold=90%, Platinum=94%, Titanium=96%
What Does the UPS Do?
Bridges 10–30 sec generator start time during utility outage. Double-conversion type provides zero transfer time and best power quality.
⚡ Power Fundamentals
- PUE = Total Facility Power ÷ IT Equipment Power. Perfect = 1.0 (unachievable). Hyperscale leaders target 1.1–1.2 with DLC and economizers.
- kW (kilowatts) is real power — what GPUs draw, what your bill measures. kVA is apparent power — what UPS and PDUs are rated in. kVA ≥ kW always; PF = kW/kVA.
- 80 Plus Titanium = 96% PSU efficiency at 50% load. At cluster scale, choosing Titanium over Gold saves 6% of PSU losses — tens of kW in a SuperPOD.
- GPU TDP (Thermal Design Power) is a cooling design target, not a guaranteed maximum draw. Actual draw varies with workload; sustained FP8 training often approaches TDP, while idle draw is much lower (40–60 W).
- Power budget formula: Total Facility Power = IT Load × PUE. IT load includes GPUs, CPUs, memory, storage, networking — not just GPU TDP alone.
- DGX H100 SuperPOD: 32 × 10.2 kW = 326.4 kW IT load. At PUE 1.2 = ~391 kW facility power total.
💧 Cooling Technology Selection
- Air cooling limit: ~25–30 kW/rack. Sufficient for A100 PCIe (300 W) and H100 PCIe (350 W) in standard server configurations with <4 GPUs.
- H100 SXM5 (700 W): DLC strongly preferred. 8 × 700 W = 5,600 W in GPUs alone. DGX H100 at 10.2 kW exceeds practical air cooling rack limits.
- DLC (cold plates): 50–120 kW/rack. Requires facility chilled water infrastructure: supply/return pipes, CDU per rack, leak detection. PUE 1.05–1.15.
- Rear-door heat exchanger (RDHx): attaches to existing racks; captures hot exhaust air in a liquid-cooled door. Retrofit-friendly but lower capacity than full cold plates.
- Immersion: 100+ kW/tank. Highest density and lowest PUE (1.02–1.05). Requires NVIDIA validation of dielectric fluid type and immersion duration per GPU model.
- GB200 NVL72 (~120 kW): ships with factory-integrated DLC. Customer connects facility chilled water — no additional cooling hardware selection required.
🌡️ Thermal Management
- ASHRAE A1 (15–32°C) is the most stringent class — required for mission-critical enterprise GPUs. A1 → A2 → A3 → A4: strictness decreases, allowed temperature range widens.
- GPU thermal throttling begins at ~83–85°C junction temperature. Hardware shutdown occurs at ~90–95°C. Monitor with `nvidia-smi -q -d TEMPERATURE`.
- `nvidia-smi -q -d PERFORMANCE` shows active throttle reasons: thermal, power, sync boost, board limit — each has a distinct root cause and remediation.
- Cold aisle target: 18–27°C. Hot aisle: ≤45°C. Delta-T of 10–20°C across the server is normal. >20°C delta suggests inadequate airflow volume.
- Blanking panels are mandatory — uncovered rack slots allow hot exhaust to recirculate into cold aisle intakes, raising effective GPU inlet temperature.
- DCGM (Data Center GPU Manager) provides continuous health monitoring: temperature, power, utilization, ECC errors — essential for production AI cluster operations.
🏗️ AI DC Infrastructure Design
- Power budget process: count all IT loads (GPU nodes + networking + storage + management), multiply by PUE, add growth headroom (20–30%), size UPS and generators accordingly.
- DGX H100 SuperPOD: 32 nodes × 10.2 kW + ~50 kW networking/storage = ~376 kW IT. At PUE 1.2 = ~451 kW facility power. 256 H100 GPUs, ~1 ExaFLOP FP8.
- GB200 NVL72 at ~120 kW/rack: requires dedicated high-amperage 3-phase circuits (480V typical), facility chilled water, structural floor loading assessment, in-rack leak detection.
- Sustainability metrics: WUE (water usage per kWh IT), CUE (CO₂ per kWh IT), ERE (energy reuse effectiveness). DLC with dry coolers eliminates most water use; enables waste heat recovery.
- Power redundancy: Tier III = N+1 paths (concurrently maintainable); Tier IV = 2N paths (fault tolerant). AI production clusters typically require Tier III minimum.
- Dual PSU feeds: connect DGX A-PSU and B-PSU to separate PDU chains on independent UPS/generator paths — critical for maintaining GPU cluster availability during single-chain failures.
🔗 Power Chain & Reliability
- Power chain order: Grid → Transformer → ATS → Generator → UPS → PDU → Rack PDU → Server PSU → VRM → GPU. Each hop has losses — total efficiency product determines delivered power.
- ATS (Automatic Transfer Switch): switches between utility and generator in <100 ms. Does not store energy — that's the UPS's job. ATS transfers; UPS sustains.
- Generator start time: 10–30 seconds typical. UPS battery runtime must cover this window plus margin. UPS batteries are not sized for extended outages.
- Online double-conversion UPS: always running through inverter/rectifier = zero transfer time, best power quality, best protection for GPU compute. Required for AI infrastructure.
- N+1 redundancy: one extra UPS module or generator beyond what is needed. 2N redundancy: complete second independent power chain (highest cost, highest availability).
- PSU loss at scale: Titanium (96%) vs Gold (90%) across 326 kW IT load = ~22 kW difference = ~$19,000/year at $0.10/kWh — a significant OPEX justification for premium PSUs.