DGX/HGX/MGX platforms, GPU form factors (SXM vs PCIe), PCIe Gen 5, ConnectX-7 networking, GPUDirect Storage, and DGX POD scale-out. Every number and spec the NCP-AII exam tests.
NVIDIA offers a layered ecosystem of AI server platforms — from complete turnkey systems to OEM reference designs. Understanding which platform maps to which use case, and the specs that differentiate them, is central to the NCP-AII exam.
| Platform | What It Is | Target | GPU Config |
|---|---|---|---|
| DGX | Complete NVIDIA turnkey AI server (CPU + GPU + storage + networking) | AI training, research, enterprise | 8× SXM GPUs + 4× NVSwitch |
| HGX | OEM GPU baseboard (GPUs + NVSwitch, no CPU/storage) | Cloud providers, OEM servers | 4 or 8× SXM GPUs + NVSwitch |
| MGX | NVIDIA modular GPU server reference architecture | OEM edge & inference servers | 1–8× PCIe or SXM GPUs |
| OVX | Omniverse/simulation server reference design | Industrial AI, digital twins | RTX / L40S GPUs |
| GB200 NVL72 | 72× B200 GPUs + 36 Grace CPUs in one liquid-cooled rack | Hyperscale AI training | 72× B200 via NVLink 5 |

The DGX generations compare as follows:

| System | GPU | Total HBM | FP8 AI | NVLink/GPU | Power |
|---|---|---|---|---|---|
| DGX A100 | 8× A100 SXM | 320 GB | ~5 PFLOPS (FP16; A100 has no FP8) | 600 GB/s | 6.5 kW |
| DGX H100 | 8× H100 SXM5 | 640 GB | 32 PFLOPS | 900 GB/s | 10.2 kW |
| DGX H200 | 8× H200 SXM | 1,128 GB | ~32 PFLOPS* | 900 GB/s | 10.2 kW |
| DGX B200 | 8× B200 SXM | 1,536 GB | ~72 PFLOPS | 1.8 TB/s | ~14.3 kW |
* H200 has same GH100 compute die as H100; improvement is HBM3e capacity/bandwidth (4.8 TB/s vs 3.35 TB/s), not peak TFLOPS.
The DGX H100 is the most exam-tested platform. Memorize these specs completely — they appear directly in NCP-AII questions about system capacity, power, and interconnect design.
HGX is NVIDIA's GPU subsystem for OEM partners. It includes the GPU tray (H100/H200/B200 SXM GPUs + NVSwitch chips) without the CPU, DRAM, storage, or networking. OEM partners like Dell, HPE, Lenovo, Supermicro, and Inspur integrate HGX into their own server chassis, adding their choice of CPU platform, networking, and cooling.
NVIDIA offers the H100 (and other GPUs) in two physical form factors, and choosing between them is one of the most exam-critical design decisions in AI server architecture. SXM (700 W, 900 GB/s NVLink via NVSwitch) targets maximum performance; PCIe (350 W, standard slots) targets flexibility and integration into standard servers.
PCIe (Peripheral Component Interconnect Express) is the standard interface connecting GPUs to the CPU and host system. Understanding PCIe bandwidth — and its bottleneck effects — is essential for AI system design.
| PCIe Gen | GT/s per lane | x8 BW (BiDir) | x16 BW (BiDir) | First GPU Gen |
|---|---|---|---|---|
| PCIe 3.0 | 8 GT/s | 16 GB/s | 32 GB/s | Pascal (P100) |
| PCIe 4.0 | 16 GT/s | 32 GB/s | 64 GB/s | Ampere (A100) |
| PCIe 5.0 | 32 GT/s | 64 GB/s | 128 GB/s | Hopper (H100) |
| PCIe 6.0 | 64 GT/s | 128 GB/s | 256 GB/s | Future / Blackwell+ |
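To make the table's arithmetic concrete, here is a minimal C sketch (illustrative only; the generations and per-lane rates come from the table above) that derives the x16 figures from the signaling rate. Real-world throughput runs a few percent below these raw numbers due to 128b/130b encoding (Gen 3 through 5) and protocol overhead.

```c
/* Raw PCIe x16 bandwidth from per-lane signaling rate.
   GB/s per direction = GT/s per lane x 16 lanes / 8 bits-per-byte. */
#include <stdio.h>

int main(void) {
    const char  *gen[] = {"3.0", "4.0", "5.0", "6.0"};
    const double gts[] = {8.0, 16.0, 32.0, 64.0};   /* GT/s per lane */
    for (int i = 0; i < 4; i++) {
        double unidir = gts[i] * 16 / 8;             /* GB/s, one direction */
        printf("PCIe %s x16: %3.0f GB/s unidir, %3.0f GB/s bidir\n",
               gen[i], unidir, 2 * unidir);
    }
    return 0;
}
```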
For PCIe H100 deployments that need GPU-to-GPU bandwidth above what PCIe provides, NVIDIA offers the NVLink Bridge, a physical connector linking pairs of PCIe GPUs via NVLink. However, the bridge connects discrete GPU pairs rather than an all-to-all mesh and delivers far less aggregate bandwidth than a full NVSwitch topology, so it is not a substitute for the SXM/NVSwitch architecture in large training workloads.
AI servers require multiple distinct network fabrics for different traffic types. Mixing them onto a single network causes congestion and performance collapse. The NCP-AII exam tests your understanding of which network handles which traffic, and the hardware involved.
ConnectX-7 is NVIDIA's 7th-generation network adapter, standard in DGX H100 (one per GPU). It supports both InfiniBand NDR (400 Gb/s) and 400GbE, making it dual-mode capable. Each ConnectX-7 directly services one H100 GPU, ensuring the GPU's network bandwidth is not shared with other GPUs.
| Feature | ConnectX-7 Spec |
|---|---|
| Max port speed | 400 Gb/s (NDR InfiniBand or 400GbE) |
| Protocol support | InfiniBand NDR, 100/200/400GbE, RoCEv2 |
| GPUDirect RDMA | Yes — NIC DMA direct to/from GPU HBM |
| PCIe interface | PCIe Gen 5 x16 |
| RDMA latency | <600 ns (InfiniBand) |
| Count in DGX H100 | 8× single-port (1 per GPU) + 2× dual-port (storage / in-band management) |
| Offloads | RDMA, GPUDirect, RoCEv2, SHARP, TCP offload |
GPUDirect RDMA (Remote Direct Memory Access) allows a ConnectX-7 NIC to transfer data directly between a remote server's GPU memory and the local GPU's HBM — completely bypassing the CPU and system DRAM. This eliminates two PCIe crossings per transfer and can double effective network bandwidth for GPU-to-GPU operations across nodes.
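As a minimal sketch of what this looks like in application code (assuming the nvidia-peermem module is loaded, a verbs protection domain already exists, and noting that the helper name register_gpu_buffer is hypothetical), a cudaMalloc'd device pointer can be registered directly with the NIC:

```c
/* Sketch: registering GPU HBM for RDMA via nvidia-peermem.
   Build with -libverbs -lcudart; error handling omitted for brevity. */
#include <infiniband/verbs.h>
#include <cuda_runtime.h>

struct ibv_mr *register_gpu_buffer(struct ibv_pd *pd, size_t bytes) {
    void *gpu_buf = NULL;
    cudaMalloc(&gpu_buf, bytes);   /* buffer lives in GPU HBM, not host DRAM */
    /* With nvidia-peermem loaded, ibv_reg_mr accepts the device pointer
       directly; the NIC then DMAs to/from HBM, bypassing CPU and DRAM. */
    return ibv_reg_mr(pd, gpu_buf, bytes,
                      IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_READ |
                      IBV_ACCESS_REMOTE_WRITE);
}
```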
| Feature | InfiniBand NDR | Spectrum-X (Ethernet) |
|---|---|---|
| Port speed | 400 Gb/s | 400 GbE |
| Latency | ~600 ns (lowest) | ~2–5 µs |
| Protocol | IB native (lossless) | RoCEv2 over UDP/IP |
| Congestion control | Hardware-native ECN + credit-based | DCQCN (PFC + ECN) |
| SHARP in-network compute | Yes (AllReduce in switch ASIC) | Yes (Spectrum-4 ASIC) |
| Use case | Tightly-coupled HPC/AI training | Cloud-native, Ethernet-standard AI |
| NIC | ConnectX-7 (dual-mode) | ConnectX-7 (dual-mode) |
AI training has extreme storage demands: large dataset reads during training, frequent checkpointing of multi-hundred-GB model weights, and burst I/O patterns. The storage architecture must match these demands — a slow filesystem becomes the training bottleneck even with 8 fast H100s.
GPUDirect Storage enables a DMA engine to transfer data directly between NVMe SSDs and GPU HBM memory, bypassing the CPU and system DRAM entirely. This is critical for checkpointing large models (e.g., saving a 140 GB LLaMA 70B checkpoint) without saturating PCIe with CPU-routed copies.
Applications access GDS through the cuFile API, which requires the nvidia-fs kernel driver.
```c
// GPUDirect Storage read example (cuFile API)
#include <cufile.h>   // link with -lcufile; needs the nvidia-fs kernel driver
#include <fcntl.h>

int fd = open("/data/ckpt.bin", O_RDONLY | O_DIRECT);  // hypothetical path; O_DIRECT required
CUfileDescr_t cf_desc = { .type = CU_FILE_HANDLE_TYPE_OPAQUE_FD };
cf_desc.handle.fd = fd;

CUfileHandle_t cf_handle;
cuFileDriverOpen();                           // initialize the GDS driver session
cuFileHandleRegister(&cf_handle, &cf_desc);   // bind the file to a cuFile handle

// Direct NVMe → GPU DMA — no CPU data copy (gpu_buffer_ptr from cudaMalloc)
cuFileRead(cf_handle, gpu_buffer_ptr, read_size, file_offset, 0);
```
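In production code, check the CUfileError_t return values; for GPU buffers that are reused across many reads, pre-registering them with cuFileBufRegister() enables the faster registered-buffer path.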
| Filesystem | Type | Bandwidth | Best For | Notes |
|---|---|---|---|---|
| Lustre | Open-source parallel FS | 100s GB/s | HPC + AI training datasets | Standard in DGX SuperPOD reference |
| WEKA | Flash-native parallel FS | Up to 1 TB/s | All-flash AI clusters | Native S3 + POSIX, NVIDIA validated |
| IBM Spectrum Scale (GPFS) | Enterprise parallel FS | 100s GB/s | Enterprise AI, finance | Mature, complex mgmt |
| BeeGFS | Open parallel FS | 10–100 GB/s | Academic clusters | Easy setup, lower performance ceiling |
| NFS v4 | Network FS | 1–10 GB/s | Not recommended for training | Single metadata server bottleneck |
Single DGX nodes are the building block. NVIDIA provides reference architectures — DGX POD and DGX SuperPOD — that define how to interconnect multiple DGX nodes into validated AI clusters with known-good performance, networking, and storage configurations.
The GB200 NVL72 takes scale-up further: 72 B200 GPUs and 36 Grace CPUs in a single liquid-cooled rack, all connected in one NVLink 5 domain. Every GPU can communicate with every other GPU at 1.8 TB/s without leaving the rack — eliminating the inter-node InfiniBand bottleneck for models that fit within 72 GPUs.
A few hands-on checks tie these topics together:

- `nvidia-smi topo -m` shows NV18 (SXM + NVSwitch) vs SYS (PCIe cross-socket) entries, indicating each GPU pair's bandwidth class.
- GPUDirect RDMA requires the nvidia-peermem driver module plus ConnectX-7. Verify with `lsmod | grep nvidia_peermem`.
- GPUDirect Storage uses the cuFile API and requires the nvidia-fs kernel driver plus GDS-compatible NVMe. It dramatically reduces checkpoint overhead for large models.

Finally, memory hooks for the numbers the exam loves:

| Fact | Mnemonic / Hook |
|---|---|
| DGX H100 = 8 GPUs, 640 GB, 32 PFLOPS | "8 GPUs × 80 GB = 640 GB. 8 × 3,958 TFLOPS ÷ 1,000 = ~32 PFLOPS" |
| DGX H100 power = 10.2 kW | "8 GPUs × 700 W + CPU/overhead ≈ 10.2 kW, about ten 1 kW gaming PCs" |
| PCIe 5 x16 = 128 GB/s (BiDir) | "32 GT/s per lane × 16 lanes ÷ 8 bits = 64 GB/s per direction; ×2 = 128 GB/s bidirectional" |
| SXM = 700W; PCIe = 350W | "SXM = double the power, double the bandwidth. PCIe = half price, half bandwidth" |
| ConnectX-7 = 400 Gb/s dual-mode | "CX-7 = both roads: InfiniBand or Ethernet, same NIC" |
| SuperPOD = 32 nodes = 256 GPUs | "32 × 8 = 256 — 32 DGX nodes make one SuperPOD" |
| GDS = cuFile API | "GPU Direct Storage = cuFile + nvidia-fs kernel module" |