SmartNIC architecture, operating modes, DOCA programming model, ASAP² OVS offload, inline IPsec, GPUNetIO, and Morpheus AI security — the full DPU stack.
A Data Processing Unit combines a high-performance network interface (ConnectX-7 at 400GbE) with general-purpose Arm compute cores and dedicated hardware accelerators for crypto, compression, and regex — all on one PCIe card. The key insight: run the infrastructure software (firewalls, OVS, storage, encryption) on the DPU's Arm cores, freeing the host CPU exclusively for application workloads.
Replace the host CPU's software networking stack with hardware acceleration on the DPU.
Run security infrastructure isolated from tenant workloads — tenants cannot tamper with it.
NVMe-over-Fabrics target/initiator and data services accelerated in DPU hardware.
Application logic — web servers, databases, AI inference, business logic. Should spend zero cycles on infrastructure plumbing.
Massively parallel compute — AI training, inference, HPC simulations, rendering. Needs maximum memory bandwidth for tensor ops.
Infrastructure services — networking, security, storage, telemetry. Isolated from tenants. Programmable via DOCA SDK.
| BlueField-3 Specs | Value |
|---|---|
| Arm cores | 16× Arm A78 (Armv8.2+) |
| Network speed | 400GbE (2 × 200GbE) |
| NIC controller | ConnectX-7 |
| PCIe generation | Gen5 ×16 |
| Local DRAM | Up to 32 GB DDR5 |
| Crypto throughput | 400 Gbps inline IPsec |
| Compression | Hardware LZ4/Deflate |
| Management port | Out-of-band 1GbE |
| OS | BF-OS (Yocto Linux on Arm) |
| SDK | DOCA 2.x |
Scalable Functions (SFs) are a lightweight alternative to SR-IOV Virtual Functions (VFs) — they don't require PCIe function enumeration, making them faster to create and destroy for containerized workloads (Kubernetes pods).
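A minimal sketch of creating an SF on the DPU Arm side, assuming the mlxdevm tool shipped with MLNX-OFED/DOCA; the PCI address, SF number, port index, and MAC below are placeholders:

```bash
# Add SF number 4 on physical function 0 (PCI address is an example)
/opt/mellanox/iproute2/sbin/mlxdevm port add pci/0000:03:00.0 flavour pcisf pfnum 0 sfnum 4

# The add command prints a new port index (shown here as 229409);
# give the SF a MAC address and activate it
/opt/mellanox/iproute2/sbin/mlxdevm port function set pci/0000:03:00.0/229409 \
    hw_addr 02:00:00:00:04:00 state active
```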
BlueField acts as a standard high-performance NIC. Arm cores are idle or running minimal firmware. Host sees a ConnectX-7 device — full 400GbE bandwidth available. DOCA not active.
Arm cores run DOCA applications that share the NIC with the host. Host and Arm both see the network interface. Good for in-line telemetry and monitoring without full isolation.
The primary DPU deployment mode. Host OS and DPU Arm OS are fully isolated — they cannot access each other's memory or management plane. The Arm runs a complete Yocto Linux with DOCA services. Host sees only the VFs/SFs the DPU grants it.
Host is treated as an untrusted tenant. DPU Arm controls all network policy. Even if the host OS is compromised, the attacker cannot modify network policy or access other tenants' traffic. Strongest security isolation.
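The operating mode is controlled by the INTERNAL_CPU_MODEL firmware parameter and is queried or changed with mlxconfig. A sketch, assuming the mst device path used in the checklist at the end of this page; a firmware reset or power cycle is required before a mode change takes effect:

```bash
mst start   # expose the firmware configuration device under /dev/mst

# Query the current mode: EMBEDDED_CPU(1) means embedded/separated (DPU) mode
mlxconfig -d /dev/mst/mt41686_pciconf0 q | grep INTERNAL_CPU_MODEL

# Switch to embedded (DPU) mode; NIC mode involves additional parameters on newer firmware
mlxconfig -d /dev/mst/mt41686_pciconf0 s INTERNAL_CPU_MODEL=1
```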
How many Arm cores in BlueField-3 vs BlueField-2?
BF-3: 16× Arm A78
BF-2: 8× Arm A72
BF-3 doubles the core count, upgrades the microarchitecture (A78 vs A72), moves to PCIe Gen5, and swaps ConnectX-6 Dx for ConnectX-7.
What PCIe generation does BlueField-3 use?
PCIe Gen5 ×16
BlueField-2 used PCIe Gen4. Gen5 doubles bandwidth to ~64 GB/s — critical for GPUDirect RDMA and DOCA GPUNetIO workloads where DMA bandwidth is the bottleneck.
In separated (DPU) mode, what security property holds?
Host OS ↔ DPU Arm OS: fully isolated
A compromised host cannot reach DPU management or other tenants' traffic. The DPU is the trusted enforcement point — even if the host is hostile.
What is tmfifo on BlueField?
tmfifo = Terminal / Management FIFO
A virtual serial console channel between the host CPU and the DPU Arm OS over PCIe. Used for initial provisioning, console access, and rshim communication when no OOB network is available.
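For example, with the rshim driver loaded on the host, the tmfifo console and boot channel appear under /dev/rshim0/ (the device index and image file name are placeholders):

```bash
# rshim exposes console, boot, and misc nodes for the first BlueField
ls /dev/rshim0/

# Attach to the Arm serial console over PCIe (no OOB network needed)
screen /dev/rshim0/console 115200

# The same channel is used to push a BlueField OS image (BFB)
cat bf-bundle.bfb > /dev/rshim0/boot
```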
DOCA (Data Center Infrastructure on a Chip Architecture) is NVIDIA's SDK for programming BlueField DPUs. It provides C APIs that abstract hardware accelerators — crypto, compress, regex, DMA, RDMA, packet processing — so developers don't need to write low-level register code. DOCA applications run on the DPU's Arm cores (or from the host for some operations).
Network: Pipe-based match-action packet processing pipeline. The core of OVS offload, firewall, and load balancer implementations.
Security: Deep Packet Inspection — L7 application identification and pattern matching using hardware Regex engine.
Security: Inline IPsec encryption/decryption. AES-GCM at up to 400 Gbps — full line rate on BF-3 with zero host CPU.
Security: Stateful connection tracking + ACL enforcement. Offloads iptables/nftables to DPU hardware match-action tables.
Storage: LZ4 and Deflate lossless compression/decompression in hardware. Used for storage data services and network payload reduction.
AI / GPU: Receive network packets directly into GPU memory via DMA — bypassing host CPU and enabling GPU-native packet processing.
Network: RDMA operations (read, write, send/recv) initiated from DPU Arm cores. Used for distributed storage and in-network compute.
Security: Hardware-accelerated regular expression matching. Powers DPI pattern libraries and IDS/IPS signature matching at line rate.
Observability: Collect and stream per-flow, per-port, and system-level metrics from DPU to monitoring systems (Prometheus, Grafana, gRPC).
Represents a hardware engine instance (e.g., one Compress engine, one IPsec SA). Lifecycle: create → connect to progress engine → start → use → stop → destroy.
Drives asynchronous task completion. Poll-based (like DPDK) — call doca_pe_progress() in your event loop to harvest completed tasks without blocking.
Registers a memory region with the DPU for DMA access. Must register any host or GPU buffer before the DPU DMA engine can read/write it.
Represents a slice of a registered memory region. Allocated from a doca_buf_inventory. Passed to tasks as source/destination for DMA and crypto operations.
A unit of work submitted to a ctx — e.g., encrypt this buffer, compress this block. Submitted asynchronously; completion fires a callback via the progress engine.
A hardware match-action pipeline stage in DOCA Flow. Defines match fields (5-tuple, VLAN, metadata) and default actions (forward, drop, modify, meter).
Packet arrives on p0/p1 or from host VF. DMA into DPU buffer.
5-tuple, VLAN, GRE key, metadata match in TCAM/exact tables.
Modify headers, encap/decap VXLAN, set metadata, meter (rate limit).
Chain to next pipe — e.g., ACL → NAT → firewall → counter.
Per-flow byte/packet counters. Exported via DOCA Telemetry.
Egress to port p0/p1, VF, SF, or drop. RSS for multi-queue.
```c
// 1. Create a pipe with 5-tuple match + VXLAN encap action
struct doca_flow_pipe_cfg pipe_cfg = {
    .name = "vxlan_encap",
    .port = doca_port,                        // port opened via doca_flow_port_start()
    .match = {
        .outer.ip4.dst_ip = 0xFFFFFFFF,       // match exact dst IP
        .outer.tcp.dst_port = 0xFFFF,         // match exact dst port
    },
    .actions = {
        .encap_type = DOCA_FLOW_ENCAP_VXLAN,
        .encap.tun.vxlan.tun_id = vni,
    },
    .fwd = { .type = DOCA_FLOW_FWD_PORT, .port_id = 0 },
};

// 2. Create pipe
struct doca_flow_pipe *pipe;
doca_flow_pipe_create(&pipe_cfg, NULL, NULL, &pipe);

// 3. Add an entry (specific flow)
struct doca_flow_pipe_entry *entry;
struct doca_flow_match entry_match = {
    .outer.ip4.dst_ip = htonl(0x0a000001),    // 10.0.0.1
    .outer.tcp.dst_port = htons(8080),
};
doca_flow_pipe_add_entry(0, pipe, &entry_match, NULL, NULL, NULL, 0, NULL, &entry);

// 4. Progress engine drives async completions
while (running) {
    doca_pe_progress(pe);                     // harvest callbacks, no blocking
}
```
DPDK gives user-space PMD access to the ConnectX NIC for packet processing. DOCA sits on top — it provides higher-level APIs for the DPU's hardware accelerators (crypto, compress, regex, RDMA) and abstracts the async task model. DOCA Flow in particular replaces hand-written DPDK rte_flow rules with a more portable pipe concept.
ASAP² offloads the Open vSwitch (OVS) forwarding datapath from the host CPU to the BlueField eSwitch (embedded switch in ConnectX-7). The result: near-zero host CPU for packet forwarding.
Packet: NIC → host kernel → OVS datapath → back to NIC
Host CPU: 20-30% consumed by OVS forwarding
Latency: kernel crossing ×2 per packet
Packet: NIC eSwitch matches flow table → forwards in hardware
Host CPU: <1% consumed (only first packet per flow)
Latency: cut-through in NIC hardware
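A minimal enable-and-verify sequence on the DPU Arm OS, using the same bridge and representor names as the checklist at the end of this page (the OVS restart command depends on the distro):

```bash
# Bridge the host PF representor and a VF representor
ovs-vsctl add-br br0
ovs-vsctl add-port br0 pf0hpf
ovs-vsctl add-port br0 pf0vf0

# Turn on hardware offload, then restart OVS so it takes effect
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
systemctl restart openvswitch-switch   # service name varies by distro

# Once traffic is flowing, offloaded flows show up here
ovs-appctl dpctl/dump-flows type=offloaded
```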
| IPsec Acceleration Details | Value |
|---|---|
| Algorithm | AES-256-GCM (AEAD) |
| Throughput | 400 Gbps inline (BF-3) |
| Mode | Tunnel mode (ESP) |
| Host CPU cycles | Zero — fully in DPU HW |
| SA management | DOCA IPsec API / strongSwan |
| Key negotiation | IKEv2 on DPU Arm (strongSwan) |
| Use case | Encrypted overlay (overlay IPsec) |
| Standards | RFC 4303 (ESP), RFC 7296 (IKEv2) |
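A quick host-side check that an SA really landed in hardware, per the checklist at the end of this page; the exact wording of the offload line varies by kernel version:

```bash
# Dump IPsec security associations with statistics
ip -s xfrm state

# Hardware-offloaded SAs include an offload line, roughly of the form:
#   crypto offload parameters: dev p0 dir out
```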
Morpheus combines BlueField DPU telemetry with GPU-accelerated AI inference to detect threats in real time — without a performance impact on application workloads.
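To experiment with it, the Morpheus container referenced in the checklist at the end of this page can be pulled from NGC; the tag is a placeholder for a current release and the run flags are only the minimum for GPU access:

```bash
# Pull the Morpheus container (substitute a current release tag)
docker pull nvcr.io/nvidia/morpheus/morpheus:<tag>

# Inference pipelines need GPU access; DPU telemetry is fed in separately
docker run --rm -it --gpus all nvcr.io/nvidia/morpheus/morpheus:<tag> bash
```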
| Feature | Standard SmartNIC | BlueField DPU (Separated Mode) | BlueField SuperNIC (AI Mode) |
|---|---|---|---|
| Primary purpose | NIC offload (ASIC/FPGA) | Infrastructure isolation + offload | Max AI network performance |
| Arm CPU cores | Few or none | 16× Arm A78 — full OS | 16× Arm A78 — vendor FW only |
| Host isolation | None | Full — host treated as tenant | None (host sees full NIC) |
| DOCA SDK | No | Full DOCA 2.x support | Limited (network-focused) |
| OVS / ASAP² | Partial | Full ASAP² + DOCA Flow | Limited |
| IPsec inline | Sometimes | 400 Gbps hardware AES-GCM | Available |
| GPUDirect RDMA | Sometimes | Yes (via ConnectX-7) | Optimized — primary use case |
| Network speed (BF-3) | Varies | 400GbE | 400GbE — full line rate for AI |
| Use case | Basic offload | Cloud/Security/Storage | AI cluster / Spectrum-X |
| Specification | BlueField-2 | BlueField-3 |
|---|---|---|
| Arm CPU | 8× Arm A72 (Armv8) | 16× Arm A78 (Armv8.2+) |
| Network speed | 200GbE (2 × 100GbE) | 400GbE (2 × 200GbE) |
| NIC controller | ConnectX-6 Dx | ConnectX-7 |
| PCIe | Gen4 ×16 | Gen5 ×16 (2× BW) |
| Local DRAM | Up to 16 GB DDR4 | Up to 32 GB DDR5 |
| Crypto throughput | 200 Gbps inline IPsec | 400 Gbps inline IPsec |
| SuperNIC mode | No | Yes |
| DOCA version | DOCA 1.x | DOCA 2.x |
| GPUNetIO | Limited | Full DOCA GPUNetIO |
| Morpheus support | Partial | Full integration |
A DPU is not just a NIC — it has a full Arm CPU subsystem, hardware accelerators, and an independent OS. Think of it as a server-on-a-PCIe-card that manages the infrastructure so the host CPU doesn't have to.
ASAP² offloads Open vSwitch (OVS) forwarding from host CPU software to the BlueField eSwitch hardware. First packet: slow-path to OVS. All subsequent packets: line-rate in hardware eSwitch.
16 Arm A78 cores, ConnectX-7 NIC at 400GbE, PCIe Gen5. BF-2 was half: 8 × A72, ConnectX-6 Dx, 200GbE, Gen4.
"Separated mode" = Host OS and DPU Arm OS are fully isolated. This is the key security property that makes BlueField suitable for multi-tenant clouds — the cloud operator controls the DPU, the tenant controls only their VMs.
The Progress Engine (doca_pe) is poll-based — call doca_pe_progress() in your event loop to harvest task completions. Similar to DPDK's rx_burst model. No blocking, no interrupts.
DOCA GPUNetIO allows the NIC DMA engine to write incoming packets directly into GPU memory. The GPU CUDA kernel processes packets without the CPU ever touching them. Used for line-rate AI inference on network traffic.
Morpheus: DPU captures all traffic telemetry at line rate → streams to GPU → GPU runs transformer threat-detection models → DPU enforces block/alert. No host CPU involved in either capture or inference.
DOCA Flow processes packets through chained pipes. Each pipe has match fields (5-tuple, VLAN, metadata) and actions (encap, modify, meter, drop, forward). Think of it as a programmable hardware switch pipeline.
Checklists for common deployment scenarios:

OVS / ASAP² offload:
- Verify the DPU is in embedded mode: `mlxconfig -d /dev/mst/mt41686_pciconf0 q | grep INTERNAL_CPU_MODEL` — should return EMBEDDED_CPU(1).
- Ensure the openvswitch package from MLNX-OFED is used (not distro OVS — it won't have offload support).
- Create the bridge and add the representor ports: `ovs-vsctl add-br br0 && ovs-vsctl add-port br0 pf0hpf && ovs-vsctl add-port br0 pf0vf0`.
- Enable hardware offload: `ovs-vsctl set Open_vSwitch . other_config:hw-offload=true`, then restart OVS.
- Verify with `ovs-appctl dpctl/dump-flows type=offloaded` — flows with hardware offload active will appear.
- Confirm host CPU usage with `top` under heavy traffic.
- Connection tracking: `ovs-vsctl set Open_vSwitch . other_config:ct-size=...` — offloads the connection state table to eSwitch hardware.

IPsec / MACsec:
- Check SA offload with `ip -s xfrm state` on the host — look for the offload hw flag in the SA output.
- For MACsec, configure with `ip macsec` commands — BlueField handles encryption/decryption per-port in ConnectX-7 hardware at zero CPU cost.

DOCA development:
- Install the DOCA dev container (`nvcr.io/nvidia/doca/doca`) or the DEB package from NVIDIA's developer portal.
- Follow the standard object lifecycle: doca_[lib]_create() → doca_ctx_dev_add() → doca_pe_create() → doca_ctx_set_pe() → doca_ctx_start() → submit tasks → doca_pe_progress() loop.
- Register memory with doca_mmap_create() → doca_mmap_set_memrange() → doca_mmap_start() — required for both host and DPU memory regions.
- Call doca_flow_init() before any port/pipe operations. Start ports before creating pipes. Create pipes before adding entries.
- Build the samples in /opt/mellanox/doca/samples/ on the BF Arm — start with doca_compress or doca_flow_drop before moving to complex pipelines.
- Set doca_log_level_set_global(DOCA_LOG_LEVEL_DEBUG) at program start to see per-library trace output.

Morpheus:
- Pull the Morpheus container from nvcr.io/nvidia/morpheus/morpheus.
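A minimal starting point for the DOCA development items above; the container tag is a placeholder, and the sample listing is run on the BlueField Arm OS:

```bash
# Pull the DOCA development container from NGC (or install the DEB from the developer portal)
docker pull nvcr.io/nvidia/doca/doca:<tag>

# The bundled samples live on the BlueField Arm OS; doca_compress and doca_flow_drop are good first builds
ls /opt/mellanox/doca/samples/
```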