NCP-AIN · Topic 4 · SmartNIC & DPU

BlueField DPUs & DOCA SDK

SmartNIC architecture, operating modes, DOCA programming model, ASAP² OVS offload, inline IPsec, GPUNetIO, and Morpheus AI security — the full DPU stack.

16 Arm cores (BF-3) · 9+ DOCA libraries · 4 operating modes · 10 practice questions

What is a DPU?
The third pillar of data center compute — alongside CPU and GPU — that offloads and isolates infrastructure from application workloads.
💡

DPU = NIC + Arm CPU + hardware accelerators in one chip

A Data Processing Unit combines a high-performance network interface (ConnectX-7 at 400GbE) with general-purpose Arm compute cores and dedicated hardware accelerators for crypto, compression, and regex — all on one PCIe card. The key insight: run the infrastructure software (firewalls, OVS, storage, encryption) on the DPU's Arm cores, freeing the host CPU exclusively for application workloads.

🖧

Network Offload

Replace the host CPU's software networking stack with hardware acceleration on the DPU.

  • OVS (Open vSwitch) via ASAP²
  • SR-IOV virtual functions
  • VXLAN/NVGRE encap/decap
  • DPDK datapath acceleration
  • Traffic shaping & metering
🔒

Security Acceleration

Run security infrastructure isolated from tenant workloads — tenants cannot tamper with it.

  • Inline IPsec encryption (100 Gbps+)
  • TLS termination offload
  • MACsec hardware offload
  • Deep Packet Inspection (DPI)
  • Firewall with connection tracking
💾

Storage Acceleration

NVMe-over-Fabrics target/initiator and data services accelerated in DPU hardware.

  • NVMe-oF target (SNAP)
  • Erasure coding offload
  • Lossless compression (LZ4)
  • Inline data encryption
  • GPUDirect Storage (GDS)

The Three-Processor Data Center Model

🧠

CPU

Application logic — web servers, databases, AI inference, business logic. Should spend zero cycles on infrastructure plumbing.

GPU

Massively parallel compute — AI training, inference, HPC simulations, rendering. Needs maximum memory bandwidth for tensor ops.

🧩

DPU (BlueField)

Infrastructure services — networking, security, storage, telemetry. Isolated from tenants. Programmable via DOCA SDK.

DPU Use Cases by Deployment

Cloud / Hyperscale

✅ Offload OVS from host CPU (ASAP²) — reclaim 10-30% CPU for tenant VMs
✅ Isolated management plane — cloud operator controls DPU, tenant cannot
✅ Inline IPsec for encrypted overlay networks (zero host CPU overhead)
✅ NVMe-oF SNAP — emulate local NVMe disk backed by network storage
✅ Zero-trust microsegmentation at the NIC level

AI / HPC Clusters

✅ SuperNIC mode — 400GbE RoCEv2 for NCCL AllReduce (Topic 3)
✅ GPUDirect RDMA — GPU mem ↔ network with zero CPU cycles
✅ DOCA GPUNetIO — receive packets directly into GPU memory
✅ Morpheus AI security — DPU captures telemetry, GPU runs threat models
✅ Telemetry offload — collect per-flow stats without touching host CPU

BlueField-3 Internal Architecture
What's inside the chip, how the components connect, and the four operating modes.

🧩 BlueField-3 DPU Block Diagram

Network Interface (ConnectX-7 controller)
  • Ports p0/p1: 400GbE aggregate (2 × 200GbE)
  • RoCEv2 hardware: RDMA offload
  • eSwitch: ASAP² vSwitch

Hardware Accelerators
  • Crypto: AES-GCM, SHA
  • Compress: LZ4 / Deflate
  • Regex: DPI pattern matching
  • DOCA Flow: match-action tables
  • DMA engine: P2P / GPUDirect

Arm Compute Subsystem
  • 16× Arm A78 cores (Armv8.2+)
  • Local DDR5 DRAM (16 GB on-card; up to 32 GB)
  • Shared L3 cache
  • BF-OS (Yocto Linux)

Host Connectivity & Management
  • PCIe Gen5 ×16 host interface
  • Out-of-band 1GbE management port
  • BMC interface (IPMI / Redfish)
  • Secure Boot: hardware root of trust

BlueField-3 Key Specifications

Arm cores: 16× Arm A78 (Armv8.2+)
Network speed: 400GbE (2 × 200GbE)
NIC controller: ConnectX-7
PCIe generation: Gen5 ×16
Local DRAM: Up to 32 GB DDR5
Crypto throughput: 400 Gbps inline IPsec
Compression: Hardware LZ4 / Deflate
Management port: Out-of-band 1GbE
OS: BF-OS (Yocto Linux on Arm)
SDK: DOCA 2.x

Key Interfaces

p0/p1: Network-facing ports (uplink to switch)
pf0/pf1: Physical Functions exposed to host via PCIe
vf0, vf1…: Virtual Functions for SR-IOV VMs
sf0, sf1…: Scalable Functions (lightweight VF replacement)
tmfifo: Console/management channel between Arm and host
oob_net: Out-of-band management port (independent of p0/p1)
💡

SF vs VF

Scalable Functions (SF) are a lightweight alternative to VFs — they don't require PCIe function enumeration, making them faster to create/destroy for containerized workloads (Kubernetes pods).

Four Operating Modes
The same BlueField-3 hardware behaves very differently depending on the configured mode.

MODE 1 NIC Mode (Transparent Bridge)

BlueField acts as a standard high-performance NIC. Arm cores are idle or running minimal firmware. Host sees a ConnectX-7 device — full 400GbE bandwidth available. DOCA not active.

✅ Use when: You only need raw NIC performance with zero offload complexity. Pure throughput workloads.

MODE 2 Embedded (SmartNIC / Arm-Controlled)

Arm cores run DOCA applications that share the NIC with the host. Host and Arm both see the network interface. Good for in-line telemetry and monitoring without full isolation.

✅ Use when: Adding network telemetry or lightweight packet processing alongside normal host NIC usage.

MODE 3 Separated (DPU Mode — Isolated)

The primary DPU deployment mode. Host OS and DPU Arm OS are fully isolated — they cannot access each other's memory or management plane. The Arm runs a complete Yocto Linux with DOCA services. Host sees only the VFs/SFs the DPU grants it.

✅ Use when: Cloud provider operating infrastructure independently from tenant. Zero-trust model. Morpheus security. Full offload.

MODE 4 Restricted Host (Tenant Security Mode)

Host is treated as an untrusted tenant. DPU Arm controls all network policy. Even if the host OS is compromised, the attacker cannot modify network policy or access other tenants' traffic. Strongest security isolation.

✅ Use when: Multi-tenant cloud where the host itself is untrusted. Government/financial workloads. Confidential computing.

Architecture Flash Cards — Click to Flip

🧩

How many Arm cores in BlueField-3 vs BlueField-2?

BF-3: 16× Arm A78
BF-2: 8× Arm A72

BF-3 doubles the core count, upgrades the microarchitecture (A78 vs A72), moves to PCIe Gen5, and pairs with ConnectX-7 instead of ConnectX-6 Dx.

🔌

What PCIe generation does BlueField-3 use?

PCIe Gen5 ×16

BlueField-2 used PCIe Gen4. Gen5 doubles bandwidth to ~64 GB/s per direction (16 lanes × 32 GT/s) — critical for GPUDirect RDMA and DOCA GPUNetIO workloads where DMA bandwidth is the bottleneck.

🛡️

In separated (DPU) mode, what security property holds?

Host OS ↔ DPU Arm OS: fully isolated

A compromised host cannot reach DPU management or other tenants' traffic. The DPU is the trusted enforcement point — even if the host is hostile.

📡

What is tmfifo on BlueField?

tmfifo = Terminal / Management FIFO

A virtual serial console channel between the host CPU and the DPU Arm OS over PCIe. Used for initial provisioning, console access, and rshim communication when no OOB network is available.


DOCA SDK
Data Center Infrastructure on a Chip Architecture — the unified programming framework for BlueField DPU hardware accelerators.
🛠️

What is DOCA?

DOCA (Data Center Infrastructure on a Chip Architecture) is NVIDIA's SDK for programming BlueField DPUs. It provides C APIs that abstract hardware accelerators — crypto, compress, regex, DMA, RDMA, packet processing — so developers don't need to write low-level register code. DOCA applications run on the DPU's Arm cores (or from the host for some operations).

DOCA Library Catalog

🔀

DOCA Flow

Pipe-based match-action packet processing pipeline. The core of OVS offload, firewall, and load balancer implementations.

Network
🔍

DOCA DPI

Deep Packet Inspection — L7 application identification and pattern matching using hardware Regex engine.

Security
🔒

DOCA IPsec

Inline IPsec encryption/decryption. AES-GCM at up to 400 Gbps — full line rate on BF-3 with zero host CPU.

Security
🧱

DOCA Firewall

Stateful connection tracking + ACL enforcement. Offloads iptables/nftables to DPU hardware match-action tables.

Security
📦

DOCA Compress

LZ4 and Deflate lossless compression/decompression in hardware. Used for storage data services and network payload reduction.

Storage
🎮

DOCA GPUNetIO

Receive network packets directly into GPU memory via DMA — bypassing host CPU and enabling GPU-native packet processing.

AI / GPU

DOCA RDMA

RDMA operations (read, write, send/recv) initiated from DPU Arm cores. Used for distributed storage and in-network compute.

Network
🔤

DOCA Regex

Hardware-accelerated regular expression matching. Powers DPI pattern libraries and IDS/IPS signature matching at line rate.

Security
📊

DOCA Telemetry

Collect and stream per-flow, per-port, and system-level metrics from DPU to monitoring systems (Prometheus, Grafana, gRPC).

Observability

DOCA Programming Model

doca_ctx

Context

Represents a hardware engine instance (e.g., one Compress engine, one IPsec SA). Lifecycle: create → connect to progress engine → start → use → stop → destroy.

doca_pe

Progress Engine

Drives asynchronous task completion. Poll-based (like DPDK) — call doca_pe_progress() in your event loop to harvest completed tasks without blocking.

doca_mmap

Memory Map

Registers a memory region with the DPU for DMA access. Must register any host or GPU buffer before the DPU DMA engine can read/write it.

doca_buf

Buffer

Represents a slice of a registered memory region. Allocated from a doca_buf_inventory. Passed to tasks as source/destination for DMA and crypto operations.

doca_task

Task

A unit of work submitted to a ctx — e.g., encrypt this buffer, compress this block. Submitted asynchronously; completion fires a callback via the progress engine.

doca_flow_pipe

Flow Pipe

A hardware match-action pipeline stage in DOCA Flow. Defines match fields (5-tuple, VLAN, metadata) and default actions (forward, drop, modify, meter).
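
Putting these objects together: the sketch below shows the generic task lifecycle in the same C-pseudocode register as the DOCA Flow example that follows. Names track the DOCA 2.x task model (doca_pe_connect_ctx is the 2.x call for attaching a context to a progress engine), but exact signatures vary by release, so treat this as an illustration rather than a drop-in program.

// Generic DOCA task lifecycle (C pseudocode; signatures vary by DOCA release)
struct doca_pe *pe;
doca_pe_create(&pe);                      // progress engine: drives async completions

struct doca_mmap *mmap;
doca_mmap_create(&mmap);                  // register memory for DMA before any task
doca_mmap_set_memrange(mmap, buffer, buffer_len);
doca_mmap_start(mmap);

doca_pe_connect_ctx(pe, ctx);             // ctx = one HW engine (compress, DMA, ...)
doca_ctx_start(ctx);

// Allocate doca_bufs over the registered range (via a doca_buf_inventory),
// build a library-specific task that references them, then:
doca_task_submit(task);                   // async: completion fires a callback

while (running)
    doca_pe_progress(pe);                 // poll loop: harvest completions, no blocking

doca_ctx_stop(ctx);                       // teardown mirrors setup, in reverse order
doca_mmap_destroy(mmap);
doca_pe_destroy(pe);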

⚙️ DOCA Flow Packet Processing Pipeline

Port Ingress

Packet arrives on p0/p1 or from host VF. DMA into DPU buffer.

Pipe 0: Match

5-tuple, VLAN, GRE key, metadata match in TCAM/exact tables.

Action

Modify headers, encap/decap VXLAN, set metadata, meter (rate limit).

Pipe N: Next

Chain to next pipe — e.g., ACL → NAT → firewall → counter.

Counter

Per-flow byte/packet counters. Exported via DOCA Telemetry.

Forward

Egress to port p0/p1, VF, SF, or drop. RSS for multi-queue.

DOCA Flow — Minimal Pipe Creation (C pseudocode)

#include <doca_flow.h>   // assumes doca_flow_init() and doca_flow_port_start() already ran

// 1. Create a pipe with 5-tuple match + VXLAN encap action
struct doca_flow_pipe_cfg pipe_cfg = {
    .name = "vxlan_encap",
    .port = doca_port,              // port opened via doca_flow_port_start()
    .match = {
        .outer.ip4.dst_ip = 0xFFFFFFFF,   // match exact dst IP
        .outer.tcp.dst_port = 0xFFFF,     // match exact dst port
    },
    .actions = {
        .encap_type = DOCA_FLOW_ENCAP_VXLAN,
        .encap.tun.vxlan.tun_id = vni,
    },
    .fwd = { .type = DOCA_FLOW_FWD_PORT, .port_id = 0 },
};

// 2. Create pipe
struct doca_flow_pipe *pipe;
doca_flow_pipe_create(&pipe_cfg, NULL, NULL, &pipe);

// 3. Add an entry (specific flow)
struct doca_flow_pipe_entry *entry;
struct doca_flow_match entry_match = {
    .outer.ip4.dst_ip = htonl(0x0a000001),   // 10.0.0.1
    .outer.tcp.dst_port = htons(8080),
};
doca_flow_pipe_add_entry(0, pipe, &entry_match, NULL, NULL, NULL, 0, NULL, &entry);

// 4. Progress engine drives async completions
while (running) {
    doca_pe_progress(pe);   // harvest callbacks, no blocking
}
📌

DOCA vs DPDK vs Kernel

DPDK gives user-space PMD access to the ConnectX NIC for packet processing. DOCA sits on top — it provides higher-level APIs for the DPU's hardware accelerators (crypto, compress, regex, RDMA) and abstracts the async task model. DOCA Flow in particular replaces hand-written DPDK rte_flow rules with a more portable pipe concept.
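
For contrast, here is roughly what the same kind of exact-match steering looks like written directly against DPDK's rte_flow: a minimal sketch that matches one destination IPv4 address and steers it to RX queue 0 (port_id is a placeholder).

#include <rte_flow.h>
#include <rte_ip.h>

struct rte_flow_attr attr = { .ingress = 1 };

// Match packets whose outer IPv4 destination is exactly 10.0.0.1
struct rte_flow_item_ipv4 ip_spec = { .hdr.dst_addr = RTE_BE32(RTE_IPV4(10, 0, 0, 1)) };
struct rte_flow_item_ipv4 ip_mask = { .hdr.dst_addr = RTE_BE32(0xffffffff) };

struct rte_flow_item pattern[] = {
    { .type = RTE_FLOW_ITEM_TYPE_ETH },
    { .type = RTE_FLOW_ITEM_TYPE_IPV4, .spec = &ip_spec, .mask = &ip_mask },
    { .type = RTE_FLOW_ITEM_TYPE_END },
};

// Action: deliver matching packets to RX queue 0
struct rte_flow_action_queue queue = { .index = 0 };
struct rte_flow_action actions[] = {
    { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
    { .type = RTE_FLOW_ACTION_TYPE_END },
};

struct rte_flow_error err;
struct rte_flow *flow = rte_flow_create(port_id, &attr, pattern, actions, &err);

DOCA Flow expresses the same match-action idea, but as reusable pipes with per-entry inserts, chaining, and hardware counters, which is what keeps large offloaded rule sets manageable.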


Security & Networking Offload
ASAP² OVS acceleration, inline IPsec, firewall offload, and Morpheus AI-driven security.

⚡ ASAP² — Accelerated Switch and Packet Processing

ASAP² offloads the Open vSwitch (OVS) forwarding datapath from the host CPU to the BlueField eSwitch (embedded switch in ConnectX-7). The result: near-zero host CPU for packet forwarding.

ASAP² Before vs After

Without ASAP² (Software OVS)

Packet: NIC → host kernel → OVS datapath → back to NIC
Host CPU: 20-30% consumed by OVS forwarding
Latency: kernel crossing ×2 per packet

With ASAP² (Hardware Offload)

Packet: NIC eSwitch matches flow table → forwards in hardware
Host CPU: <1% consumed (only first packet per flow)
Latency: cut-through in NIC hardware

How ASAP² Works

1. Flow table programming: OVS kernel module programs match-action rules into the BlueField eSwitch via TC flower (traffic control).
2. First packet (slow path): Unmatched packets go to host OVS userspace to install the flow rule — overhead only on first packet per connection.
3. Subsequent packets (fast path): Hardware eSwitch matches and forwards at line rate — host CPU never sees the packet.
4. Statistics: Per-flow byte/packet counters maintained in hardware, polled by OVS for telemetry.

ASAP² Capabilities

✅ L2 forwarding (MAC learning, VLAN tagging)
✅ VXLAN/NVGRE/GRE encap/decap in hardware (VXLAN header sketch after this list)
✅ SR-IOV VF-to-VF switching without host
✅ Connection tracking (conntrack offload)
✅ NAT (DNAT/SNAT) offload
✅ QoS meters and policing
✅ Works with kernel OVS, OVS-DPDK, and OpenStack Neutron
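
To ground the encap/decap item above: the VXLAN wrapper that the eSwitch adds or strips is an outer Ethernet/IP/UDP header plus the 8-byte header below (RFC 7348) in front of the inner frame. A minimal sketch; the struct name is ours, not a DOCA type.

#include <stdint.h>

/* VXLAN encapsulation per RFC 7348:
 *   outer Eth / outer IP / outer UDP (dst port 4789) / VXLAN header / inner Eth frame
 * ASAP2 performs this wrap/unwrap in eSwitch hardware; the host CPU is not involved. */
struct vxlan_header {
    uint8_t flags;          /* bit 3 (I flag) set => VNI field is valid */
    uint8_t reserved1[3];
    uint8_t vni[3];         /* 24-bit VXLAN Network Identifier (tenant segment) */
    uint8_t reserved2;
};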

🔒 Inline IPsec Encryption

IPsec Offload Flow

1. Plaintext packet arrives from a VM / container.
2. DOCA Flow pipe matches the flow and steers it to the IPsec SA.
3. Crypto hardware performs AES-GCM encryption and ESP encapsulation.
4. Encrypted packet egresses on p0 to the network.
IPsec Acceleration Details
Algorithm: AES-256-GCM (AEAD)
Throughput: 400 Gbps inline (BF-3)
Mode: Tunnel mode (ESP)
Host CPU cycles: Zero — fully in DPU hardware
SA management: DOCA IPsec API / strongSwan
Key negotiation: IKEv2 on DPU Arm (strongSwan)
Use case: Encrypted overlay networks
Standards: RFC 4303 (ESP), RFC 7296 (IKEv2)
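
For the "ESP wrap" step above, this is the on-wire header defined by RFC 4303 (cited in the table) that the crypto engine inserts inline. A sketch for orientation; the struct name is ours.

#include <stdint.h>

/* ESP in tunnel mode on the wire, per RFC 4303:
 *   outer IP / ESP header / IV / encrypted inner packet + padding / ICV */
struct esp_header {
    uint32_t spi;       /* Security Parameters Index: selects the SA (key + algorithm) */
    uint32_t seq_num;   /* monotonically increasing; enables anti-replay checks */
    /* followed by: per-packet IV, ciphertext, pad, pad length, next-header byte,
     * and the AES-GCM ICV (integrity check value) computed by the hardware */
};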

Full Security Feature Set

🔐

Hardware Root of Trust

  • Secure Boot — verified UEFI + BF-OS chain
  • Attestation — cryptographic identity proof
  • Secure firmware update with signing
  • Hardware-fused secrets (eFUSE)
🧱

Firewall Offload

  • Stateful conntrack in eSwitch hardware
  • DOCA Firewall ACL pipeline
  • Per-flow allow/deny at line rate
  • Microsegmentation per-VM/container
🔍

Deep Packet Inspection

  • DOCA DPI — L7 app identification
  • DOCA Regex — signature matching
  • Hardware regex engine (not SW)
  • IDS/IPS signature offload
🛡️

MACsec Hardware

  • IEEE 802.1AE encryption at L2
  • Point-to-point link encryption
  • Zero host CPU overhead
  • Key rotation without traffic drop

🤖 NVIDIA Morpheus AI Security Framework

Morpheus combines BlueField DPU telemetry with GPU-accelerated AI inference to detect threats in real time — without a performance impact on application workloads.

Morpheus Data Pipeline

Network traffic enters server → BlueField DPU captures telemetry → DOCA Telemetry streams to GPU → GPU runs AI threat models → alert / block via DPU firewall

What Morpheus Detects

🔴 Lateral movement patterns across east-west traffic
🔴 Data exfiltration (unusual outbound volume/destination)
🔴 Port scanning and reconnaissance
🔴 Ransomware network behavior (SMB spread patterns)
🔴 DNS tunneling and covert C2 channels
🔴 Zero-day exploit traffic patterns (via DPI + ML)

Why DPU + GPU Together?

DPU advantage: Captures telemetry from all traffic — even encrypted flows — at line rate, without touching host CPU or storage.

GPU advantage: Runs transformer-based threat detection models on millions of events/second — far beyond what CPU-based SIEM can handle.

Result: Real-time detection with sub-second response, zero application impact.

Compare: SmartNIC vs DPU vs SuperNIC
Three terms that sound similar but describe meaningfully different product capabilities.
| Feature | Standard SmartNIC | BlueField DPU (Separated Mode) | BlueField SuperNIC (AI Mode) |
|---|---|---|---|
| Primary purpose | NIC offload (ASIC/FPGA) | Infrastructure isolation + offload | Max AI network performance |
| Arm CPU cores | Few or none | 16× Arm A78 — full OS | 16× Arm A78 — vendor FW only |
| Host isolation | None | Full — host treated as tenant | None (host sees full NIC) |
| DOCA SDK | No | Full DOCA 2.x support | Limited (network-focused) |
| OVS / ASAP² | Partial | Full ASAP² + DOCA Flow | Limited |
| IPsec inline | Sometimes | 400 Gbps hardware AES-GCM | Available |
| GPUDirect RDMA | Sometimes | Yes (via ConnectX-7) | Optimized — primary use case |
| Network speed (BF-3) | Varies | 400GbE | 400GbE — full line rate for AI |
| Use case | Basic offload | Cloud / security / storage | AI cluster / Spectrum-X |

BlueField-2 vs BlueField-3 — Generation Comparison

| Specification | BlueField-2 | BlueField-3 |
|---|---|---|
| Arm CPU | 8× Arm A72 (Armv8) | 16× Arm A78 (Armv8.2+) |
| Network speed | 200GbE (2 × 100GbE) | 400GbE (2 × 200GbE) |
| NIC controller | ConnectX-6 Dx | ConnectX-7 |
| PCIe | Gen4 ×16 | Gen5 ×16 (2× bandwidth) |
| Local DRAM | Up to 16 GB DDR4 | Up to 32 GB DDR5 |
| Crypto throughput | 200 Gbps inline IPsec | 400 Gbps inline IPsec |
| SuperNIC mode | No | Yes |
| DOCA version | DOCA 1.x | DOCA 2.x |
| GPUNetIO | Limited | Full DOCA GPUNetIO |
| Morpheus support | Partial | Full integration |

DPU Hardware vs Host CPU for Infrastructure — Resource Impact

❌ Without DPU (CPU-based Infrastructure)

• OVS software datapath: 20-30% CPU cores
• IPsec encryption (100G): 8-16 CPU cores consumed
• TLS termination: 4-8 cores for 100K TPS
• Storage erasure coding: 4-6 cores
• Total: 30-50% of host CPU unavailable for tenant workloads
• Security boundary: OS-level — vulnerable to root compromise

✅ With BlueField-3 DPU

• OVS offload (ASAP²): <1% host CPU
• IPsec 400G: 0 host CPU cores (HW crypto)
• TLS: near-zero with DPU TLS offload
• Compression: 0 host cycles (HW engine)
• Total: ~100% of host CPU for tenant workloads
• Security boundary: PCIe bus — isolated at silicon level

Practice Quiz
10 exam-style questions on BlueField DPU architecture, DOCA SDK, ASAP², and security offload.


Memory Hooks & DPU Advisor
Mnemonics for the exam, plus a guided advisor for common DPU configuration questions.
NIC + Arm + Accelerators = DPU
What a DPU Is

A DPU is not just a NIC — it has a full Arm CPU subsystem, hardware accelerators, and an independent OS. Think of it as a server-on-a-PCIe-card that manages the infrastructure so the host CPU doesn't have to.

ASAP² = OVS → HW
ASAP² One-Liner

ASAP² offloads Open vSwitch (OVS) forwarding from host CPU software to the BlueField eSwitch hardware. First packet: slow-path to OVS. All subsequent packets: line-rate in hardware eSwitch.

BF-3 = 16A78 + CX7 + Gen5
BlueField-3 Key Numbers

16 Arm A78 cores, ConnectX-7 NIC at 400GbE, PCIe Gen5. BF-2 was half: 8 × A72, ConnectX-6 Dx, 200GbE, Gen4.

Separated = Isolated
The DPU Mode to Know

"Separated mode" = Host OS and DPU Arm OS are fully isolated. This is the key security property that makes BlueField suitable for multi-tenant clouds — the cloud operator controls the DPU, the tenant controls only their VMs.

DOCA PE = Poll Loop
DOCA Programming Model

The Progress Engine (doca_pe) is poll-based — call doca_pe_progress() in your event loop to harvest task completions. Similar to DPDK's rx_burst model. No blocking, no interrupts.

GPUNetIO = NIC → GPU Direct
GPU Packet Processing

DOCA GPUNetIO allows the NIC DMA engine to write incoming packets directly into GPU memory. The GPU CUDA kernel processes packets without the CPU ever touching them. Used for line-rate AI inference on network traffic.
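
What this looks like in practice: a hedged CUDA-C sketch of a device-side receive kernel. The doca_gpu_dev_eth_rxq_* names follow NVIDIA's published GPUNetIO examples for DOCA 2.x, but headers and signatures change between releases, and MAX_BURST and TIMEOUT_NS are placeholder constants.

// Persistent receive kernel (CUDA-C pseudocode; API names per DOCA 2.x examples)
__global__ void rx_kernel(struct doca_gpu_eth_rxq *rxq)
{
    __shared__ uint32_t num_pkts;
    __shared__ uint64_t buf_idx;

    // All threads in the block cooperatively poll the receive queue;
    // packets land directly in GPU memory and the host CPU never sees them.
    doca_gpu_dev_eth_rxq_receive_block(rxq, MAX_BURST, TIMEOUT_NS,
                                       &num_pkts, &buf_idx);

    for (uint32_t i = threadIdx.x; i < num_pkts; i += blockDim.x) {
        struct doca_gpu_buf *buf;
        uintptr_t pkt;
        doca_gpu_dev_eth_rxq_get_buf(rxq, buf_idx + i, &buf);
        doca_gpu_dev_buf_get_addr(buf, &pkt);
        // parse / filter / run inference on the packet bytes at `pkt`
    }
}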

Morpheus = DPU Telemetry + GPU AI
AI Security Framework

Morpheus: DPU captures all traffic telemetry at line rate → streams to GPU → GPU runs transformer threat-detection models → DPU enforces block/alert. No host CPU involved in either capture or inference.

DOCA Flow = Pipe Chain
Packet Pipeline

DOCA Flow processes packets through chained pipes. Each pipe has match fields (5-tuple, VLAN, metadata) and actions (encap, modify, meter, drop, forward). Think of it as a programmable hardware switch pipeline.

🤖 DPU Advisor

Select your scenario for targeted guidance.

What are you working on with BlueField?

🖧 ASAP² OVS Offload Setup

  • Verify BlueField is in Separated (DPU) mode: mlxconfig -d /dev/mst/mt41686_pciconf0 q | grep INTERNAL_CPU_MODEL — should return EMBEDDED_CPU(1).
  • Install OVS with DOCA/ASAP² support: ensure the openvswitch package from MLNX-OFED / DOCA is used — distro OVS builds may lack full ASAP² offload support.
  • Create OVS bridge and add VF representor ports: ovs-vsctl add-br br0 && ovs-vsctl add-port br0 pf0hpf && ovs-vsctl add-port br0 pf0vf0.
  • Enable hardware offload: ovs-vsctl set Open_vSwitch . other_config:hw-offload=true then restart OVS.
  • Validate with: ovs-appctl dpctl/dump-flows type=offloaded — flows with hardware offload active will appear.
  • Monitor CPU usage: OVS-offloaded flows should show near-zero softirq CPU in top under heavy traffic.
  • For VXLAN offload: create VXLAN tunnel port in OVS — ASAP² will handle encap/decap in eSwitch hardware automatically.

🔒 IPsec & Firewall Offload

  • For inline IPsec: use DOCA IPsec library to create Security Associations (SAs) — each SA binds a source/dest IP pair to an AES-256-GCM key. Install SA into ConnectX-7 hardware crypto engine.
  • IKEv2 key negotiation runs on BF Arm cores via strongSwan or Libreswan configured on the DPU OS — not on the host.
  • For firewall offload: deploy DOCA Firewall pipeline. Program ACL rules as DOCA Flow pipe entries. First matching rule wins (priority-based).
  • Enable conntrack offload: ovs-vsctl set Open_vSwitch . other_config:ct-size=... — offloads connection state table to eSwitch hardware.
  • For DPI (L7 inspection): load DOCA DPI context on BF Arm, attach to DOCA Flow miss path — only unclassified flows go to Arm for inspection.
  • Verify IPsec is hardware-offloaded: ip -s xfrm state on host — look for offload hw flag in SA output.
  • MACsec: configure via ip macsec commands — BlueField handles encryption/decryption per-port in ConnectX-7 hardware at zero CPU cost.

🛠️ DOCA SDK Development Start

  • Install DOCA on BF Arm: use the DOCA container image from NGC (nvcr.io/nvidia/doca/doca) or the DEB package from NVIDIA's developer portal.
  • Core DOCA lifecycle for any library: doca_[lib]_create()doca_ctx_dev_add()doca_pe_create()doca_ctx_set_pe()doca_ctx_start() → submit tasks → doca_pe_progress() loop.
  • Register all memory before DMA access: doca_mmap_create()doca_mmap_set_memrange()doca_mmap_start() — required for both host and DPU memory regions.
  • For DOCA Flow: always call doca_flow_init() before any port/pipe operations. Start ports before creating pipes. Create pipe before adding entries.
  • Sample applications in /opt/mellanox/doca/samples/ on the BF Arm — start with doca_compress or doca_flow_drop before moving to complex pipelines.
  • Debug tip: enable DOCA logging with doca_log_level_set_global(DOCA_LOG_LEVEL_DEBUG) at program start to see per-library trace output.

🤖 Morpheus AI Security Deployment

  • Morpheus requires: BlueField-3 DPU (separated mode) + NVIDIA GPU in the same server + DOCA Telemetry Service (DTS) on the DPU Arm.
  • Install DOCA Telemetry Service on BF: configure it to capture per-flow metadata (5-tuple, byte counts, timestamps, DPI labels) and stream via gRPC to the Morpheus pipeline.
  • Deploy Morpheus pipeline on the host GPU using the Morpheus SDK container from NGC: nvcr.io/nvidia/morpheus/morpheus.
  • Configure the Morpheus pipeline with the desired AI model (e.g., Anomalous Behavior Detection ABP model, digital fingerprinting DFP model for lateral movement).
  • Morpheus output → response actions via DPU firewall: program a DOCA Flow pipe entry to drop or redirect suspect flows based on Morpheus-generated IP/flow tags.
  • Scale: each BlueField-3 can monitor all traffic on its two 200GbE ports (~400 Gbps) with zero host CPU — a single DGX node with 8 BlueField cards covers 3.2 Tbps of total server ingress/egress.

One Topic Left — Finish Strong

Topic 4 complete. Continue to Topic 5: AI Cluster Orchestration & Observability.
