NVIDIA NCP-AAI Exam Prep · Topic 2 of 5

Agent Development & Frameworks

Master LangChain, LangGraph, AutoGen, and CrewAI — plus prompt engineering for agents, tool schema design, structured output, reliability patterns, and multimodal agent development.

Four Pillars of Agent Development & Frameworks
Agent Development covers 15% of the NCP-AAI exam — the largest single domain. Mastering the four major frameworks, prompt engineering for agents, tool schema design, and multimodal patterns is essential for passing.
Pillar 1 · Agent Frameworks

LangChain · LangGraph · AutoGen · CrewAI

The four dominant open-source frameworks each take a different approach. LangChain provides composable building blocks. LangGraph adds stateful graph execution with loops and branching. AutoGen enables conversational multi-agent patterns. CrewAI offers a role-based, opinionated structure.

4 · Major frameworks
StateGraph · LangGraph model
LCEL · LangChain syntax
Pillar 2 · Prompt Engineering for Agents

System Prompts · Tool Descriptions · Output Format

Agent system prompts define role, available tools, constraints, and output format. Tool description quality is the single highest-leverage variable in agent reliability — the LLM selects tools based entirely on their description text. Structured output enforces type-safe, parseable responses.

4 · Prompt sections
#1 · Tool desc leverage
JSON · Structured output
Pillar 3 · Tool Integration & Reliability

Schemas · Retry · Validation · Parallel Calls

Production agents fail when tools fail. Robust tool integration requires well-typed schemas, retry logic with exponential backoff, output validation before passing results to the LLM, and parallel tool calling when multiple tools are independent. Error messages must be LLM-readable.

3 · Retry attempts
2–4× · Parallel calls
Pydantic · Validation library
Pillar 4 · Multimodal & Advanced Agents

Vision · Code Execution · Audio · NVIDIA NIM

Modern agents go beyond text — they inspect images, execute code in sandboxes, process audio, and call specialized model microservices. NVIDIA NIM packages optimized models as API endpoints agents can call as tools, dramatically simplifying multimodal agent deployment on NVIDIA hardware.

4 · Modality types
NIM · NVIDIA microservice
E2B · Code sandbox

The Four Major Agentic AI Frameworks — At a Glance

LangChain
Building Blocks
Composable chains, tools, memory, and LCEL (LangChain Expression Language) piping. The foundational toolkit — LangGraph is built on top of it.
Best for: Rapid prototyping, simple chains, retrieval pipelines
LangGraph
Stateful Graphs
Graph-based execution with shared State, conditional edges, loops, and human-in-the-loop interrupts. The go-to for complex multi-step agents.
Best for: Complex workflows, multi-agent orchestration, HITL
AutoGen
Conversational
ConversableAgents exchange messages in a GroupChat. Supports code execution, human-in-the-loop, and dynamic multi-agent conversations out of the box.
Best for: Two-agent patterns, code generation, dynamic chat
CrewAI
Role-Based
Agents have a role, goal, and backstory. Tasks are assigned to a Crew with sequential or hierarchical process. Opinionated and beginner-friendly.
Best for: Structured pipelines, role-based teams, quick setup
Exam tip: The NCP-AAI exam tests practical understanding of each framework — not just names. Know when to choose LangGraph over LangChain (stateful loops needed), AutoGen over CrewAI (dynamic conversation vs. structured pipeline), and how NVIDIA NIM integrates with each as a drop-in tool endpoint.
How Agent Development Works
From framework internals through prompt design to tool schemas and multimodal integration.

LangGraph: Stateful Graph Execution

LangGraph StateGraph — Nodes, Edges, and Shared State

START
Entry point. Initializes the shared State object — a TypedDict that all nodes read from and write to.
Agent Node
LLM call. Reads State, decides next action, writes Thought + tool call to State. Every node is a plain Python function.
Conditional Edge
Inspects State to decide which node runs next: Tool Node (if a tool was called) or END (if final answer is ready). Enables branching and loops.
Tool Node
Executes the requested tool call, writes the Observation back to State, then returns to the Agent Node for the next reasoning step.
END
Terminal node. Final answer is extracted from State and returned to the caller. Checkpointer persists State for resumable workflows.
Key LangGraph concepts: StateGraph defines the graph · add_node() registers functions · add_conditional_edges() routes based on State · MemorySaver checkpointer enables human-in-the-loop interrupts and session persistence.
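
A minimal sketch of this loop, assuming current LangGraph APIs; the node bodies are stand-ins for real LLM and tool calls, and names like AgentState and route are illustrative:

from typing import Optional, TypedDict

from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver

class AgentState(TypedDict):
    question: str
    observation: Optional[str]   # written by the Tool Node
    answer: Optional[str]        # written by the Agent Node when done

def agent_node(state: AgentState) -> dict:
    # Stand-in for the LLM call: request a tool first, answer once an
    # Observation is available.
    if state["observation"] is None:
        return {}                # no update yet; conditional edge routes to tools
    return {"answer": f"Answer based on: {state['observation']}"}

def tool_node(state: AgentState) -> dict:
    # Execute the requested tool and write the Observation back to State.
    return {"observation": f"kb results for {state['question']!r}"}

def route(state: AgentState) -> str:
    # Conditional edge: inspect State to choose the next node.
    return END if state["answer"] else "tools"

graph = StateGraph(AgentState)
graph.add_node("agent", agent_node)
graph.add_node("tools", tool_node)
graph.add_edge(START, "agent")
graph.add_conditional_edges("agent", route)   # routes to "tools" or END
graph.add_edge("tools", "agent")              # loop back for the next step

app = graph.compile(checkpointer=MemorySaver())
result = app.invoke(
    {"question": "How do I reset my password?", "observation": None, "answer": None},
    config={"configurable": {"thread_id": "demo"}},   # required with a checkpointer
)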

Agent System Prompt Anatomy

A Well-Structured Agent System Prompt Has Four Sections

# ── ROLE ──────────────────────────────────────────────────
You are a customer support agent for Acme SaaS.
You help users with billing, account, and technical questions.

# ── TOOLS ─────────────────────────────────────────────────
You have access to the following tools:
- get_account(user_id): Returns account plan, status, and billing info
- get_invoice(invoice_id): Returns invoice details and line items
- search_kb(query): Searches the support knowledge base

# ── CONSTRAINTS ────────────────────────────────────────────
Never reveal internal pricing tiers or competitor comparisons.
Always call get_account before discussing billing — never guess.
Escalate to human if the user expresses frustration 3+ times.

# ── OUTPUT FORMAT ──────────────────────────────────────────
Respond in plain English. Keep responses under 150 words.
Always end with: "Is there anything else I can help you with?"


Tool Schema Design

Tool description quality is the #1 leverage point in agent reliability. The LLM selects tools and constructs arguments based entirely on the description text — not the implementation. A vague or ambiguous description causes wrong tool selection, incorrect arguments, or the agent inventing a response instead of calling the tool.

Anatomy of a Well-Defined Tool Schema (JSON)

{
  "name": "search_product_catalog",
  "description": "Search the product catalog by keyword or SKU. Use this tool when the user asks about product availability, pricing, or specifications. Do NOT use for order status — use get_order instead.",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {"type": "string", "description": "Keyword or SKU to search for"},
      "max_results": {"type": "integer", "default": 5, "description": "Max results to return (1–20)"}
    },
    "required": ["query"]
  }
}
Key elements: (1) Clear one-sentence description of what it does, (2) explicit "when to use vs. similar tools", (3) typed parameters with descriptions, (4) required vs. optional parameters marked. All four are necessary.
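
In LangChain, an equivalent schema can be generated from a typed Python function: the @tool decorator derives the name, description, and parameter types from the signature and docstring (a sketch; the body is a placeholder):

from langchain_core.tools import tool

@tool
def search_product_catalog(query: str, max_results: int = 5) -> str:
    """Search the product catalog by keyword or SKU.

    Use this tool when the user asks about product availability, pricing,
    or specifications. Do NOT use for order status; use get_order instead.
    """
    return f"(placeholder) top {max_results} results for {query!r}"

The decorator emits a JSON schema equivalent to the one above: typed parameters from the hints, the description from the docstring, and max_results marked optional by its default.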

Tool Integration Reliability Patterns

Retry with Backoff
Handle Transient Failures

Wrap every tool call in retry logic — 3 attempts with exponential backoff (1s, 2s, 4s). Return a descriptive error message to the LLM if all retries fail so it can adjust its plan.
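
A minimal sketch of the pattern, where tool_fn is any callable tool and the error string is written for the LLM rather than a human log:

import time

def call_with_retry(tool_fn, *args, attempts=3, base_delay=1.0, **kwargs):
    # Retry a tool call with exponential backoff (1s, 2s, 4s, ...) between attempts.
    for attempt in range(attempts):
        try:
            return tool_fn(*args, **kwargs)
        except Exception as exc:
            if attempt == attempts - 1:
                # LLM-readable failure message so the agent can adjust its plan
                return (f"ERROR: tool failed after {attempts} attempts: {exc}. "
                        "Try another tool or tell the user the data is unavailable.")
            time.sleep(base_delay * 2 ** attempt)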

Output Validation
Type-Safe Tool Results

Validate tool return values against a Pydantic schema before passing to the LLM. Reject malformed responses early — never let corrupted tool output enter the LLM's reasoning context.
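
A sketch using Pydantic v2; ProductResult is a hypothetical schema for a catalog tool's return value:

from pydantic import BaseModel, ValidationError

class ProductResult(BaseModel):       # hypothetical schema for a catalog tool
    sku: str
    name: str
    price_usd: float

def validate_observation(raw: dict) -> str:
    # Gate tool output before it enters the LLM's reasoning context.
    try:
        return ProductResult.model_validate(raw).model_dump_json()
    except ValidationError as exc:
        return f"ERROR: tool returned malformed data ({exc.error_count()} field error(s))."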

Fallback Tools
Graceful Degradation

Define fallback tools for each primary tool (e.g., search_db_fast → search_db_full). If the primary tool fails or times out, the agent can try the fallback without breaking the loop.

Parallel Tool Calling
Speed Up Independent Calls

When multiple tools don't depend on each other's results, call them in parallel. Modern LLMs can emit multiple tool call JSON objects in one response — execute them concurrently and merge results.
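
A sketch of the host-side execution; tool_calls mirrors the name/args shape of LLM-emitted calls, and registry is a hypothetical map from tool name to Python callable:

from concurrent.futures import ThreadPoolExecutor

def execute_parallel(tool_calls: list[dict], registry: dict) -> list:
    # Run independent tool calls concurrently; total wait is roughly the
    # slowest single call rather than the sum of all calls.
    with ThreadPoolExecutor(max_workers=len(tool_calls)) as pool:
        futures = [pool.submit(registry[call["name"]], **call["args"])
                   for call in tool_calls]
        return [f.result() for f in futures]   # merged Observations, in call order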

Structured Output

Structured output constrains the LLM to produce a response that matches a predefined Pydantic model or JSON schema — not free text. With strict mode (supported by OpenAI, NVIDIA NIM endpoints, and most modern APIs), the response is guaranteed to match the schema. This eliminates brittle string parsing and makes agent pipelines dramatically more reliable in production. Example: instead of parsing "The sentiment is positive with 0.87 confidence", the model returns {"sentiment": "positive", "confidence": 0.87}.
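
In LangChain this is a single with_structured_output call (a sketch; the model name is illustrative and requires the langchain-openai package plus an API key):

from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

class Sentiment(BaseModel):
    sentiment: str = Field(description="positive, negative, or neutral")
    confidence: float = Field(ge=0.0, le=1.0)

llm = ChatOpenAI(model="gpt-4o-mini").with_structured_output(Sentiment)
result = llm.invoke("I love this product, works perfectly!")
print(result.sentiment, result.confidence)   # e.g. positive 0.92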

Multimodal Agents & NVIDIA NIM

👁️
Vision Agent
Image Understanding

Agent accepts image inputs via tool or direct upload. Calls a vision LLM (GPT-4o, LLaVA, NVIDIA VILA) to inspect, caption, or compare images. Used for document analysis, defect detection, UI testing.

💻
Code Execution Agent
Sandboxed Interpreter

Agent writes code and executes it in an isolated sandbox (E2B, Jupyter kernel). Result is fed back as Observation. Enables self-correcting code: run → see error → rewrite → run again.
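
The loop itself is short. In this sketch, llm_write_code and run_in_sandbox are hypothetical stand-ins for your model call and sandbox API (E2B, Jupyter kernel):

def self_correcting_run(task: str, max_iters: int = 3) -> str:
    # llm_write_code / run_in_sandbox are hypothetical helpers (see lead-in).
    error = None
    for _ in range(max_iters):
        code = llm_write_code(task, previous_error=error)   # LLM drafts or repairs code
        stdout, stderr = run_in_sandbox(code)               # isolated execution
        if not stderr:
            return stdout                                   # success: Observation drives the answer
        error = stderr                                      # traceback guides the next rewrite
    return f"ERROR: no working code after {max_iters} attempts: {error}"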

🎙️
Audio / Speech Agent
Voice-Enabled Workflows

Agent transcribes audio input (Whisper), reasons over the text, and optionally synthesizes a spoken response (TTS). NVIDIA Riva provides on-prem ASR/TTS for latency-sensitive pipelines.

NVIDIA NIM
Model Microservices

NIM packages optimized models (LLMs, vision, embedding, reranking) as OpenAI-compatible API endpoints. Agents call NIM as a tool — same interface regardless of model type. GPU-optimized with TensorRT-LLM.
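
Because NIM speaks the OpenAI API, the standard client works unchanged; a sketch where the base_url and model id are deployment-specific placeholders:

from openai import OpenAI

# Standard OpenAI client pointed at a NIM endpoint (local container shown;
# URL and model id are deployment-specific).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

resp = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "List three uses of a vision agent."}],
)
print(resp.choices[0].message.content)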

Concept Comparison Table
All 22 key NCP-AAI Agent Development concepts, grouped by pillar.

LangChain · Frameworks
Key detail: Composable building blocks: chains, tools, memory, retrievers; LCEL pipe syntax for composing runnables.
Exam tip: Foundation that LangGraph is built on; great for retrieval pipelines and simple agents.

LangGraph StateGraph · Frameworks
Key detail: Graph-based agent execution; nodes are Python functions; shared State dict; conditional edges for routing.
Exam tip: Preferred for complex agents with loops, branching, and human-in-the-loop requirements.

LangGraph Checkpointer · Frameworks
Key detail: Persists State between turns using SQLite, Redis, or Postgres; enables session memory and HITL interrupts.
Exam tip: Required for multi-turn agentic conversations; MemorySaver is the in-memory option.

AutoGen ConversableAgent · Frameworks
Key detail: Agents that can send/receive messages, execute code, and call tools; GroupChat manages multi-agent turn-taking.
Exam tip: Best for dynamic multi-agent conversation; built-in code execution with Docker sandbox.

CrewAI Crew · Frameworks
Key detail: Collection of role-based agents with a process (sequential or hierarchical); each agent has role, goal, backstory.
Exam tip: Most opinionated/beginner-friendly; hierarchical process adds an orchestrator automatically.

LCEL (LangChain Expression Language) · Frameworks
Key detail: Pipe syntax: prompt | llm | parser; supports streaming, async, batch, and parallel execution natively.
Exam tip: Replaces legacy chain classes; enables first-class streaming for agent responses.

Agent System Prompt · Prompt Eng.
Key detail: Four sections: Role, Tools, Constraints, Output Format — each serves a distinct purpose for agent behavior.
Exam tip: Omitting constraints leads to policy violations; omitting output format leads to inconsistent responses.

Tool Description Quality · Prompt Eng.
Key detail: The LLM selects tools and arguments based entirely on description text; vague descriptions → wrong tool calls.
Exam tip: #1 reliability lever; include what the tool does, when to use it vs. similar tools, and return value shape.

Few-Shot Tool Examples · Prompt Eng.
Key detail: Showing 2–3 example tool call + observation pairs in the system prompt dramatically improves argument formatting.
Exam tip: Especially important for tools with complex or nested JSON argument structures.

Structured Output (Pydantic) · Prompt Eng.
Key detail: Constrains LLM response to a Pydantic model or JSON schema; strict mode guarantees schema compliance.
Exam tip: Eliminates brittle string parsing; use model.with_structured_output(MyModel) in LangChain.

Output Parser · Prompt Eng.
Key detail: Converts LLM free-text output into structured Python objects; PydanticOutputParser, JsonOutputParser.
Exam tip: Always add format instructions to the prompt when using output parsers.

Prompt Template · Prompt Eng.
Key detail: Parameterized prompts with variable slots; ChatPromptTemplate.from_messages() for multi-turn agents.
Exam tip: Separates prompt logic from runtime values; enables reuse across different agent instances.

Tool Schema (JSON) · Tool Integration
Key detail: name, description, parameters (type, properties, required); used by LLM to select and call tools.
Exam tip: Every parameter must have a type and description; mark optional params with defaults.

Retry with Backoff · Tool Integration
Key detail: Wrap tool calls in retry logic — 3 attempts, exponential backoff (1s, 2s, 4s); return LLM-readable error on failure.
Exam tip: Handles transient API failures; error message must be LLM-readable so the agent can adapt.

Parallel Tool Calling · Tool Integration
Key detail: LLM emits multiple tool call JSON objects in one response; host executes concurrently and merges results.
Exam tip: 2–4× faster for independent tool calls; not all LLMs support multi-tool in one response.

Tool Output Validation · Tool Integration
Key detail: Validate tool return values against expected schema before injecting as Observation.
Exam tip: Prevents corrupted tool output from entering LLM reasoning; Pydantic models are ideal validators.

Fallback Tool · Tool Integration
Key detail: Secondary tool invoked when primary tool fails or times out; enables graceful degradation.
Exam tip: Define tool.with_fallbacks([backup_tool]) in LangChain for automatic fallback routing.

Human-in-the-Loop (HITL) · Tool Integration
Key detail: Agent pauses at a defined node for human approval before executing a high-risk tool call.
Exam tip: LangGraph: interrupt_before=["dangerous_tool_node"] with a checkpointer enables HITL.

NVIDIA NIM · Multimodal
Key detail: Optimized model microservices (LLM, vision, embedding, reranking) as OpenAI-compatible API endpoints.
Exam tip: Drop-in replacement for OpenAI API; backed by TensorRT-LLM for GPU-optimized inference.

Vision Agent · Multimodal
Key detail: Agent that accepts image inputs, calls a vision LLM (GPT-4o, NVIDIA VILA), and reasons over visual content.
Exam tip: Image passed as base64 or URL in the tool call; useful for document OCR, defect detection.

Code Execution Agent · Multimodal
Key detail: Agent writes and runs code in a sandboxed environment (E2B, Jupyter); uses execution result as Observation.
Exam tip: Self-correcting loop: run → observe error → rewrite → run again; requires isolated sandbox.

NVIDIA Riva · Multimodal
Key detail: On-premises ASR (speech-to-text) and TTS (text-to-speech) for voice-enabled agentic pipelines.
Exam tip: Low-latency on-prem alternative to cloud STT/TTS; GPU-optimized; integrates with NIM ecosystem.
Real-World Examples
How agent development and framework concepts apply in production agentic AI systems.
Pillar 1 · Agent Frameworks

LangGraph HR Onboarding Agent with Human-in-the-Loop

A stateful HR onboarding agent that creates accounts, provisions access, and sends welcome emails — but requires a human manager to approve system access before provisioning.

  • StateGraph with 5 nodes: collect_info → create_accounts → request_approval → provision_access → send_welcome
  • interrupt_before=["provision_access"] pauses the graph after account creation
  • Manager receives a notification with the pending State; reviews and approves via a UI
  • Graph resumes from the checkpoint; provision_access node runs with manager's approval logged
  • MemorySaver checkpointer persists the graph State across the approval wait (could be hours)
  • Conditional edge routes to an escalation node if approval is rejected
Result: Zero manual coordination needed — the agent handles the full workflow while the human retains control over the sensitive provisioning step.
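
A sketch of the interrupt-and-resume mechanics, assuming graph is the StateGraph built from the five nodes above; the thread id and input are illustrative:

from langgraph.checkpoint.memory import MemorySaver

app = graph.compile(
    checkpointer=MemorySaver(),               # persists State across the approval wait
    interrupt_before=["provision_access"],    # pause here for manager approval
)
config = {"configurable": {"thread_id": "onboard-ada"}}

app.invoke({"employee": "Ada"}, config)       # runs until the interrupt, then stops
# ... manager reviews the pending State and approves in the UI ...
app.invoke(None, config)                      # resumes from the checkpoint
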
Pillar 2 · Prompt Engineering for Agents

Fixing a Flaky Agent with Tool Description Rewriting

A production e-commerce agent was incorrectly using search_orders to answer product questions, causing hallucinated order data. Root cause: ambiguous tool descriptions.

  • Before: search_orders description was "Search orders and products in the system"
  • Because the description mentioned both orders and products, the LLM routed product questions to search_orders
  • Fix: Split into two tools with explicit "when NOT to use" clauses in each description
  • search_orders: "Use for order status, shipping, returns. Do NOT use for product info."
  • search_catalog: "Use for product details, pricing, availability. Do NOT use for order status."
  • Added 2 few-shot examples of correct tool selection to the system prompt
Result: Wrong tool selection rate dropped from 23% to 0.4% with no model change or fine-tuning — purely through description rewriting and few-shot examples.
Pillar 3 · Tool Integration & Reliability

Parallel Tool Calling for a Financial Dashboard Agent

A portfolio analysis agent that must fetch stock price, news sentiment, and analyst ratings simultaneously before generating a summary — achieving 3× speed improvement.

  • User asks: "Give me a full snapshot of NVDA right now"
  • Agent reasons that all three data sources are independent — no sequential dependency
  • LLM emits 3 tool call JSON objects in a single response: get_price, get_news_sentiment, get_analyst_ratings
  • Host system executes all three concurrently; total wait = max(individual latencies) not sum
  • All three Observations are injected into the next LLM context simultaneously
  • Pydantic models validate each response — malformed data from news API caught and logged
Result: Response time drops from ~4.5s (sequential) to ~1.6s (parallel). Pydantic validation catches a flaky news API response before it corrupts the agent's analysis.
Pillar 4 · Multimodal & Advanced Agents

NVIDIA NIM-Powered Vision + Code Agent for Data Analysis

A data science agent that accepts a chart image, interprets it using NVIDIA's vision NIM, writes Python analysis code, executes it in a sandbox, and returns insights.

  • User uploads a bar chart PNG: "What's the trend here and how significant is it?"
  • Agent calls analyze_image(image_url) → NVIDIA VILA NIM returns a structured description of the chart
  • Agent writes Python code to perform a regression analysis on the extracted data points
  • Code submitted to E2B sandbox → execution result returned as Observation
  • Agent calls generate_report(findings) → NVIDIA LLM NIM structures the final markdown report
  • All three NIM endpoints use the same OpenAI-compatible API interface — zero code changes to swap models
Result: A full chart-to-insight pipeline in under 8 seconds. NVIDIA NIM's OpenAI-compatible interface means the agent code is framework-agnostic — the same agent works with any NIM-hosted model.
Practice Quiz
10 questions across all four pillars, each with an explanation of the correct answer.
Framework Advisor
A decision tool that recommends a framework or development pattern based on your requirements.
Memory Hooks — Flip Cards
Question-and-answer memory hooks for each pillar. Ideal for last-minute exam review.
Agent Frameworks

LangChain vs LangGraph

When do you reach for LangGraph?

Use LangChain for linear chains, retrieval pipelines, and simple agents. Use LangGraph when you need: stateful loops (agent re-tries), conditional branching, human-in-the-loop interrupts, or persistent State across sessions. LangGraph is built on LangChain — it extends, not replaces.

Agent Frameworks

AutoGen vs CrewAI

Key difference in metaphor and use case?

AutoGen: conversational metaphor — agents exchange messages. Best for dynamic, code-generating, or debate-based multi-agent patterns. More flexible.
CrewAI: role metaphor — agents have role/goal/backstory. Tasks assigned to a Crew with sequential or hierarchical process. More opinionated, faster to get started.

Prompt Engineering

The 4-Section System Prompt

What are the four essential sections?

1. Role — who the agent is and what it does
2. Tools — list of available tools with brief descriptions
3. Constraints — what the agent must never do
4. Output Format — response structure and length

Missing any section causes predictable failure modes in production.

Prompt Engineering

Tool Description Quality

What must a good tool description include?

A good tool description must include: (1) what it does in one clear sentence, (2) when to use it vs. similar tools, (3) what each parameter means with type and constraints, (4) what the return value looks like. This is the single highest-leverage reliability improvement — no model change required.

Tool Integration

Parallel Tool Calling

How does it work and when should you use it?

The LLM emits multiple tool call JSON objects in a single response. The host system executes all of them concurrently, then injects all Observations at once. Use it when tool calls are independent (no result depends on another). Speed-up = max(latencies) instead of sum(latencies).

Tool Integration

Structured Output & Strict Mode

How does it differ from an output parser?

Output parser: post-processes free-text LLM output — brittle, can fail if format deviates.
Structured output (strict mode): the LLM is constrained to produce only valid schema-conforming JSON — guaranteed by the API. No parsing failures. Use model.with_structured_output(MyPydanticModel) in LangChain.

Multimodal & Advanced

NVIDIA NIM in Agent Pipelines

What makes NIM agent-friendly?

NIM packages optimized models (LLM, vision, embedding, reranking) as OpenAI-compatible API endpoints. Agents call NIM as a tool — same client.chat.completions.create() interface regardless of model type. Backed by TensorRT-LLM for GPU-optimized throughput. Zero code changes to swap from cloud to on-prem.

Multimodal & Advanced

Code Execution Agent Loop

What makes it self-correcting?

The agent writes code → submits to a sandboxed executor (E2B, Jupyter kernel) → receives stdout/stderr as Observation → if an error occurs, it reads the traceback, rewrites the code, and retries. The execution result (not the code) is what drives the next reasoning step. Requires an isolated sandbox to prevent code injection attacks.

NCP-AAI Exam Series · 5 Topics

Ready to Master the Full NCP-AAI Exam?

Continue with Knowledge Integration & RAG for Agents, NVIDIA Platform for Agentic AI, and Evaluation & Safety — all five topics with quizzes and decision tools.