Master LangChain, LangGraph, AutoGen, and CrewAI — plus prompt engineering for agents, tool schema design, structured output, reliability patterns, and multimodal agent development.
The four dominant open-source frameworks each take a different approach. LangChain provides composable building blocks. LangGraph adds stateful graph execution with loops and branching. AutoGen enables conversational multi-agent patterns. CrewAI offers a role-based, opinionated structure.
Agent system prompts define role, available tools, constraints, and output format. Tool description quality is the single highest-leverage variable in agent reliability — the LLM selects tools based entirely on their description text. Structured output enforces type-safe, parseable responses.
Production agents fail when tools fail. Robust tool integration requires well-typed schemas, retry logic with exponential backoff, output validation before passing results to the LLM, and parallel tool calling when multiple tools are independent. Error messages must be LLM-readable.
Modern agents go beyond text — they inspect images, execute code in sandboxes, process audio, and call specialized model microservices. NVIDIA NIM packages optimized models as API endpoints agents can call as tools, dramatically simplifying multimodal agent deployment on NVIDIA hardware.
State object — a TypedDict that all nodes read from and write to. StateGraph defines the graph; add_node() registers functions; add_conditional_edges() routes based on State; the MemorySaver checkpointer enables human-in-the-loop interrupts and session persistence.
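The pieces above can be sketched in plain Python. This is not the real LangGraph API, just a minimal stand-in showing the pattern: nodes are functions that read and write a shared State dict, and a conditional edge routes back into the loop until a stop condition holds. Node and field names are hypothetical.

```python
from typing import TypedDict

# Shared State that every node reads from and writes to.
class State(TypedDict):
    question: str
    attempts: int
    answer: str

def research(state: State) -> State:
    state["attempts"] += 1
    # Pretend the tool only succeeds on the second try.
    state["answer"] = "" if state["attempts"] < 2 else "42"
    return state

def route(state: State) -> str:
    # Conditional edge: loop back until an answer exists.
    return "end" if state["answer"] else "research"

nodes = {"research": research}

def run_graph(state: State) -> State:
    node = "research"
    while node != "end":
        state = nodes[node](state)  # run the current node
        node = route(state)         # conditional edge picks the next node
    return state

result = run_graph({"question": "meaning of life?", "attempts": 0, "answer": ""})
```

In real LangGraph the loop, routing, and persistence are handled by StateGraph plus a checkpointer; the value of the abstraction is that nodes stay plain functions.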
Wrap every tool call in retry logic — 3 attempts with exponential backoff (1s, 2s, 4s). Return a descriptive error message to the LLM if all retries fail so it can adjust its plan.
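A minimal sketch of that retry wrapper, with a deliberately flaky tool to exercise it. The tool name and failure mode are made up for the demo; the delay is shrunk so the example runs fast, while the default keeps the 1s, 2s, 4s schedule.

```python
import time

def call_with_retry(tool, *args, attempts=3, base_delay=1.0):
    """Run a tool with exponential backoff (1s, 2s, 4s by default)."""
    for i in range(attempts):
        try:
            return tool(*args)
        except Exception as exc:
            if i == attempts - 1:
                # LLM-readable error so the agent can adjust its plan.
                return f"Tool failed after {attempts} attempts: {exc}"
            time.sleep(base_delay * 2 ** i)

calls = {"n": 0}

def flaky_search(query):
    # Hypothetical tool that fails twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("upstream API timeout")
    return f"results for {query!r}"

result = call_with_retry(flaky_search, "gpu prices", base_delay=0.01)  # tiny delay for the demo
```

Returning the failure as a descriptive string, rather than raising, keeps the agent loop alive: the error lands in the LLM's context as an Observation it can plan around.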
Validate tool return values against a Pydantic schema before passing to the LLM. Reject malformed responses early — never let corrupted tool output enter the LLM's reasoning context.
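A stdlib sketch of that validation gate, using plain type checks in place of a Pydantic model (the idea is identical: reject malformed output before it reaches the LLM). The weather schema and fields are illustrative.

```python
def validate_tool_output(raw: dict, schema: dict) -> dict:
    """Reject malformed tool output before it enters the LLM's context.
    Stands in for a Pydantic model; schema maps field -> expected type."""
    for field, typ in schema.items():
        if field not in raw:
            raise ValueError(f"tool output missing field {field!r}")
        if not isinstance(raw[field], typ):
            raise ValueError(f"field {field!r} should be {typ.__name__}")
    return raw

weather_schema = {"city": str, "temp_c": float}

ok = validate_tool_output({"city": "Oslo", "temp_c": 3.5}, weather_schema)
try:
    validate_tool_output({"city": "Oslo", "temp_c": "cold"}, weather_schema)
    problem = ""
except ValueError as err:
    problem = str(err)  # caught early, never shown to the LLM as data
```

With Pydantic the same gate is one line, `Model.model_validate(raw)`, and the raised ValidationError plays the same role as the ValueError here.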
Define fallback tools for each primary tool (e.g., search_db_fast → search_db_full). If the primary tool fails or times out, the agent can try the fallback without breaking the loop.
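A simplified sketch of the fallback pattern, mirroring the idea behind LangChain's `.with_fallbacks()` without the library. The two search tools are the hypothetical pair from the example above.

```python
def with_fallbacks(primary, fallbacks):
    """Return a tool that tries `primary`, then each fallback in order."""
    def tool(*args):
        last_error = None
        for fn in [primary, *fallbacks]:
            try:
                return fn(*args)
            except Exception as exc:
                last_error = exc  # remember why this tier failed
        raise RuntimeError(f"all tools failed: {last_error}")
    return tool

def search_db_fast(q):
    raise TimeoutError("fast index unavailable")  # simulate primary failing

def search_db_full(q):
    return f"full-scan results for {q!r}"

search = with_fallbacks(search_db_fast, [search_db_full])
answer = search("invoices")
```

The agent only ever sees `search`; graceful degradation happens below the tool boundary, so the reasoning loop never breaks.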
When multiple tools don't depend on each other's results, call them in parallel. Modern LLMs can emit multiple tool call JSON objects in one response — execute them concurrently and merge results.
Structured output example: {"sentiment": "positive", "confidence": 0.87}

Agent accepts image inputs via tool or direct upload. Calls a vision LLM (GPT-4o, LLaVA, NVIDIA VILA) to inspect, caption, or compare images. Used for document analysis, defect detection, UI testing.
Agent writes code and executes it in an isolated sandbox (E2B, Jupyter kernel). Result is fed back as Observation. Enables self-correcting code: run → see error → rewrite → run again.
Agent transcribes audio input (Whisper), reasons over the text, and optionally synthesizes a spoken response (TTS). NVIDIA Riva provides on-prem ASR/TTS for latency-sensitive pipelines.
NIM packages optimized models (LLMs, vision, embedding, reranking) as OpenAI-compatible API endpoints. Agents call NIM as a tool — same interface regardless of model type. GPU-optimized with TensorRT-LLM.
| Concept | Pillar | Key Detail | Exam Tip |
|---|---|---|---|
| LangChain | Frameworks | Composable building blocks: chains, tools, memory, retrievers; LCEL pipe syntax for composing runnables | Foundation that LangGraph is built on; great for retrieval pipelines and simple agents |
| LangGraph StateGraph | Frameworks | Graph-based agent execution; nodes are Python functions; shared State dict; conditional edges for routing | Preferred for complex agents with loops, branching, and human-in-the-loop requirements |
| LangGraph Checkpointer | Frameworks | Persists State between turns using SQLite, Redis, or Postgres; enables session memory and HITL interrupts | Required for multi-turn agentic conversations; MemorySaver is the in-memory option |
| AutoGen ConversableAgent | Frameworks | Agents that can send/receive messages, execute code, and call tools; GroupChat manages multi-agent turn-taking | Best for dynamic multi-agent conversation; built-in code execution with Docker sandbox |
| CrewAI Crew | Frameworks | Collection of role-based agents with a process (sequential or hierarchical); each agent has role, goal, backstory | Most opinionated/beginner-friendly; hierarchical process adds an orchestrator automatically |
| LCEL (LangChain Expression Language) | Frameworks | Pipe syntax: prompt \| llm \| parser; supports streaming, async, batch, and parallel execution natively | Replaces legacy chain classes; enables first-class streaming for agent responses |
| Agent System Prompt | Prompt Eng. | Four sections: Role, Tools, Constraints, Output Format — each serves a distinct purpose for agent behavior | Omitting constraints leads to policy violations; omitting output format leads to inconsistent responses |
| Tool Description Quality | Prompt Eng. | The LLM selects tools and arguments based entirely on description text; vague descriptions → wrong tool calls | #1 reliability lever; include what the tool does, when to use it vs. similar tools, and return value shape |
| Few-Shot Tool Examples | Prompt Eng. | Showing 2–3 example tool call + observation pairs in the system prompt dramatically improves argument formatting | Especially important for tools with complex or nested JSON argument structures |
| Structured Output (Pydantic) | Prompt Eng. | Constrains LLM response to a Pydantic model or JSON schema; strict mode guarantees schema compliance | Eliminates brittle string parsing; use model.with_structured_output(MyModel) in LangChain |
| Output Parser | Prompt Eng. | Converts LLM free-text output into structured Python objects; PydanticOutputParser, JsonOutputParser | Always add format instructions to the prompt when using output parsers |
| Prompt Template | Prompt Eng. | Parameterized prompts with variable slots; ChatPromptTemplate.from_messages() for multi-turn agents | Separates prompt logic from runtime values; enables reuse across different agent instances |
| Tool Schema (JSON) | Tool Integration | name, description, parameters (type, properties, required); used by LLM to select and call tools | Every parameter must have a type and description; mark optional params with defaults |
| Retry with Backoff | Tool Integration | Wrap tool calls in retry logic — 3 attempts, exponential backoff (1s, 2s, 4s); return LLM-readable error on failure | Handles transient API failures; error message must be LLM-readable so the agent can adapt its plan |
| Parallel Tool Calling | Tool Integration | LLM emits multiple tool call JSON objects in one response; host executes concurrently and merges results | 2–4× faster for independent tool calls; not all LLMs support multi-tool in one response |
| Tool Output Validation | Tool Integration | Validate tool return values against expected schema before injecting as Observation | Prevents corrupted tool output from entering LLM reasoning; Pydantic models are ideal validators |
| Fallback Tool | Tool Integration | Secondary tool invoked when primary tool fails or times out; enables graceful degradation | Define tool.with_fallbacks([backup_tool]) in LangChain for automatic fallback routing |
| Human-in-the-Loop (HITL) | Tool Integration | Agent pauses at a defined node for human approval before executing a high-risk tool call | LangGraph: interrupt_before=["dangerous_tool_node"] with a checkpointer enables HITL |
| NVIDIA NIM | Multimodal | Optimized model microservices (LLM, vision, embedding, reranking) as OpenAI-compatible API endpoints | Drop-in replacement for OpenAI API; backed by TensorRT-LLM for GPU-optimized inference |
| Vision Agent | Multimodal | Agent that accepts image inputs, calls a vision LLM (GPT-4o, NVIDIA VILA), and reasons over visual content | Image passed as base64 or URL in the tool call; useful for document OCR, defect detection |
| Code Execution Agent | Multimodal | Agent writes and runs code in a sandboxed environment (E2B, Jupyter); uses execution result as Observation | Self-correcting loop: run → observe error → rewrite → run again; requires isolated sandbox |
| NVIDIA Riva | Multimodal | On-premises ASR (speech-to-text) and TTS (text-to-speech) for voice-enabled agentic pipelines | Low-latency on-prem alternative to cloud STT/TTS; GPU-optimized; integrates with NIM ecosystem |
A stateful HR onboarding agent that creates accounts, provisions access, and sends welcome emails — but requires a human manager to approve system access before provisioning.
interrupt_before=["provision_access"] pauses the graph after account creation.

A production e-commerce agent was incorrectly using search_orders to answer product questions, causing hallucinated order data. Root cause: ambiguous tool descriptions.

Before: search_orders description was "Search orders and products in the system".
After: search_orders: "Use for order status, shipping, returns. Do NOT use for product info." search_catalog: "Use for product details, pricing, availability. Do NOT use for order status."

A portfolio analysis agent that must fetch stock price, news sentiment, and analyst ratings simultaneously before generating a summary — achieving 3× speed improvement with parallel calls to get_price, get_news_sentiment, and get_analyst_ratings.

A data science agent that accepts a chart image, interprets it using NVIDIA's vision NIM, writes Python analysis code, executes it in a sandbox, and returns insights: analyze_image(image_url) → NVIDIA VILA NIM returns a structured description of the chart; generate_report(findings) → NVIDIA LLM NIM structures the final markdown report.

Use LangChain for linear chains, retrieval pipelines, and simple agents. Use LangGraph when you need: stateful loops (agent re-tries), conditional branching, human-in-the-loop interrupts, or persistent State across sessions. LangGraph is built on LangChain — it extends LangChain, not replaces it.
AutoGen: conversational metaphor — agents exchange messages. Best for dynamic, code-generating, or debate-based multi-agent patterns. More flexible.
CrewAI: role metaphor — agents have role/goal/backstory. Tasks assigned to a Crew with sequential or hierarchical process. More opinionated, faster to get started.
1. Role — who the agent is and what it does
2. Tools — list of available tools with brief descriptions
3. Constraints — what the agent must never do
4. Output Format — response structure and length
Missing any section causes predictable failure modes in production.
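The four sections can be assembled with an ordinary prompt template. This is a hypothetical template and an invented support-agent example, just to show each slot filled; real projects would use ChatPromptTemplate, but the structure is the same.

```python
from string import Template

# Four required sections: Role, Tools, Constraints, Output Format.
AGENT_PROMPT = Template("""\
# Role
You are $role.

# Tools
$tools

# Constraints
$constraints

# Output Format
$output_format
""")

prompt = AGENT_PROMPT.substitute(
    role="a customer-support agent for an online store",
    tools="- search_orders: look up order status by order ID\n"
          "- search_catalog: look up product details and pricing",
    constraints="Never reveal internal IDs. Never guess order data.",
    output_format="Reply in under 120 words, ending with one next step.",
)
```

Leaving a slot empty reproduces the predictable failures named above: no Constraints invites policy violations, no Output Format invites inconsistent responses.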
A good tool description must include: (1) what it does in one clear sentence, (2) when to use it vs. similar tools, (3) what each parameter means with type and constraints, (4) what the return value looks like. This is the single highest-leverage reliability improvement — no model change required.
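A schema that follows all four rules might look like this. The tool name, fields, and return shape are illustrative, not from any real system; note how the description says what the tool does, when not to use it, and what comes back.

```python
import json

# Illustrative JSON tool schema with a four-part description.
SEARCH_ORDERS_TOOL = {
    "name": "search_orders",
    "description": (
        "Look up a customer's order status, shipping, or returns by order ID. "  # what it does
        "Use for order questions only; use search_catalog for product info. "    # when vs. similar tools
        "Returns JSON: {order_id, status, eta_days}."                            # return value shape
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {
                "type": "string",
                "description": "Order ID, e.g. 'ORD-1042'; must start with 'ORD-'.",
            }
        },
        "required": ["order_id"],
    },
}

payload = json.dumps(SEARCH_ORDERS_TOOL)  # what the LLM actually sees
```

The LLM never sees your Python implementation, only this text, which is why description quality dominates tool-selection accuracy.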
The LLM emits multiple tool call JSON objects in a single response. The host system executes all of them concurrently, then injects all Observations at once. Use it when tool calls are independent (no result depends on another). Speed-up = max(latencies) instead of sum(latencies).
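The host-side half of that pattern can be sketched with a thread pool. The three tools are stand-ins (named after the portfolio example) with artificial 50 ms latencies; the point is that total wall time tracks the slowest call, not the sum.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Three independent tools the LLM requested in one response.
def get_price(ticker):
    time.sleep(0.05)
    return {"tool": "get_price", "ticker": ticker}

def get_news_sentiment(ticker):
    time.sleep(0.05)
    return {"tool": "get_news_sentiment", "ticker": ticker}

def get_analyst_ratings(ticker):
    time.sleep(0.05)
    return {"tool": "get_analyst_ratings", "ticker": ticker}

start = time.perf_counter()
with ThreadPoolExecutor() as pool:
    futures = [pool.submit(fn, "NVDA")
               for fn in (get_price, get_news_sentiment, get_analyst_ratings)]
    observations = [f.result() for f in futures]  # merge all Observations at once
elapsed = time.perf_counter() - start  # ~max(latencies), not the 0.15s sum
```

Only do this when the calls are truly independent; if one tool's arguments depend on another's result, the calls must stay sequential.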
Output parser: post-processes free-text LLM output — brittle, can fail if format deviates.
Structured output (strict mode): the LLM is constrained to produce only valid schema-conforming JSON — guaranteed by the API. No parsing failures. Use model.with_structured_output(MyPydanticModel) in LangChain.
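The difference can be seen without an LLM. Below, a brittle string parser breaks the moment the format drifts, while the strict-mode style reduces host-side work to `json.loads` plus a field check, because the API already guaranteed schema-valid JSON. Both parsers and the sample strings are invented for the demo.

```python
import json

def brittle_parse(text):
    # Output-parser style: string slicing breaks when the format drifts.
    return text.split("Sentiment:")[1].strip().lower()

def structured_parse(text, required=("sentiment", "confidence")):
    # Strict-mode style: the model was constrained to schema-valid JSON,
    # so the host only loads it and sanity-checks the fields.
    data = json.loads(text)
    for field in required:
        if field not in data:
            raise ValueError(f"missing {field!r}")
    return data

good = structured_parse('{"sentiment": "positive", "confidence": 0.87}')

try:
    brittle_parse("The sentiment is positive.")  # format drifted -> crash
    drift_failed = False
except IndexError:
    drift_failed = True
```

In LangChain the strict path is `model.with_structured_output(MyPydanticModel)`, which returns the validated object directly.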
NIM packages optimized models (LLM, vision, embedding, reranking) as OpenAI-compatible API endpoints. Agents call NIM as a tool — same client.chat.completions.create() interface regardless of model type. Backed by TensorRT-LLM for GPU-optimized throughput. Zero code changes to swap from cloud to on-prem.
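Because the interface is OpenAI-compatible, the request an agent sends to a NIM endpoint is an ordinary chat-completions payload. The sketch below only builds the request; the base URL and model name are placeholders, and swapping `base_url` is the single change needed to move between cloud and on-prem.

```python
def build_nim_request(model, prompt, base_url="http://localhost:8000/v1"):
    """Build an OpenAI-compatible chat request for a NIM-style endpoint.
    URL and model name are placeholders for this illustration."""
    return {
        "url": f"{base_url}/chat/completions",
        "json": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    }

req = build_nim_request("meta/llama-3.1-8b-instruct", "Summarize Q3 earnings.")
```

Any OpenAI-compatible client (or plain HTTP POST of `req["json"]` to `req["url"]`) works unchanged, which is what makes NIM a drop-in tool backend for agents.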
The agent writes code → submits to a sandboxed executor (E2B, Jupyter kernel) → receives stdout/stderr as Observation → if an error occurs, it reads the traceback, rewrites the code, and retries. The execution result (not the code) is what drives the next reasoning step. Requires an isolated sandbox to prevent code injection attacks.
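That run → observe → rewrite loop can be sketched as follows. The `exec`-based runner is for illustration only and is NOT safe isolation; a real system would submit to E2B or a Jupyter kernel. The scripted "attempts" stand in for the LLM's successive rewrites.

```python
def run_sandboxed(code, env=None):
    """Stand-in for an isolated sandbox; exec() here is NOT real isolation."""
    env = env if env is not None else {}
    try:
        exec(code, env)
        return {"ok": True, "result": env.get("result")}
    except Exception as exc:
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}

# Scripted rewrites standing in for the LLM's next attempt after each error.
attempts = ["result = 10 / 0", "result = 10 / 2"]
trace = []
for code in attempts:
    obs = run_sandboxed(code)  # the Observation drives the next reasoning step
    trace.append(obs)
    if obs["ok"]:
        break
```

Note that only the execution result enters the trace, matching the point above: the Observation, not the code itself, is what the agent reasons over next.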