Master the design patterns behind production LLM agents — ReAct loops, reasoning and planning strategies, memory systems, tool calling, and multi-agent coordination for the NCP-AAI exam.
AI agents perceive inputs, reason about them, and take actions toward a goal — unlike static LLMs that respond only once. The ReAct loop (Thought→Action→Observation) is the dominant production pattern. Tool calling lets LLMs invoke typed functions via structured JSON output.
Chain-of-Thought generates a single linear reasoning path. Tree-of-Thought explores multiple branches and backtracks. MCTS adds probabilistic rollouts for deeper planning. Task decomposition breaks complex goals into subtask graphs that agents can execute in parallel.
Agents need more than a prompt window. Episodic memory stores past interaction histories. Semantic memory retrieves long-term facts from vector databases. Procedural memory encodes skills as callable tools. In-context memory is the fast but limited working memory of the LLM itself.
Complex tasks benefit from multiple specialized agents. An orchestrator decomposes tasks and assigns them to worker agents. Peer networks communicate directly. Debate patterns pit agents against each other to surface better answers. Reflection agents critique and revise outputs.
The loop repeats until the agent reaches a stopping condition: a final answer or a maximum step limit. ReAct = Reason + Act.
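A minimal sketch of that loop in Python, with a scripted stand-in for the model call and a stub tool (neither reflects a real framework's API):

```python
# Minimal ReAct loop sketch (not any specific framework's API).
# `scripted_llm` is a stand-in for a real model call; it plays back a fixed
# Thought -> Action -> final-answer sequence for illustration.
MAX_STEPS = 10  # stopping condition: hard cap on loop iterations

TOOLS = {
    "search_web": lambda query: f"(stub) top results for {query!r}",
}

def scripted_llm(messages):
    # A real implementation would call an LLM with the tool schemas attached.
    if not any(m["role"] == "tool" for m in messages):
        return {"thought": "I should search first.",
                "action": {"name": "search_web",
                           "arguments": {"query": messages[0]["content"]}}}
    return {"thought": "I have enough information.",
            "final_answer": messages[-1]["content"]}

def react_agent(task, llm=scripted_llm):
    messages = [{"role": "user", "content": task}]
    for _ in range(MAX_STEPS):                       # stopping condition: max steps
        step = llm(messages)                         # Thought (+ Action or answer)
        if "final_answer" in step:                    # stopping condition: done
            return step["final_answer"]
        act = step["action"]
        observation = TOOLS[act["name"]](**act["arguments"])      # Action
        messages.append({"role": "assistant", "content": str(step)})
        messages.append({"role": "tool", "content": observation})  # Observation
    return "Stopped: max step limit reached"

print(react_agent("NVIDIA NCP-AAI exam date"))
```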
Chain-of-Thought (CoT): Triggered by "Let's think step by step." Generates a single sequential reasoning chain before the final answer. Fast and reliable for well-defined problems. ✔ Math, structured Q&A
Tree-of-Thought (ToT): Explores multiple reasoning branches simultaneously. Evaluates intermediate steps. Backtracks from dead ends using BFS or DFS. Much stronger on open-ended planning tasks. ✔ Creative, multi-step planning
ReAct: Interleaves reasoning and acting. Each Observation from a real tool grounds the next Thought. Prevents hallucination by anchoring reasoning in live tool results. ✔ Any tool-using agent
MCTS: Simulates many rollouts from each decision node, estimates value, and selects the highest-value branch. Optimal for game-like scenarios with many future steps. ✔ Long-horizon planning
Tool calling: the LLM emits structured JSON such as {"name": "search_web", "arguments": {"query": "NVIDIA NCP-AAI exam date"}}. The host system executes the function and returns the result as an Observation. This enables agents to interact with APIs, databases, code interpreters, and external services in a controlled, type-safe way, without hallucinating API responses.
Orchestrator-worker: One orchestrator agent decomposes the task and dispatches subtasks to specialized worker agents. Workers report results back; the orchestrator aggregates. (A code sketch follows the pattern descriptions below.)
Peer network: Agents communicate directly with each other without a central coordinator. Each agent can request help from any peer. More resilient but harder to control.
Debate: Two or more agents argue opposing positions, or one agent critiques another's output. The process surfaces errors and improves final answer quality.
Reflection: An agent reviews and scores its own output against a rubric, then generates an improved version. Can iterate multiple rounds until quality meets a threshold.
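A toy orchestrator-worker sketch in Python; the worker functions and the decomposition step are illustrative stand-ins for LLM-backed agents, not LangGraph or AutoGen code:

```python
# Toy orchestrator-worker pattern. Worker functions are hypothetical
# stand-ins; a real system would dispatch to LLM-backed agents.
from concurrent.futures import ThreadPoolExecutor

def search_worker(subtask):    return f"[search] findings for: {subtask}"
def code_worker(subtask):      return f"[code] analysis script for: {subtask}"
def aggregate(results):        return " | ".join(results)

WORKERS = {"search": search_worker, "code": code_worker}

def orchestrator(goal):
    # 1. Decompose the goal into (worker, subtask) pairs.
    #    A real orchestrator would use an LLM for this step.
    plan = [("search", f"gather background on {goal}"),
            ("code", f"analyze data about {goal}")]
    # 2. Dispatch subtasks to specialized workers in parallel.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda p: WORKERS[p[0]](p[1]), plan))
    # 3. Aggregate worker results into a final answer.
    return aggregate(results)

print(orchestrator("Q3 churn drivers"))
```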
| Concept | Pillar | Key Detail | Exam Tip |
|---|---|---|---|
| AI Agent | Foundations | A system that perceives its environment, reasons about it, and takes actions to achieve a goal — repeatedly, autonomously | Distinguish from a single LLM call: agents loop, use tools, and have memory |
| ReAct Loop | Foundations | Thought → Action → Observation cycle that repeats until a final answer. ReAct = Reason + Act | Most common production agent pattern; grounds reasoning in live tool results |
| Tool Calling | Foundations | LLM outputs structured JSON specifying function name + typed arguments; host executes and returns result | Also called function calling; prevents hallucinating API responses |
| Simple Reflex Agent | Foundations | Condition-action rules only; no internal state or memory; fastest but brittle | Fails in partially observable environments |
| Goal-Based Agent | Foundations | Searches for action sequences that achieve a specified goal; plans ahead; most LLM agents are goal-based | Uses planning (CoT, ToT, MCTS) to find paths to the goal |
| Stopping Condition | Foundations | Rule that terminates the agent loop — final answer reached, max steps exceeded, or confidence threshold met | Always implement; infinite loops are a common production failure mode |
| Chain-of-Thought (CoT) | Reasoning | Generates a single linear reasoning chain before the final answer; triggered by "Let's think step by step" | Simple and reliable; underperforms on tasks requiring exploration or backtracking |
| Tree-of-Thought (ToT) | Reasoning | Explores multiple reasoning branches; evaluates intermediate steps; backtracks from dead ends with BFS/DFS | Significantly outperforms CoT on multi-step planning and creative tasks |
| MCTS | Reasoning | Monte Carlo Tree Search — simulates rollouts from each node, estimates value, selects highest-value branch | Best for long-horizon planning with many possible futures; computationally expensive |
| Task Decomposition | Reasoning | Breaking a complex goal into a DAG (directed acyclic graph) of simpler subtasks that can be parallelized | Enables parallel agent execution; LangGraph uses a DAG to express agent workflows |
| Self-Consistency | Reasoning | Generate N independent CoT reasoning paths; take the majority-vote answer across all paths | Improves reliability at the cost of N × inference time; reduces variance in answers |
| Reflexion | Reasoning | Agent critiques its own output in natural language, stores critique as episodic memory, retries with that context | Does not update model weights; improvement is in-context only |
| In-Context Memory | Memory | The active prompt window; fastest to access; limited by context length; lost between sessions | Working memory of the LLM; fills up with long agentic conversations |
| Episodic Memory | Memory | Records of past interactions and task outcomes; retrieved by similarity to current context | Enables learning from past errors; often stored in a key-value store with embeddings |
| Semantic Memory | Memory | Long-term factual knowledge in a vector database; retrieved via embedding similarity or BM25 | This is the RAG knowledge base; grounded recall vs hallucinated knowledge |
| Procedural Memory | Memory | Skills and tools the agent can invoke; encoded as function schemas, system prompts, or retrieval-augmented tools | Tells agent what it can do; separate from what it knows (semantic memory) |
| Memory Consolidation | Memory | Summarizing and compressing older in-context memory into external storage before the context window fills | Prevents context overflow; common in long-running agentic tasks |
| Orchestrator Agent | Multi-Agent | High-level agent that decomposes tasks and delegates subtasks to worker agents; aggregates results | LangGraph "supervisor" node; AutoGen GroupChatManager |
| Worker Agent | Multi-Agent | Specialized subagent that executes a specific task type (search, code, critique, summarize) | Each worker has a restricted tool set and system prompt focused on its specialty |
| Debate Pattern | Multi-Agent | Two+ agents argue opposing positions; process surfaces errors; final answer emerges from structured disagreement | Improves factual accuracy; used in Constitutional AI and adversarial prompting |
| Reflection Pattern | Multi-Agent | Agent (or separate critic agent) reviews output against a rubric, scores it, and produces an improved revision | Can be self-reflection (same agent) or cross-reflection (separate critic agent) |
| Shared State / Blackboard | Multi-Agent | Common data structure all agents can read/write to share intermediate results and context | LangGraph State object; prevents agents from duplicating work or contradicting each other |
A production support agent for a SaaS company that handles billing inquiries, ticket lookups, and policy questions using a ReAct loop over three tools.
get_account(user_id="u_8821") → Observation: Plan upgraded to Team on Mar 15
get_invoice(invoice_id="inv_334") → Observation: $149 charge on Mar 16, prorated Team plan
A competitive intelligence agent that generates a structured market analysis report by exploring multiple research angles in parallel before committing to a structure.
A developer assistant agent that works on a multi-week software project, demonstrating all four memory types in a single system.
A financial analysis system where an orchestrator decomposes a complex query and dispatches to specialized worker agents in parallel.
ReAct = Reason + Act
The three steps: Thought (reason about what to do) → Action (invoke a tool) → Observation (process the tool result). The loop repeats until the agent produces a final answer or hits a stopping condition.
Tool calling forces the LLM to output structured JSON (function name + typed arguments) rather than inventing a response. The host system executes the real function and returns the actual result as an Observation — grounding every subsequent answer in real data, not LLM imagination.
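A host-side sketch of that dispatch, assuming the model has already returned the JSON shown earlier; the schema layout mirrors the common function-calling style but is not tied to any particular vendor's API:

```python
# Host-side tool dispatch sketch. The JSON tool call is assumed to have come
# back from the model; the schema and registry names are illustrative.
import json

TOOL_SCHEMAS = [{
    "name": "search_web",
    "description": "Search the web and return top results.",
    "parameters": {"type": "object",
                   "properties": {"query": {"type": "string"}},
                   "required": ["query"]},
}]  # attached to the model call so it knows which functions it may request

def search_web(query: str) -> str:
    return f"(stub) results for {query!r}"   # real impl would hit a search API

REGISTRY = {"search_web": search_web}

raw = '{"name": "search_web", "arguments": {"query": "NVIDIA NCP-AAI exam date"}}'
call = json.loads(raw)                                       # parse structured output
observation = REGISTRY[call["name"]](**call["arguments"])    # execute the real function
print(observation)                                           # returned as an Observation
```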
CoT = single linear path, fast, good for well-defined problems. Trigger: "Let's think step by step."
ToT = multiple branches, evaluates intermediate steps, backtracks from dead ends. Use when tasks require exploration, creativity, or multi-step planning where the path isn't obvious upfront.
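A compressed ToT sketch that searches breadth-first with pruning (a beam-style variant); propose_thoughts and score_thought are hypothetical stand-ins for the LLM's generate and evaluate calls:

```python
# Tree-of-Thought sketch: breadth-first expansion of partial reasoning paths,
# keeping only the best-scoring branches at each level. `propose_thoughts` and
# `score_thought` stand in for LLM generate/evaluate calls.
def propose_thoughts(path, k=3):
    return [path + [f"step{len(path) + 1}.{i}"] for i in range(k)]

def score_thought(path):
    return int(path[-1].split(".")[-1])   # stub value estimate; real version asks an LLM judge

def tree_of_thought(depth=3, beam=2):
    frontier = [[]]                        # start from an empty reasoning path
    for _ in range(depth):
        candidates = [p for path in frontier for p in propose_thoughts(path)]
        candidates.sort(key=score_thought, reverse=True)
        frontier = candidates[:beam]       # prune: abandon low-value branches
    return frontier[0]                     # highest-value complete path

print(tree_of_thought())
```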
Generate N independent CoT reasoning paths, then take the majority-vote answer across all N outputs. Reduces variance and improves accuracy at the cost of N × inference time. Works because diverse paths to the same answer increase confidence that the answer is correct.
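A minimal self-consistency sketch, where sample_cot_answer stands in for a temperature-sampled CoT call that returns only the final answer:

```python
# Self-consistency sketch: sample N independent CoT answers, return the
# majority vote. `sample_cot_answer` is a stand-in for a temperature>0 model
# call that reasons step by step and emits only its final answer.
from collections import Counter
import random

def sample_cot_answer(question):
    return random.choice(["42", "42", "41"])   # stub: noisy but biased toward "42"

def self_consistency(question, n=9):
    answers = [sample_cot_answer(question) for _ in range(n)]  # N independent paths
    return Counter(answers).most_common(1)[0][0]               # majority vote

print(self_consistency("What is 6 x 7?"))
```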
In-context: Active prompt window — fast, limited
Episodic: Past interaction records — experience
Semantic: Facts in a vector DB — knowledge
Procedural: Tool schemas and skills — capabilities
Think: What I'm doing · What I've done · What I know · What I can do
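One illustrative way to lay those four stores out in code; the class and method names here are not from any standard framework:

```python
# Illustrative layout of the four agent memory types; names are not from any
# specific framework.
class AgentMemory:
    def __init__(self):
        self.in_context = []     # working memory: the active prompt window
        self.episodic = []       # past interactions and task outcomes
        self.semantic = {}       # long-term facts (stand-in for a vector DB)
        self.procedural = {}     # callable tools: what the agent can do

    def remember_episode(self, summary):
        self.episodic.append(summary)

    def recall_fact(self, key):
        return self.semantic.get(key)    # real version: embedding similarity search

    def register_tool(self, name, fn):
        self.procedural[name] = fn

memory = AgentMemory()
memory.register_tool("search_web", lambda q: f"(stub) results for {q}")
memory.semantic["refund_policy"] = "30-day full refund"
memory.remember_episode("User u_8821 asked about a March invoice")
print(memory.recall_fact("refund_policy"))
```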
As an agent converses, the in-context window fills up. Memory consolidation summarizes older context into external (episodic or semantic) storage before the window overflows, then injects only the relevant summary. Prevents context overflow without losing the agent's history entirely.
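A rough consolidation sketch: when the message list nears a budget, older turns are summarized into episodic storage and replaced in-context with a single summary message (summarize stands in for an LLM call, and word count approximates tokens):

```python
# Memory consolidation sketch. `summarize` is a stand-in for an LLM summary
# call; token counting is approximated by word count for illustration.
def summarize(messages):
    return "Summary of earlier turns: " + "; ".join(m["content"][:30] for m in messages)

def consolidate(messages, episodic_store, budget_words=200, keep_recent=4):
    total = sum(len(m["content"].split()) for m in messages)
    if total <= budget_words:
        return messages                       # still fits: no consolidation needed
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarize(old)
    episodic_store.append(summary)            # move detail to external memory
    # keep only a compact summary plus the most recent turns in context
    return [{"role": "system", "content": summary}] + recent

episodic = []
chat = [{"role": "user", "content": "word " * 60}] * 6   # long-running conversation
chat = consolidate(chat, episodic)
print(len(chat), len(episodic))   # context shrank; detail preserved in episodic store
```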
Orchestrator: Decomposes the high-level goal into subtasks, assigns them to workers, aggregates results. One per team. (LangGraph supervisor, AutoGen GroupChatManager)
Worker: Executes a specific subtask with a restricted tool set. Specialized. Runs in parallel with other workers.
Debate: Two+ agents argue opposing positions — adversarial, surfaces factual errors through disagreement. Used in Constitutional AI.
Reflection: One agent (or a critic agent) reviews its own output against a rubric and iterates. Cooperative self-improvement. Used in the Reflexion framework.
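A reflection-loop sketch where draft, critique, and revise are hypothetical stand-ins for model calls; it iterates until the score threshold or the round limit is hit:

```python
# Reflection pattern sketch. `draft`, `critique`, and `revise` stand in for
# LLM calls (the critic may be the same model or a separate critic agent).
def draft(task):
    return f"first attempt at: {task}"

def critique(answer):
    # A real critic scores against a rubric; here the score grows per revision.
    score = min(1.0, 0.4 + 0.3 * answer.count("revised"))
    return score, "add missing detail"

def revise(answer, feedback):
    return answer + f" (revised: {feedback})"

def reflect(task, threshold=0.9, max_rounds=5):
    answer = draft(task)
    for _ in range(max_rounds):
        score, feedback = critique(answer)    # score against a rubric
        if score >= threshold:                # quality gate reached
            break
        answer = revise(answer, feedback)     # improved version using the critique
    return answer

print(reflect("summarize the ReAct paper"))
```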