Unit 7: Retrieval, Tools, and Agents
Lesson at a glance
| Item                  | Detail                                                                             |
| --------------------- | ---------------------------------------------------------------------------------- |
| Suggested length      | 3 × 60 minutes                                                                     |
| Recommended placement | Week 7 of AI Fluency                                                               |
| Prerequisite          | Units 1–6                                                                          |
| Materials             | Frontier-LLM access (one with browse/tools enabled if possible), small text corpus |
Safety: Agents that take actions (browse, send email, run code) can cause real-world side effects. The labs in this unit are observation-only. Students will use tool-enabled models but will not deploy autonomous agents that touch external systems.
Standards & credential alignment
- AI4K12 Big Ideas: Representation & Reasoning, Natural Interaction, Societal Impact.
- CSTA K-12: 3A-AP-13, 3A-AP-22, 3A-IC-26.
- NIST AI RMF: Map, Measure, Manage.
Learning objectives
By the end of this unit, students can:
- Define RAG and explain the four-step pipeline (chunk & embed → store → retrieve → generate).
- Distinguish vector search (similarity) from keyword search (exact match).
- Define function calling and MCP (Model Context Protocol) at the conceptual level.
- Describe what an agent is, how a planning loop works, and three failure modes.
- Use a frontier model with browsing and code-execution tools enabled, and inspect the trace.
- Explain when not to use an agent and how human-in-the-loop helps.
Vocabulary
- RAG (Retrieval-Augmented Generation) - Pipeline that fetches relevant text from a knowledge base and stuffs it into the model's prompt before generating an answer.
- Embedding model - A separate small model that turns a chunk of text into a vector.
- Vector store / vector DB - A database optimized for "find me chunks whose vectors are close to this one."
- Chunking - Splitting documents into smaller pieces (often 200–800 tokens each) so they can be embedded individually.
- Hybrid search - Combining vector (semantic) and keyword (lexical) search for better recall.
- Function calling - A protocol where the model emits a structured "I want to call function X with these args," and the host runs the function and returns the result.
- MCP (Model Context Protocol) - An open protocol for exposing tools, data sources, and prompts to LLMs. Makes tool integrations interoperable across models.
- Tool use - Umbrella term for any time the model causes external code to run.
- Agent - An LLM in a planning loop that picks tools, observes results, and decides next steps until done.
- Human-in-the-loop (HITL) - Workflow where the agent pauses for human approval before high-impact actions.
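To make the difference between keyword search and vector search concrete, here is a minimal sketch. The three sentences and their 2-D vectors are invented for illustration; a real embedding model would produce vectors with hundreds of dimensions, and the function names are placeholders rather than any library's API.

```python
import math

# Hand-made 2-D "embeddings" (hypothetical values chosen for the demo).
# First number ~ "about vehicles", second ~ "about food".
DOCS = {
    "The car would not start this morning.": (0.9, 0.1),
    "My automobile needs a new battery.": (0.8, 0.2),
    "The bakery sells fresh bread daily.": (0.1, 0.9),
}

def keyword_search(query_word, docs):
    """Keyword search: return documents that contain the exact word."""
    word = query_word.lower()
    return [text for text in docs
            if word in text.lower().replace(".", "").split()]

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def vector_search(query_vec, docs, k=2):
    """Vector search: return the k documents whose vectors are closest."""
    ranked = sorted(docs, key=lambda t: cosine(query_vec, docs[t]), reverse=True)
    return ranked[:k]

keyword_hits = keyword_search("car", DOCS)
vector_hits = vector_search((0.85, 0.15), DOCS)  # pretend embedding of "car"

print("keyword:", keyword_hits)
print("vector: ", vector_hits)
```

Keyword search finds only the sentence that literally contains "car". Vector search also returns the "automobile" sentence, because its vector is close to the query's. Hybrid search would combine both signals.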
Pacing - Day 1 (60 min): RAG, top to bottom
| Time        | Segment                                    | Notes                                  |
| ----------- | ------------------------------------------ | -------------------------------------- |
| 0:00 – 0:25 | Mini-lesson - the RAG pipeline             | Whiteboard the four steps.             |
| 0:25 – 0:50 | Activity - manual RAG                      | Class plays the role of the retriever. |
| 0:50 – 1:00 | Discussion - when RAG helps, when it hurts |                                        |
Day 1 - Mini-lesson: the RAG pipeline (25 min)
Whiteboard:
┌──────────────┐   ┌──────────────┐   ┌──────────────┐   ┌──────────────┐   ┌──────────┐
│  Documents   │ → │   Chunk &    │ → │    Store     │ → │   Retrieve   │ → │ Generate │
│ (your data)  │   │    Embed     │   │ (vector DB)  │   │  on each Q   │   │ with LLM │
└──────────────┘   └──────────────┘   └──────────────┘   └──────────────┘   └──────────┘
                          ↑                                     ↑
                  "embedding model"                      "user question"
Plain English:
- Chunk: split your documents into ~500-token pieces.
- Embed: turn each chunk into a vector with the embedding model.
- Store: save each chunk + vector in a vector DB.
- Retrieve: at query time, embed the user's question and find the top-K closest chunks.
- Generate: stuff those chunks into the prompt as context, then ask the LLM.
The whole point: the LLM now answers from your data rather than only from its training-time memory, and it can cite the chunks it was given. RAG is one of the most widely used patterns in applied AI right now. ChatGPT's "Custom GPTs" (with uploaded files), NotebookLM, and most "talk to your PDF" apps use some form of RAG underneath.
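The pipeline above can be sketched in a few dozen lines. This is a toy under stated assumptions, not a production design: the "embedding" is a plain word-count vector rather than a real embedding model, a Python list stands in for the vector DB, chunks are measured in words instead of tokens, and the final generate step is left as a prompt string that would be sent to an LLM. All function names are placeholders.

```python
import math
import re
from collections import Counter

def chunk(text, max_words=12):
    """Chunk: split text into pieces of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed(text):
    """Embed (toy version): a word-count vector instead of a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def build_store(documents):
    """Store: keep each chunk next to its vector."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc)]

def retrieve(store, question, k=2):
    """Retrieve: embed the question and return the top-k closest chunks."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, chunks):
    """Generate (first half): put the retrieved chunks into the prompt as context."""
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")

documents = [
    "Photosynthesis converts light energy into chemical energy stored in glucose.",
    "Mitochondria release energy from glucose through cellular respiration.",
    "The French Revolution began in 1789 with the storming of the Bastille.",
]
question = "When did the French Revolution begin?"
store = build_store(documents)
top_chunks = retrieve(store, question, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

Because the toy embedding only counts shared words, it behaves more like keyword matching than true semantic search. Swapping `embed` for a real embedding model is what makes retrieval work on paraphrases.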
Day 1 - Activity: manual RAG (25 min)
The class becomes the retriever. The teacher distributes 12 paragraph-length "chunks" (e.g., from a textbook chapter) to 12 students, so each student holds one stored chunk. The teacher reads a question, and the class debates which 3 chunks are most relevant. Those 3 chunks are read aloud. The teacher then asks the LLM the question twice: once with no context (free recall) and once with only those 3 chunks pasted in as context. The class compares the free-recall answer with the RAG-augmented answer.
Acting out the retriever makes an abstract pipeline physical, and the whole activity fits in 25 minutes.
Pacing - Day 2 (60 min): Function calling, MCP, tools
| Time        | Segment                                | Notes                                                   |
| ----------- | -------------------------------------- | ------------------------------------------------------- |
| 0:00 – 0:25 | Mini-lesson - function calling and MCP | What it is, why it matters.                             |
| 0:25 – 0:50 | Activity - observe the trace           | Use a tool-enabled model; inspect what tools it called. |
| 0:50 – 1:00 | Discussion                             |                                                         |
Day 2 - Mini-lesson: function calling and MCP (25 min)
On its own, a language model only produces text. It can't do math reliably. It can't browse. It can't read your local files. So why does ChatGPT seem to do all those things?
Function calling. The model outputs structured JSON like:
{ "function": "browse", "arguments": { "url": "https://example.com" } }
The host application sees that, runs the actual browse, and feeds the result back. The model never browsed - it requested a browse.
This pattern is now standard across tool-enabled LLM products. The model describes what it wants in a structured form. The host runs the side effects. The model integrates the result. Repeat.
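The host's side of that exchange can be sketched as follows. This is a minimal illustration, assuming the same `{"function": ..., "arguments": ...}` shape as the example above; the two tool functions and the `TOOLS` registry are hypothetical, and real vendor APIs use their own field names.

```python
import json

def add(a, b):
    """A tool the host is willing to run for the model."""
    return a + b

def word_count(text):
    """Another harmless tool: count the words in a string."""
    return len(text.split())

# Registry of functions the model is allowed to request (hypothetical names).
TOOLS = {"add": add, "word_count": word_count}

def handle_model_output(raw):
    """Parse the model's structured request, run it, and return a JSON result."""
    request = json.loads(raw)
    name = request["function"]
    if name not in TOOLS:
        # The host, not the model, decides what actually runs.
        return json.dumps({"error": f"unknown function: {name}"})
    result = TOOLS[name](**request["arguments"])
    return json.dumps({"function": name, "result": result})

# Pretend the model emitted this text.
model_output = '{"function": "add", "arguments": {"a": 19, "b": 23}}'
reply = handle_model_output(model_output)
refused = handle_model_output('{"function": "delete_files", "arguments": {}}')
print(reply)
print(refused)
```

The registry is the design choice worth discussing: the model can ask for anything, but only functions the host has registered will ever execute.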
MCP (Model Context Protocol) is an open standard that makes this interoperable: a tool exposed via MCP can be used from Claude, ChatGPT, Cursor, VS Code, and other MCP-aware hosts. You write the tool once, and any compatible host can offer it to its model. Compare that to the old world, where every integration was bespoke.
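For students who want to see what "interoperable" means on the wire, here is a rough sketch of the messages a host might send to an MCP server. It assumes the JSON-RPC 2.0 framing and the `tools/list` and `tools/call` method names described in the MCP specification; the `get_weather` tool and its argument are invented, and the current spec should be checked before relying on any field name.

```python
import json

def tools_list_request(request_id):
    """Ask the server which tools it offers (assumed method name: tools/list)."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}

def tools_call_request(request_id, name, arguments):
    """Ask the server to run one tool (assumed method name: tools/call)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

# "get_weather" is a hypothetical tool used only for illustration.
discover = tools_list_request(1)
message = tools_call_request(2, "get_weather", {"city": "Chicago"})
wire_text = json.dumps(message)
print(json.dumps(discover))
print(wire_text)
```

The point for class discussion is that every host sends the same message shape, so the tool author does not need to know which model is on the other end.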
Day 2 - Activity: observe the trace (25 min)
Use a frontier chat app with tools enabled. Ask it a question that requires tools, like:
"What's the current price of one Bitcoin in USD, and how many barrels of crude oil would that buy at today's price?"
In the chat UI's "thinking" / "tool calls" view, the student watches the model work through steps like these:
- Calls a search or browse tool for Bitcoin price.
- Reads the result.
- Calls again for crude oil.
- Reads the result.
- Does the math (sometimes via a code-exec tool, sometimes inline).
- Composes the final answer.
That trace is an agent in miniature. The model is reasoning, acting, observing, repeating.
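A trace like this is just a log of tool calls. The sketch below scripts one for a price question, read here as "how many barrels of oil would one Bitcoin buy." The prices are placeholder numbers, not real data, and `search` is a stand-in that returns them from a dictionary instead of touching the web.

```python
# Placeholder numbers for the demo (NOT real prices).
FAKE_RESULTS = {
    "bitcoin price usd": 60000.0,
    "crude oil price usd per barrel": 80.0,
}

def search(query):
    """Stand-in for a search/browse tool: looks up a canned answer."""
    return FAKE_RESULTS[query]

def divide(pair):
    """Stand-in for a code-execution tool doing the arithmetic."""
    numerator, denominator = pair
    return numerator / denominator

trace = []  # every tool call is recorded here, in order

def call_tool(name, func, arg):
    """Run a tool and append what happened to the trace."""
    result = func(arg)
    trace.append({"tool": name, "input": arg, "output": result})
    return result

btc = call_tool("search", search, "bitcoin price usd")
oil = call_tool("search", search, "crude oil price usd per barrel")
barrels = call_tool("calculate", divide, (btc, oil))

for step_number, step in enumerate(trace, start=1):
    print(step_number, step["tool"], step["input"], "->", step["output"])
```

When students inspect a real chat app's tool view, they can map each entry to the same three fields: which tool, what input, what came back.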
Pacing - Day 3 (60 min): Agents and the off-the-rails problem
| Time        | Segment                             | Notes                                    |
| ----------- | ----------------------------------- | ---------------------------------------- |
| 0:00 – 0:25 | Mini-lesson - the agent loop        | Plan → act → observe → reflect → repeat. |
| 0:25 – 0:40 | Mini-lesson - three failure modes   | Loops, off-task, irreversible actions.   |
| 0:40 – 0:55 | Activity - design an agent on paper | Pairs design a use case + guardrails.    |
| 0:55 – 1:00 | Quiz / exit ticket                  |                                          |
Day 3 - The agent loop (25 min)
┌──────────┐
│   Goal   │
└────┬─────┘
     ▼
┌──────────┐  no, refine plan
│   Plan   │◀─────────────┐
└────┬─────┘              │
     ▼                    │
┌──────────┐              │
│   Act    │              │
└────┬─────┘              │
     ▼                    │
┌──────────┐              │
│ Observe  │              │
└────┬─────┘              │
     ▼                    │
┌──────────┐              │
│ Reflect  │──── done? ───┘
└────┬─────┘
     ▼
 ┌────────┐
 │ Output │
 └────────┘
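The loop in the diagram can be sketched in code. This is a toy with no LLM involved: the "agent" plays a number-guessing game, where `plan` picks the next guess, `act` is the only function that can see the secret, and `reflect` checks whether the goal is met. The function names and the `max_steps` limit are illustrative choices, not a standard agent API.

```python
SECRET = 37  # hidden from the planner; only the act() "tool" can see it

def plan(goal, history):
    """Plan: narrow the range using past observations, then guess the middle."""
    low, high = goal
    for guess, observation in history:
        if observation == "too low":
            low = guess + 1
        elif observation == "too high":
            high = guess - 1
    return (low + high) // 2

def act(guess):
    """Act: the tool call. Returns what the agent gets to observe."""
    if guess < SECRET:
        return "too low"
    if guess > SECRET:
        return "too high"
    return "correct"

def reflect(goal, history):
    """Reflect: are we done?"""
    return history[-1][1] == "correct"

def run_agent(goal, plan, act, reflect, max_steps=10):
    """Plan -> act -> observe -> reflect, repeated until done or out of steps."""
    history = []
    for _ in range(max_steps):
        action = plan(goal, history)
        observation = act(action)
        history.append((action, observation))
        if reflect(goal, history):
            return {"status": "done", "output": action, "steps": len(history)}
    # A hard step limit is the simplest stopping condition.
    return {"status": "step limit reached", "output": None, "steps": len(history)}

result = run_agent((1, 100), plan, act, reflect)
print(result)
```

In a real agent, `plan` and `reflect` would be LLM calls and `act` would run a tool, but the control flow is the same, including the step limit that keeps the loop from running forever.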
Concrete agent examples a high-schooler should know:
- Cursor / Claude Code / Copilot agent mode - coding agents that read, plan, edit, test, repeat.
- Browser-use agents - read a webpage, click buttons, fill forms.
- Research agents - multi-source web research with summary report.
- Voice agents - phone-call agents (used in customer support, increasingly).
Day 3 - Three failure modes (15 min)
- Loops. Agent re-tries the same failing action 14 times because nothing in its plan tells it to escalate.
- Off-task drift. Agent decides mid-run to "improve" something not in the goal. Student asks for "fix the bug," agent rewrites the whole architecture.
- Irreversible actions. Agent runs rm -rf, sends an email, charges a credit card. Once it's done, it's done. This is why production agents have human approval gates for high-impact actions.
The teaching line: "An agent is an intern with admin access. You wouldn't give a real intern the company credit card on day one."
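Each of the three failure modes suggests a simple guardrail, sketched below under stated assumptions. The tool names are hypothetical, the `approve` callback stands in for a human clicking "yes" or "no", and a real system would need far more than this. It shows one possible check per failure mode: a failure counter for loops, an allow-list for drift, and an approval gate for irreversible actions.

```python
from collections import Counter

class Guardrails:
    def __init__(self, allowed_tools, high_impact_tools, max_failures=3, approve=None):
        self.allowed_tools = set(allowed_tools)
        self.high_impact_tools = set(high_impact_tools)
        self.max_failures = max_failures
        # approve(tool, args) asks a human; by default every request is denied.
        self.approve = approve or (lambda tool, args: False)
        self.failures = Counter()

    def record_failure(self, tool, args):
        """Remember that this exact action failed once more."""
        self.failures[(tool, repr(args))] += 1

    def check(self, tool, args):
        """Decide what to do with a requested action before running it."""
        if tool not in self.allowed_tools:
            return "not allowed"      # off-task drift: tool is outside the goal
        if self.failures[(tool, repr(args))] >= self.max_failures:
            return "escalate"         # loop: stop retrying, ask for help
        if tool in self.high_impact_tools and not self.approve(tool, args):
            return "needs approval"   # irreversible: wait for a human
        return "run"

guard = Guardrails(
    allowed_tools={"run_tests", "edit_file", "send_email"},
    high_impact_tools={"send_email"},
)
for _ in range(3):
    guard.record_failure("run_tests", {"path": "tests/"})

decisions = {
    "loop": guard.check("run_tests", {"path": "tests/"}),
    "drift": guard.check("rewrite_architecture", {}),
    "irreversible": guard.check("send_email", {"to": "teacher@example.com"}),
    "normal": guard.check("edit_file", {"path": "bug.py"}),
}
print(decisions)
```

Note that the default `approve` callback says no. Denying by default is the code version of not handing the intern the company credit card.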
Day 3 - Activity: design an agent on paper (15 min)
Each pair designs a safe agent for a school context - e.g., "study buddy that drills me on flashcards I'm getting wrong." They specify:
- Goal (one sentence).
- Tools the agent has (and explicitly does NOT have).
- Stopping conditions.
- Human-in-the-loop checkpoints.
Submit on the worksheet. Discuss best designs.
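For classes that want to go one step past paper, the worksheet can be written down as data and checked for gaps. This is an optional sketch: the `AgentDesign` class, its field names, and the example tool names are invented for this activity, and the checks only confirm that each box is filled in, not that the design is actually safe.

```python
from dataclasses import dataclass, field

@dataclass
class AgentDesign:
    goal: str
    tools_allowed: list
    tools_denied: list
    stopping_conditions: list
    hitl_checkpoints: list = field(default_factory=list)

    def problems(self):
        """Return a list of gaps; an empty list means every box is filled in."""
        issues = []
        if not self.goal.strip():
            issues.append("missing goal")
        if not self.tools_allowed:
            issues.append("no tools listed")
        if not self.tools_denied:
            issues.append("no explicitly denied tools")
        overlap = set(self.tools_allowed) & set(self.tools_denied)
        if overlap:
            issues.append(f"tools both allowed and denied: {sorted(overlap)}")
        if not self.stopping_conditions:
            issues.append("no stopping conditions")
        if not self.hitl_checkpoints:
            issues.append("no human-in-the-loop checkpoints")
        return issues

study_buddy = AgentDesign(
    goal="Drill me on the flashcards I keep getting wrong.",
    tools_allowed=["read_flashcards", "record_score"],
    tools_denied=["send_email", "browse_web", "edit_grades"],
    stopping_conditions=["20 cards reviewed", "student says stop"],
    hitl_checkpoints=["student confirms before any card is deleted"],
)
risky = AgentDesign(
    goal="Help with homework.",
    tools_allowed=["browse_web", "send_email"],
    tools_denied=["send_email"],
    stopping_conditions=[],
)
print(study_buddy.problems())
print(risky.problems())
```

The second design is deliberately flawed, so pairs can practice spotting what is missing before they critique each other's worksheets.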
Differentiation, IEP, and 504 supports
- Concrete-thinking students: the "manual RAG" activity makes an abstract topic physical. Lean into it.
- Students who lose patience with multi-step systems: skip MCP detail; focus on the four-step RAG pipeline and one agent example. The conceptual scaffolding is enough.
Assessment & evidence
- Formative: manual-RAG observations, agent design worksheet.
- Summative: 12-question quiz, plus a one-paragraph reflection: "where I'd use RAG, where I'd use an agent, where I'd use neither."
What's next
Unit 8 turns to the senses: image generation, voice cloning, video, deepfakes, and the provenance tools that fight back (C2PA, watermarking). Multimodal AI for high schoolers - including the parts that scare parents.
