LESSON
Day 326: Advanced Agent Patterns - Planning, Memory, and Multi-Agent Systems
The core idea: advanced agent systems become reliable only when planning, memory, and delegation are treated as explicit runtime components with contracts, budgets, and observability.
Today's "Aha!" Moment
The insight: 21/05.md defined an agent as a bounded control loop over state, actions, and stop conditions. This lesson adds the next layer: once tasks get longer, context spans multiple sessions, or responsibilities need to be split, the "one loop, one prompt, one tool list" design stops scaling cleanly.
Why this matters: Teams often reach for planning, memory, or multi-agent frameworks as if they were feature upgrades. In production they are architecture choices. Each one changes where control lives, how errors accumulate, and what has to be logged, tested, and governed.
Concrete anchor: Consider an enterprise IT operations assistant handling "My laptop was stolen at the airport, disable access and get me a replacement." A useful system may need to:
- verify the employee and gather missing details
- revoke sessions and disable the device in an endpoint-management tool
- check procurement policy and replacement eligibility
- create or update tickets across security and IT service systems
- resume the case the next day without losing the important facts
That is no longer just a single agent step repeated a few times. It needs decomposition, durable state, and sometimes specialist roles.
Keep this mental hook in view: Advanced agent patterns work when reasoning artifacts, memory records, and handoffs are explicit system objects instead of hidden prompt behavior.
Why This Matters
The single-agent loop from 21/05.md is enough for short tasks: search, read, maybe call one or two tools, then stop. It starts to fail when the job has one or more of these properties:
- the task requires ordering constraints such as "verify identity before revoke access"
- useful information should persist across sessions or across related tasks
- different tools, permissions, or expertise should be isolated behind different roles
Without advanced patterns:
- the model improvises a plan inside hidden reasoning, so failures are hard to inspect
- every session starts cold, forcing the user to repeat information or the system to re-derive it
- one large prompt accumulates too many tools, policies, and instructions, which increases confusion and tool misuse
With advanced patterns:
- plans become inspectable artifacts that can be reviewed, budgeted, and resumed
- memory becomes a selective data product with provenance, expiration, and retrieval rules
- multi-agent designs become an orchestration problem with role boundaries and handoff contracts
Real-world impact: Better task completion on long-running work, fewer repeated tool calls, safer permission boundaries, and clearer debugging when a complex task goes off course.
This lesson prepares for 21/07.md, where those extra moving parts force stronger safety controls, monitoring, and observability.
Learning Objectives
By the end of this session, you should be able to:
- Design explicit planning layers for agent systems so task decomposition is inspectable instead of hidden inside free-form model output.
- Choose the right memory strategy for an agent by separating working state from durable memory and reasoning about staleness, privacy, and retrieval quality.
- Decide when multi-agent coordination is worth the complexity and define role, tool, and handoff boundaries that make it operable in production.
Core Concepts Explained
Concept 1: Planning Externalizes Control Flow
For example, the stolen-laptop assistant cannot safely jump straight to "disable device and order replacement." It has to verify the user's identity, confirm whether the device is corporate-managed, notify security, and only then trigger replacement procurement. If the model improvises that sequence from scratch on every turn, the system becomes hard to predict and harder to audit.
At a high level, planning is about moving part of the control logic out of hidden token-by-token reasoning and into a visible intermediate artifact. A good plan does not just list steps. It captures dependencies, success criteria, and when to re-plan.
Mechanically: In production, explicit planning often looks like:
- Task intake
- classify the request
- estimate risk and expected horizon
- decide whether the task even needs a plan
- Plan generation
- produce a structured plan with steps, required tools, and completion checks
- attach budgets such as max_steps, max_cost, or required approvals
- Execution
- run one step at a time through the same validated loop from 21/05.md
- record outputs and mark steps complete, blocked, or failed
- Re-planning
- revise the remaining plan when a step fails, new evidence arrives, or a dependency changes
def execute_task(task, tools):
    plan = planner.create(task)
    while not plan.done():
        step = plan.next_ready_step()
        result = executor.run(step, tools=tools)
        plan.record(step_id=step.id, result=result)
        if result.requires_replan:
            plan = planner.revise(task=task, history=plan.history())
    return plan.final_output()
In practice:
- planning works best when the plan schema is typed and small enough to inspect
- risky tasks benefit from explicit checkpoints such as "approval required before write"
- a planner and an executor can use different prompts, models, or tool permissions
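To make "typed and small enough to inspect" concrete, here is a minimal sketch of what such a plan schema could look like. All names (`PlanStep`, `next_ready_step`, the field names, and the budget defaults) are hypothetical illustrations, not a prescribed format; the point is that dependencies, success checks, approval checkpoints, and budgets live in data, not in prose.

```python
from dataclasses import dataclass
from enum import Enum


class StepStatus(Enum):
    PENDING = "pending"
    DONE = "done"
    BLOCKED = "blocked"
    FAILED = "failed"


@dataclass
class PlanStep:
    id: str
    action: str                      # e.g. "revoke_sessions" (hypothetical tool name)
    depends_on: list[str]            # step ids that must be DONE first
    success_check: str               # how the executor knows the step worked
    requires_approval: bool = False  # checkpoint before risky writes
    status: StepStatus = StepStatus.PENDING


@dataclass
class Plan:
    task_id: str
    steps: list[PlanStep]
    max_steps: int = 10              # execution budget (illustrative default)
    max_cost_usd: float = 1.00

    def next_ready_step(self):
        # A step is ready when it is pending and all dependencies are done.
        done = {s.id for s in self.steps if s.status is StepStatus.DONE}
        for s in self.steps:
            if s.status is StepStatus.PENDING and set(s.depends_on) <= done:
                return s
        return None
```

A schema like this makes the ordering constraint from earlier ("verify identity before revoke access") checkable in code: the revoke step simply lists the verification step in `depends_on` and never becomes ready before it completes.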
The trade-off is clear: Explicit planning improves traceability and long-horizon task quality, but it adds latency, another model decision point, and more orchestration code.
A useful mental model is: Treat the planner like a dispatcher that writes a work order, not like a genius narrator improvising the whole mission.
Use this lens when:
- Use it for tasks with dependencies, branching recovery paths, or multi-system writes.
- Avoid it for short, single-hop tasks where planning overhead costs more than it saves.
Concept 2: Memory Is Selective Persistence, Not Transcript Hoarding
For example, an employee returns the next day and asks, "Did security approve the replacement yet?" The agent should remember the ticket ID, device serial number, and approved shipping address. It should not blindly trust a stale summary from an old chat or replay an entire transcript to recover those facts.
At a high level, memory is useful only if the system can decide what to retain, how long to trust it, and how to retrieve it without overwhelming the model. Dumping old conversations back into the prompt is not memory architecture. It is context stuffing.
Mechanically: Useful agent memory usually separates at least three layers:
- Working memory
- the current run state: latest messages, retrieved documents, tool outputs, current plan
- short-lived and usually discarded after the task ends
- Episodic memory
- summaries of past interactions or completed cases
- useful for resuming work and avoiding repeated questions
- Semantic memory
- stable facts such as user preferences, system mappings, or policy-linked attributes
- often stored as typed records rather than natural-language summaries
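A semantic-memory record of the kind described above could be sketched like this. The field names and the TTL-based freshness rule are illustrative assumptions, not a standard format; what matters is that provenance, age, and trust horizon are explicit fields rather than implied by a prose summary.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class SemanticRecord:
    subject: str            # e.g. an employee id (hypothetical keying scheme)
    attribute: str          # e.g. "shipping_address"
    value: str
    source: str             # provenance: which tool or ticket produced it
    recorded_at: datetime
    ttl: timedelta          # how long the record stays trustworthy

    def is_fresh(self, now: datetime) -> bool:
        # Past its TTL, the record should be expired, invalidated,
        # or confirmed with the user before it is acted on.
        return now - self.recorded_at <= self.ttl
```

Because the record is a typed object outside the model, it can be queried, audited, and deleted to meet retention obligations, which a buried sentence in a chat summary cannot.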
The write path matters as much as retrieval:
- candidate memory is extracted from tool results or final task state
- records are normalized into structured fields
- provenance, timestamps, and expiration rules are attached
- retrieval ranks records by relevance and trust before they reach the model
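The last step of that write/read path, ranking by relevance and trust, might be sketched as follows. The linear age-decay trust score is a made-up heuristic for illustration; real systems would tune their own weighting, but the shape is the same: expired records are dropped, and relevance alone is not enough to reach the model.

```python
from dataclasses import dataclass


@dataclass
class MemoryRecord:
    text: str
    source: str         # provenance attached at write time
    age_days: float
    confidence: float   # 0..1, set when the record was extracted


def rank_for_retrieval(records, relevance_fn, max_age_days=90):
    """Drop expired records, then rank by relevance weighted by trust."""
    live = [r for r in records if r.age_days <= max_age_days]

    def trust(r):
        # Illustrative heuristic: confidence decays linearly with age.
        return r.confidence * (1.0 - r.age_days / max_age_days)

    return sorted(live, key=lambda r: relevance_fn(r) * trust(r), reverse=True)
```

With equal relevance, a fresh high-confidence record outranks a fresh low-confidence one, and anything older than the age cutoff never reaches the prompt at all.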
In practice:
- memory should be queryable and inspectable outside the model
- stale or low-confidence memories need expiration, invalidation, or confirmation prompts
- durable memory creates privacy, retention, and deletion obligations that pure session state does not
The trade-off is clear: Memory improves continuity and personalization, but it can also amplify stale facts, hidden bias, and compliance risk if stored carelessly.
A useful mental model is: A good agent memory system looks more like a small database with retrieval policy than like a diary.
Use this lens when:
- Use it when tasks span sessions, users expect continuity, or repeated facts are expensive to recover.
- Avoid durable memory for low-value chat history that carries more privacy risk than operational benefit.
Concept 3: Multi-Agent Systems Need Role Boundaries and Handoff Contracts
For example, in the stolen-laptop case, one specialist agent may handle identity and access revocation, another may handle procurement policy, and a coordinator may decide which task comes next. That split can reduce prompt overload and isolate permissions. It can also create new failure modes if agents keep asking each other for context that was never structured clearly.
At a high level, a multi-agent system is not automatically better because multiple models are talking. It is useful when roles have meaning: different tools, different authority levels, different evaluation criteria, or different context windows.
Mechanically: A production-safe multi-agent pattern usually includes:
- Coordinator
- owns the global task state
- assigns work to specialists based on plan state or classification
- Specialist agents
- each gets a narrow toolset and prompt focused on one domain
- returns structured outputs instead of open-ended conversation
- Handoff contract
- shared task ID, objective, allowed actions, expected output schema, and timeout budget
- Shared memory or state store
- keeps the canonical record so agents are not relying on each other's paraphrases
The simplest useful pattern is often not "many agents debating." It is "one coordinator routing structured sub-tasks to specialists with clear contracts."
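That coordinator-plus-contract pattern can be sketched in a few lines. The `Handoff` fields mirror the contract listed above; the routing rule and all names here are hypothetical, assuming the coordinator keeps a registry of each specialist's allowed tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Handoff:
    task_id: str                      # shared id pointing at the canonical state record
    objective: str
    allowed_actions: tuple[str, ...]  # tools the receiving specialist may call
    output_schema: str                # name of the expected structured result
    timeout_s: int                    # budget before the coordinator reclaims the task


def route(handoff, specialists):
    """Pick the specialist whose registered toolset covers every allowed action."""
    for name, tools in specialists.items():
        if set(handoff.allowed_actions) <= tools:
            return name
    return None  # no specialist qualifies: escalate instead of guessing
```

Because the handoff names the task ID and output schema, the receiving agent can act from the canonical state record instead of re-deriving the problem from another agent's paraphrase.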
In practice:
- role separation can reduce prompt bloat and prevent over-privileged tool exposure
- coordination needs traces, state versioning, and duplicate-work suppression
- evaluation has to cover the handoff path, not just each agent in isolation
The trade-off is clear: Multi-agent designs can improve specialization and permission isolation, but they add latency, orchestration failure modes, and more surfaces to monitor.
A useful mental model is: Design it like a service architecture. Each agent is a bounded component, not a free-floating personality.
Use this lens when:
- Use it when there are clear specialist roles, policy boundaries, or context partitions that would overload a single agent.
- Avoid it when one well-instrumented agent with explicit tools can already solve the task cleanly.
Troubleshooting
Issue: The planner generates long plans full of speculative steps that the executor never needs.
Why it happens / is confusing: The planner is rewarded for sounding comprehensive instead of producing the minimum executable plan, or the schema does not force clear completion criteria.
Clarification / Fix: Keep plan objects small, require each step to name its dependency and success condition, and re-plan incrementally instead of asking for a perfect end-to-end plan upfront.
Issue: Memory retrieval keeps surfacing stale facts, so the agent acts on outdated information.
Why it happens / is confusing: The system stores natural-language summaries without timestamps, confidence, or invalidation rules, so old conclusions look as trustworthy as fresh facts.
Clarification / Fix: Store memory with provenance and age, prefer typed records for durable facts, and require confirmation before using old memories for high-impact writes.
Issue: Multiple agents bounce a task back and forth or duplicate the same tool calls.
Why it happens / is confusing: Role boundaries are vague, shared state is incomplete, or handoff payloads omit what the next agent actually needs.
Clarification / Fix: Give each agent a narrow mission, maintain a canonical shared task record, and make handoffs structured enough that the receiver can act without re-deriving the whole problem.
Advanced Connections
Connection 1: Advanced Agent Patterns <-> Agent Fundamentals
21/05.md introduced the bounded observe-decide-act loop. This lesson does not replace that loop. It composes it:
- planning decides which loops should happen and in what order
- memory decides what state should survive beyond the current loop
- multi-agent coordination decides which loop owns which part of the task
If the base loop is not typed, budgeted, and observable, advanced patterns only spread instability across more components.
Connection 2: Advanced Agent Patterns <-> Production Agent Systems
21/07.md is the operational consequence of this lesson. Once you add plans, memory stores, and handoffs, you also add:
- more points where unsafe actions must be blocked
- more traces and metrics needed to reconstruct what happened
- more stateful components whose failures can silently degrade quality
Advanced patterns increase capability, but they also expand the safety and observability surface.
Resources
Optional Deepening Resources
- [PAPER] ReAct: Synergizing Reasoning and Acting in Language Models
- Focus: The reasoning-plus-action loop that underlies both single-agent execution and more structured planner-executor designs.
- [PAPER] Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
- Focus: Why explicit decomposition can improve long-horizon task performance and make intermediate reasoning more inspectable.
- [PAPER] Generative Agents: Interactive Simulacra of Human Behavior
- Focus: A concrete memory architecture with observation, synthesis, and retrieval stages that map well onto production memory design questions.
- [PAPER] AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework
- Focus: Multi-agent orchestration patterns, along with the coordination costs they introduce.
Key Insights
- Planning is an execution artifact, not just a prompt trick - the value comes from inspectable steps, budgets, and re-planning rules.
- Memory quality depends on the write path - durable memory needs provenance, structure, and expiration, not just retrieval.
- Multi-agent systems are architecture, not aesthetics - they pay off when roles, permissions, and context boundaries are real enough to justify orchestration overhead.