LESSON
Day 331: ReWOO & Self-Consistency - Planning and Verification
The core idea: ReWOO and self-consistency harden different parts of an LLM workflow. ReWOO makes tool use more explicit by planning evidence collection before execution; self-consistency checks whether multiple sampled reasoning traces converge before the system commits to a sensitive answer.
Today's "Aha!" Moment
The previous session (21/10.md) used Tree of Thoughts to keep several branches alive while Elena's airport-theft incident was still ambiguous. That is useful when the branch choice itself is the problem. But many production systems cannot afford tree search on every ticket, especially when the workflow is mostly known and the real question is either "what evidence should I gather first?" or "do I trust this final decision enough to act?"
That is where this lesson fits. ReWOO and self-consistency are cheaper than full branching, but they do not solve the same problem. ReWOO is about execution structure: write the evidence plan before the tool loop starts, then fill in the placeholders. Self-consistency is about answer reliability: sample several reasoning traces over the same task and see whether they converge on the same conclusion.
Keep the Elena scenario in view. Her managed MacBook was stolen in transit, and the assistant must decide whether to revoke sessions, lock the device, request a wipe, or escalate immediately. The mental hook is simple: ReWOO helps the assistant gather the right facts without thrashing through repeated tool loops; self-consistency helps the assistant decide whether the final action recommendation is robust enough to trust.
Why This Matters
Incident-response assistants live under conflicting pressures. They must move quickly because an active device may still have tokens, VPN access, or cached corporate data. They must also avoid irreversible mistakes because disabling the wrong asset, wiping too early, or skipping an approval gate creates a second incident.
Tree of Thoughts can help when the assistant genuinely needs to compare several plausible playbooks. In practice, though, most teams want something narrower and cheaper. They already know the broad workflow for "stolen managed laptop"; what they need is a way to gather evidence with less wasted reasoning and a way to treat disagreement as a signal instead of burying it inside one polished answer.
ReWOO addresses the first problem. It makes the plan explicit enough that tool use becomes auditable, bounded, and sometimes parallelizable. Self-consistency addresses the second. It turns "the model sounded confident" into a more disciplined question: did multiple reasoning attempts, given the same evidence, land on the same action?
This also prepares the ground for the next session (21/12.md). Once planning, execution, and verification can be separated, the next architectural question is whether one dense model should do every role or whether routing among specialized experts is the better scaling pattern.
Learning Objectives
By the end of this session, you should be able to:
- Explain the difference between planning uncertainty and answer uncertainty in an agent workflow.
- Describe how ReWOO restructures tool use through a planner-worker-solver style execution pattern.
- Decide when self-consistency, ReWOO, or a combination of both is the right response to a production failure mode.
Core Concepts Explained
Concept 1: Self-Consistency Verifies the Recommendation, Not the Retrieval Plan
Suppose Elena has already passed identity verification and the assistant has fetched the key facts: the laptop is still enrolled in MDM, the last heartbeat was six minutes ago, several cloud sessions are active, and the company policy says remote wipe needs security approval unless the device is confirmed offline and high risk. The remaining question is not "which tool should I call next?" It is "given this evidence, what action should I recommend right now?"
Self-consistency treats that question as a sampling problem. Instead of accepting the first reasoning chain, the system draws several independent traces and compares their final answers. In this case, one trace may reason from the approval policy first, another from device state first, and another from session risk first. If most of them converge on "revoke sessions now, lock the device if possible, and open an urgent wipe-approval ticket," that agreement is a useful confidence signal.
The mechanism is simple but easy to misuse. Self-consistency helps when the answer space is relatively crisp and the evidence bundle is already in place. It is strong on tasks like constrained classification, policy interpretation with a small action set, or arithmetic-style problems where multiple reasoning paths can still converge on one correct endpoint.
It does not repair missing evidence. If every sampled chain sees the same incomplete policy excerpt, the system can produce a unanimous wrong answer. It also becomes weaker on open-ended outputs, where two traces may "agree" only because they are both vague. In production, the important move is often not majority vote alone but a routing rule: if the traces disagree on Elena's next action, escalate to a human rather than hiding the disagreement behind one fluent response.
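That routing rule can be sketched as a small voting harness. Everything here is illustrative: `sample_trace` stands in for one LLM call at nonzero temperature, and the 80% agreement threshold is an assumed policy, not a fixed rule.

```python
from collections import Counter

def self_consistent_decision(sample_trace, evidence, n_samples=5, min_agreement=0.8):
    """Sample independent reasoning traces over the same evidence and vote.

    sample_trace: hypothetical callable wrapping one reasoning chain
    (temperature > 0) that returns an action string such as "revoke".
    Disagreement routes to escalation instead of being hidden.
    """
    votes = Counter(sample_trace(evidence) for _ in range(n_samples))
    action, count = votes.most_common(1)[0]
    if count / n_samples >= min_agreement:
        return action, dict(votes)   # strong convergence: act on the majority
    return "escalate", dict(votes)   # split traces: surface to a human
```

Returning the full vote tally, not just the winning label, lets downstream governance log how close the call was, which matters more for audits than the bare majority answer.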
Concept 2: ReWOO Separates Planning From Observation So the Tool Loop Stops Thrashing
Now shift to the part of the problem that happens before the final recommendation exists. A naive agent loop would reason a little, call MDM, reason again, call the identity provider, reason again, fetch policy, reason again, and so on. That can work, but it burns tokens on repeated context rebuilding and makes the execution trace harder to inspect.
ReWOO, short for Reasoning WithOut Observation, changes the shape of the runtime. The model first writes an execution plan with symbolic evidence slots, then a worker resolves those slots, and only after that does a solver synthesize the recommendation. In Elena's case, the planner might emit something like:
Plan:
- Confirm the incident owner and affected asset.
- Gather device state, active sessions, and stolen-device policy.
- Recommend the least risky immediate containment action.
E1 = directory.lookup_user("Elena")
E2 = mdm.lookup_primary_device(user=E1.user_id)
E3 = idp.list_active_sessions(user=E1.user_id)
E4 = policy.fetch("stolen_managed_laptop")
The worker then executes E1 through E4, often with partial parallelism once dependencies are satisfied. The solver receives the filled evidence map and produces the final answer. At a high level, the flow looks like this:
incident request
      |
   planner   (emits the plan with evidence slots)
      |
   worker
      +--> E1 user lookup
      +--> E2 device state
      +--> E3 active sessions
      +--> E4 policy text
      |
evidence bundle
      |
   solver
      |
action recommendation
This structure buys concrete benefits: lower token churn, clearer tool budgets, easier debugging, and an execution artifact an engineer can audit after the fact. The trade-off is that the initial plan can go stale. If E2 reveals the laptop is already unenrolled or legal hold rules apply, the precomputed plan may no longer be adequate. ReWOO is efficient because it assumes a meaningful portion of the workflow can be planned before observations arrive; when that assumption fails, the system must replan or fall back to a more interactive loop.
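A minimal sketch of the worker step, assuming the plan is an ordered mapping from slot names to tool calls. The tool names mirror the E1 through E4 layout above but are hypothetical stand-ins, not a real MDM or identity-provider API.

```python
def run_plan(plan, tools):
    """Fill evidence slots in order, substituting earlier slots into arguments."""
    evidence = {}
    for slot, (tool_name, args) in plan.items():
        # Any argument value that names an already-filled slot (e.g. "E1")
        # is replaced with that slot's observation before the call.
        resolved = {k: evidence.get(v, v) if isinstance(v, str) else v
                    for k, v in args.items()}
        evidence[slot] = tools[tool_name](**resolved)
    return evidence

# Plan mirroring the lesson's slots; dict order encodes the dependency order.
plan = {
    "E1": ("lookup_user",   {"name": "Elena"}),
    "E2": ("lookup_device", {"user": "E1"}),
    "E3": ("list_sessions", {"user": "E1"}),
    "E4": ("fetch_policy",  {"topic": "stolen_managed_laptop"}),
}
```

E2 and E3 depend only on E1, and E4 on nothing, so a real worker could dispatch E2 through E4 concurrently once E1 resolves; the sequential loop keeps the sketch readable. The solver then receives the filled evidence dict and never calls a tool itself.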
Concept 3: Use ReWOO for Evidence Flow, Self-Consistency for Decision Confidence
The cleanest way to think about these methods is to place them on opposite sides of a decision boundary. ReWOO helps when the uncertainty lives in the execution structure: which facts are needed, which tools can run independently, and how to keep the agent from repeatedly thinking around the same API calls. Self-consistency helps when the uncertainty lives in the interpretation of the gathered evidence.
In Elena's incident, a practical pipeline could look like this. ReWOO gathers the user record, device state, active sessions, and policy excerpt in one inspectable plan. Then the system runs self-consistency over the final classification prompt: "Given this evidence, is the right immediate action lock, revoke, wipe, or escalate?" If the sampled traces converge, the assistant can respond with higher confidence. If they split between "lock now" and "escalate before any write action," that disagreement becomes a governance signal.
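That pipeline can be sketched in a few lines, with `gather` standing in for a ReWOO plan execution and `classify` for one sampled reasoning trace over the frozen evidence bundle; both names, and the unanimity requirement, are illustrative assumptions.

```python
from collections import Counter

def triage(gather, classify, n_samples=5):
    """Gather evidence once (ReWOO), then sample the decision (self-consistency).

    gather: runs the planned tool calls and returns the evidence bundle.
    classify: one sampled trace mapping evidence to "lock" | "revoke" |
    "wipe" | "escalate". A split vote becomes an escalation, not a guess.
    """
    evidence = gather()                                 # one inspectable pass
    votes = Counter(classify(evidence) for _ in range(n_samples))
    action, count = votes.most_common(1)[0]
    return action if count == n_samples else "escalate"
```

Requiring unanimity is a deliberately conservative choice for a write action like a wipe; a lower threshold may be acceptable for reversible actions such as session revocation.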
This framing also clarifies when Tree of Thoughts from 21/10.md is still the better tool. If the system truly does not know which strategy family it should pursue, branching search may be worth the cost. If the workflow is already known and the main risks are tool-loop inefficiency or brittle final judgment, ReWOO and self-consistency are usually the more pragmatic choices.
That distinction matters in production because different failure modes deserve different budgets. Search is expensive. Replanning is expensive. Sampling several final traces is expensive too, but usually less so than exploring a tree of partially executed plans. Good agent design starts by locating where the uncertainty actually lives, then paying only for the mechanism that addresses that uncertainty.
Troubleshooting
Issue: "Self-consistency means majority vote is the truth."
Why it happens / is confusing: Agreement feels like evidence, so teams are tempted to collapse the whole method into "pick the most common answer."
Clarification / Fix: Agreement is only meaningful when the sampled traces are diverse enough and the evidence is already adequate. For sensitive actions, use disagreement as a reason to escalate, not as noise to suppress.
Issue: "ReWOO removes the need for replanning."
Why it happens / is confusing: The planner-first structure makes the workflow look fully decided before execution starts.
Clarification / Fix: ReWOO only works well when a stable portion of the plan can be written in advance. Unexpected evidence still requires fallback logic or a second planning pass.
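One shape that second planning pass can take, sketched with a hypothetical staleness check and a bounded replanning budget; a real system would also cap tool spend and surface partial evidence to the human who takes over.

```python
def plan_is_stale(plan, evidence):
    """Assumed example condition: the plan expected an enrolled device,
    but the observation says otherwise."""
    return evidence.get("device", {}).get("enrolled") is False

def run_with_replan(planner, worker, incident, max_replans=1):
    """Execute a ReWOO-style plan, replanning once if evidence invalidates it.

    planner(incident, evidence) -> plan; worker(plan) -> evidence bundle.
    Both are hypothetical callables, not a fixed framework API.
    """
    evidence = {}
    for _ in range(max_replans + 1):
        plan = planner(incident, evidence)
        evidence = worker(plan)
        if not plan_is_stale(plan, evidence):
            return evidence
    # Out of replanning budget: fall back to an interactive loop or a human.
    return {"status": "escalate", "partial_evidence": evidence}
```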
Issue: "If both techniques help reliability, we should run both on every request."
Why it happens / is confusing: Reliability patterns are easy to treat as universal upgrades.
Clarification / Fix: Apply them to the failure mode you actually have. Straightforward tickets may need neither. Tool-heavy but routine flows may benefit from ReWOO alone. High-stakes classification over stable evidence may benefit from self-consistency alone.
Advanced Connections
Connection 1: Self-Consistency ↔ Test-Time Ensembling
Self-consistency is the reasoning-time analogue of an ensemble. Instead of trusting one sampled path, the system looks for convergence across several paths to reduce variance in the final decision.
Connection 2: ReWOO ↔ Query Planning and Workflow Orchestration
ReWOO resembles systems that separate "figure out the dependency graph" from "execute the operators." Database planners, build systems, and incident runbooks all gain efficiency from making the plan explicit before running the expensive steps.
Resources
Optional Deepening Resources
- [PAPER] Self-Consistency Improves Chain of Thought Reasoning in Language Models - Wang et al. (2022)
- Link: https://arxiv.org/abs/2203.11171
- Focus: How sampling multiple reasoning traces and marginalizing to the final answer improves robustness.
- [PAPER] ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models - Xu et al. (2023)
- Link: https://arxiv.org/abs/2305.18323
- Focus: The planner-worker-solver pattern and the efficiency gains from front-loading reasoning.
- [ARTICLE] Plan-and-Execute Agents - LangGraph
- Link: https://langchain-ai.github.io/langgraph/tutorials/plan-and-execute/plan-and-execute/
- Focus: A modern agent implementation that makes planner/executor separation concrete even outside the original ReWOO paper.
Key Insights
- ReWOO and self-consistency protect different layers of the workflow - one stabilizes evidence gathering, the other stabilizes final judgment.
- Consensus is only as good as the evidence behind it - repeated sampling cannot fix missing policy or missing state.
- Disagreement is operationally valuable - in high-stakes systems, it should often trigger escalation rather than be hidden from downstream decision-making.