I Built an AI to Debate Itself So My AI Instructions Don't Bloat
I have a confession: I built a team of AI agents that don't just review my AI instructions, they argue with each other about what should and shouldn't be in there.
I'd been using Claude Code with the sub-agent orchestration feature for a few months when I realized something.
Let me explain how I got here, because the journey from "I should update this file more often" to "let's orchestrate a formal debate between specialized sub-agents" is… well, it's a journey.
The Problem: Context Drift
If you're using Claude Code (or any AI coding assistant), you've probably discovered CLAUDE.md, that special instructions file where you tell Claude about your project structure, conventions, and domain knowledge.
It's incredibly powerful when it's accurate, but as recent studies have shown, it can actually become a hindrance when it contains outdated or stale information. Theo did a great YouTube video covering this concept: Delete your CLAUDE.md (and your AGENT.md too)
Here's what would typically happen:
- Start a new feature and make a significant update to my project
- Think "I should add this to CLAUDE.md so Claude knows this"
- Get distracted by actual work
- Forget to update it
- Repeat
My CLAUDE.md was missing critical context, leading to repetitive conversations where I'd explain the same project structure or conventions over and over. It also often had stale context, because either the directory structure had changed a bit, or some relevant files had moved.
The Naive Solution: Just Ask Claude
My first thought was simple: "Hey Claude, after we finish this conversation, can you suggest updates to my CLAUDE.md file?"
Guess what happened?
Bloat. Massive, uncontrolled bloat.
Every conversation ended with Claude suggesting 5-10 new sections to add. Here's an example from one session:
Claude suggested adding:
- "When working with components in src/components/ui, always use..."
- "For API routes in src/app/api, remember to..."
- "The build process uses Next.js static exports..."
- "Color palette is defined in tailwind.config.ts..."
- "Error handling should follow this pattern..."
None of the suggestions included removing anything. Within a few iterations, I would have had a 1,000-line instruction file covering every edge case we'd ever discussed.
The problem with asking a single AI agent to improve documentation is the same problem humans have: additive bias. It's psychologically easier to add information than to delete it. We don't want to lose potentially useful context, so we keep stacking it on.
But an instruction file that tries to document everything ends up being too long for Claude to effectively use. You hit context limits, instructions contradict each other, and the signal-to-noise ratio tanks.
The Insight: I Need a Critic, Not Just a Suggester
The breakthrough came when I realized what was missing: adversarial review.
In code reviews, we don't just ask "what else could we add?" We ask "what can we remove?" and "is this really necessary?" That pushback is what keeps codebases maintainable.
That's when I designed the multi-agent debate system.
The Architecture: Orchestrated Debate
Here's how it works when I run `/improve-claude-md`:
The Three Roles
There are three agents in this system, and I gave each one a specific personality:
The Orchestrator runs the show. It reviews our conversation, reads CLAUDE.md, then spawns the other two agents. Its job is to manage the debate and give me a final report.
The Improver is the optimist. It looks for patterns where Claude got stuck and proposes fixes. Here's the key: it also proposes deletions, which fights that additive bias I mentioned earlier.
The Critic is... well, a critic. It challenges everything. "Is this really needed?" it asks. "Will this still be relevant in two weeks?" It's the adversarial voice that keeps things lean.
The Debate Process
Round 1: Initial Proposals
- Improver suggests 3-5 high-priority changes
- Critic challenges each one: "Is this really necessary?"
Round 2: Defense & Refinement
- Improver responds with evidence from the conversation
- Critic either approves or maintains objections
- Proposals get revised or dropped
Round 3: Final Consensus (if needed)
- Resolve remaining disagreements
- Document any contested proposals
- Agree to disagree if necessary
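The three rounds above can be sketched as a short control loop. This is a hypothetical Python illustration, not the actual implementation (which lives entirely in markdown prompt files); the stub agents and their hard-coded proposals exist only to show the flow of a debate.

```python
# Hypothetical sketch of the three-round debate. The real system is a set of
# markdown prompts orchestrated by Claude Code, not Python; StubImprover and
# StubCritic stand in for the sub-agent calls.

MAX_ROUNDS = 3  # time-boxed: prevents endless argument


class StubImprover:
    """Stand-in for the Improver sub-agent."""

    def propose(self, conversation):
        # Round 1: a handful of high-priority changes (hard-coded for illustration)
        return ["add build note", "delete stale section", "add API convention"]

    def defend(self, proposals, objections, conversation):
        # Keep proposals the Improver can back with evidence; drop the rest.
        kept = [p for p in proposals if p not in objections]
        dropped = [p for p in proposals if p in objections]
        return kept, dropped


class StubCritic:
    """Stand-in for the Critic sub-agent."""

    def __init__(self):
        self.rounds = 0

    def challenge(self, proposals):
        # Round 1: object to the weakest proposal; afterwards, approve.
        self.rounds += 1
        return [proposals[-1]] if self.rounds == 1 and proposals else []


def run_debate(improver, critic, conversation):
    proposals = improver.propose(conversation)     # Round 1: initial proposals
    contested = []
    for _ in range(MAX_ROUNDS):
        objections = critic.challenge(proposals)   # "Is this really necessary?"
        if not objections:                         # convergence: Critic approves
            return proposals, contested
        proposals, dropped = improver.defend(proposals, objections, conversation)
        contested += dropped                       # record what was argued away
    # Convergence failure: after 3 rounds, both sides go to the user unresolved.
    return proposals, contested


final, contested = run_debate(StubImprover(), StubCritic(), conversation=[])
```

In this stub run the Critic objects once, the Improver drops the contested proposal rather than defend it, and the debate converges in round two, with the dropped item preserved for the "Rejected After Debate" section.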
The Output
After the debate concludes, I get a structured report:
## Recommended Additions
### High Priority
[Critical additions with line counts]
### Medium Priority
[Helpful clarifications with line counts]
## Recommended Removals 🗑️
### High-Impact Deletions
[Existing bloat to remove with rationale]
## Recommended Alternatives
### Commands to Create
[Workflows that should be slash commands, not docs]
## Rejected After Debate
[Proposals discussed but deemed unnecessary]
## Net Impact
- Lines added: +15
- Lines removed: -23
- **Net change: -8 lines**
Notice that last section: net-negative line changes. That's the goal. Better focus through subtraction.
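The tally itself is simple arithmetic. Here's a hypothetical Python sketch (the proposal names and line deltas are invented) that reproduces the numbers in the example report above.

```python
# Hypothetical sketch of the report's Net Impact tally. Each accepted proposal
# carries an estimated line delta: positive for additions, negative for deletions.

def net_impact(accepted):
    added = sum(delta for _, delta in accepted if delta > 0)
    removed = sum(-delta for _, delta in accepted if delta < 0)
    return added, removed, added - removed


added, removed, net = net_impact([
    ("add one-line Velite RSS note", +3),
    ("add deploy checklist", +12),
    ("delete stale directory map", -23),  # deletions dominate: the net-negative goal
])
print(f"Lines added: +{added}")
print(f"Lines removed: -{removed}")
print(f"Net change: {net:+d} lines")
```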
Why Debate > Single Agent
You might be thinking: "Couldn't you just prompt a single agent to be more critical?"
I tried that. It doesn't work as well. Here's why:
1. Role Conflict. When a single agent is asked to both propose improvements and critique them, the critique is weak. The agent has already committed to the proposal and suffers from the same confirmation bias humans do.
2. Surface-Level Pushback. A single "be critical" prompt produces generic objections: "This might be too specific" or "Consider if this is needed." It's not genuine adversarial review.
3. No Iterative Refinement. With two agents, the Improver actually responds to criticism and revises proposals. A single agent just generates a final output without that back-and-forth refinement.
4. Emergent Quality. The debate process surfaces insights neither agent would generate alone. The Critic might identify a pattern ("three of these proposals could become one slash command"), which then changes the Improver's approach in the next round.
It's the difference between proofreading your own writing and having someone else review it. The external perspective catches things you can't see.
The Technical Implementation
This is built using Claude Code's custom slash commands and sub-agent system. Here's the key piece: Claude Code lets you spawn specialized sub-agents from within a conversation, give them specific instructions, and then bring their responses back into the main conversation.
Here's the high-level structure of what I built:
File Structure:
~/.claude/commands/improve-claude-md.md # Orchestrator prompt
~/.claude/agents/claude-md-improver/ # Improver agent config
~/.claude/agents/claude-md-critic/ # Critic agent config
The Orchestrator Command (~/.claude/commands/improve-claude-md.md):
You're analyzing our conversation to identify CLAUDE.md improvements.
Process:
1. Read the current CLAUDE.md file (note line count)
2. Review recent conversation (last 20-30 messages)
3. Spawn the Improver agent with context
4. Spawn the Critic agent with the Improver's proposals
5. Manage 2-3 debate rounds until convergence
6. Synthesize final recommendations
7. Present to user (never auto-apply)
Always track line counts and net impact.
Each sub-agent config defines that agent's focus area and evaluation style. Here's what the Critic looks like:
The Critic Agent (~/.claude/agents/claude-md-critic/AGENT.md):
The Critic is the more complex of the two sub-agents—it's a 244-line evaluation framework that assesses every proposal along six dimensions (necessity, clarity, over-specification risk, unintended consequences, maintainability, conciseness). It demands message citations, validates the 4-question test, and actively pushes for deletions over additions.
The Improver proposes additions AND deletions. The Orchestrator manages the whole debate. Each one owns a different dimension.
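To make the six dimensions concrete, here's a hypothetical rubric in Python. The dimension names come from the Critic description above; the 1-5 scale, the rejection threshold, and the `evaluate` helper are invented for illustration, since the real rubric is a markdown prompt, not code.

```python
# Hypothetical sketch of the Critic's rubric. The six dimension names are from
# the real framework; the 1-5 scale and rejection rules are invented here.

DIMENSIONS = (
    "necessity", "clarity", "over_specification_risk",
    "unintended_consequences", "maintainability", "conciseness",
)


def evaluate(proposal, scores, citations):
    """Return the Critic's verdict on a single proposal."""
    if not citations:                    # the Critic demands message citations
        return "rejected: no message citations"
    assert set(scores) == set(DIMENSIONS), "score every dimension"
    if min(scores.values()) < 2:         # any weak dimension sinks the proposal
        return "rejected: weak dimension"
    return "approved"


verdict = evaluate(
    "add one-line Velite RSS note",
    {dim: 4 for dim in DIMENSIONS},
    citations=["message 12: Claude asked about RSS generation"],
)
```

The point of the citation check is the same as in the real system: a proposal without evidence from the conversation never even gets scored.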
The Orchestrator Workflow:
1. Read current CLAUDE.md (note line count)
2. Review recent conversation (last 20-30 messages)
- I trigger this with `/improve-claude-md` after finishing work
- Claude Code passes the conversation context automatically
3. Check project files for context
4. Spawn Improver agent with context
5. Spawn Critic agent with Improver's proposals
6. Manage 2-3 debate rounds
7. Synthesize final recommendations
8. Present to user (never auto-apply)
The key insight: the debate itself is where the quality comes from. Improver and Critic refine each other's thinking in ways neither could achieve alone.
Key Design Decisions:
- Never auto-apply changes: The system only recommends. I approve what goes in.
- Time-boxed debate: Max 3 rounds prevents endless argument
- Convergence failure protocol: If agents can't agree after 3 rounds, both perspectives are presented to me
- Metrics throughout: Line counts, character density, net impact—keeps everyone accountable
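The first design decision amounts to a human approval gate. A minimal hypothetical sketch, where the lambda stands in for my actual yes/no on each recommendation:

```python
# Hypothetical sketch of the "never auto-apply" gate: the system only
# recommends, and each change needs an explicit human yes before it lands.

def apply_report(recommendations, approve):
    """Split recommendations by a human decision function; nothing is automatic."""
    applied, skipped = [], []
    for rec in recommendations:
        (applied if approve(rec) else skipped).append(rec)
    return applied, skipped


applied, skipped = apply_report(
    ["delete stale directory map", "add deploy checklist"],
    approve=lambda rec: rec.startswith("delete"),  # stand-in for my review
)
```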
What I Learned Building This
1. Automation Isn't Always About Speed. This system is slower than just asking Claude to suggest updates. But it produces better results. Sometimes the point of automation is quality control, not throughput.
2. Adversarial Processes Are Underrated. We use them in code review, security testing, and debugging. Why not in AI workflows? Having one agent challenge another creates better outcomes than "helpful assistant" mode.
3. Meta-Problems Are Real Problems. "Managing AI instructions" sounds silly until your instruction file is 800 lines of contradictory context. Meta-work (work about work) deserves real engineering solutions.
4. The Irony Is Not Lost on Me. I built this entire system in a previous conversation with Claude… and then accidentally closed the terminal before documenting it. The very problem this system solves (capturing important decisions before they're lost) is what happened to the original implementation conversation.
The lesson? Ship your documentation system before you need it.
Key Takeaways
What Worked
The adversarial debate catches stuff single-agent systems never would. Improver proposed adding a section about Velite's RSS generation, but Critic challenged it: "The Velite config is already in the repo." Turns out Improver was right—Claude kept asking about it despite the config being available—but Critic forced a one-line version instead of a paragraph.
The net-negative focus is real. Most sessions end with more deletions than additions. My CLAUDE.md is actually getting shorter and more focused over time.
What Changed
My workflow is slower now, but the results are better. I used to just ask Claude "any suggestions for CLAUDE.md?" and get 5-10 additions that I'd half-heartedly implement. Now I run `/improve-claude-md`, watch the debate play out, and get 2-3 carefully-vetted recommendations with clear rationale.
The key difference: the debate forces evidence. Improver can't just say "this might be helpful"—it has to cite specific message numbers where Claude struggled. Critic can't just say "this seems unnecessary"—it has to explain why the evidence is weak or the instruction is redundant.
What's Next?
I'm considering extending this pattern to other workflows:
- Code review debates (one agent finds issues, another challenges severity)
- Architecture decision records (proposal vs. devil's advocate)
- Documentation quality (writer vs. reader perspective)
The core insight—that AI agents benefit from structured disagreement just like humans do—feels broadly applicable.
Have you built multi-agent systems or workflow automation? I'd love to hear what patterns you've discovered.