I Built a Multi-Agent System to Review My Blog Posts (And It Actually Works)
My blog posts were inconsistent. Some too technical. Some lost my voice. Manual reviews weren't catching enough. I needed multiple reviewers: one checking technical accuracy, one preserving my voice, one thinking like a skeptical reader.
Note: This system was originally built with Claude Code. I've since migrated everything to Opencode, but I'm keeping original references because that's how I actually built it. The concepts transfer over. The file paths are just different now.
So I built a multi-agent review system. Four specialized AI agents that debate each draft until it's ready to publish. Not generic AI-generated slop but actual quality control that catches what I'd miss.
TL;DR
I built a 4-agent review system that catches what I'd miss:
- Technical Educator transforms conversations → blog drafts AND implements revisions through iterative rounds
- Authenticity Guardian ensures it sounds like me (catches AI patterns, corporate speak, tutorial framing)
- Skeptical Reader catches missing context, skipped steps, AND inauthentic framing (from past-Luke's perspective)
- Structure Editor optimizes flow, readability, and the authenticity of openings
The pattern is copyable. Each agent reads the same file, applies different criteria, and writes feedback to disk. The orchestrator coordinates: parallel review, aggregated feedback, revision, repeat until convergence (2-3 rounds, with task_id reuse to maintain context).
Not magic. Just file I/O and well-designed prompts. Overkill? Yes. Does it catch things I'd miss? Also yes.
Here's what I built, how it works, and what surprised me along the way.
Context: What I Built This With
I built this with Claude Code, the CLI from Anthropic where Claude can read/write files, run commands, and maintain context across your project.
I'd already been using it for a while, so I knew the directory pattern:
- Agents in .claude/agents/<name>/AGENT.md (specialized AI personas with evaluation frameworks)
- Skills in .claude/skills/ (single-purpose tools I invoke with slash commands)
- Commands in .claude/commands/ (orchestrators that coordinate multiple agents)
Claude Code automatically discovers files in .claude/. A file at .claude/commands/review-blog-post.md becomes the slash command /review-blog-post. Simple pattern, but it took me quite a while to figure out how to chain agents together properly.
I'd already built two tools before starting this project:
- /agent-generator: Creates well-structured agent definitions automatically
- /expertise: Synthesizes frameworks from domain experts to ground agents in real methodologies
These are my custom tools—not built-in Claude Code features. I built them using the same patterns I'm about to show you.
The Problem: Quality Control at Scale
I have two different ways I create blog posts, and both of them were creating quality issues:
Writing myself: I'll jot down ideas over days or weeks, get a messy braindump of thoughts, then ask AI to structure it into something coherent. This works great when I've been thinking about a topic for a while—but AI would often lose my voice or turn it into a tutorial.
AI-generated from conversation: Sometimes I'll have a really good conversation with Claude where I learned something through debugging. Instead of rewriting it from scratch, I'll ask AI to generate a post directly from the conversation history. These were even worse. Too polished, too generic, missing the struggle.
Both approaches produce messy drafts that need serious cleanup, and that's where the quality issues creep in:
- Some posts became more like tutorials instead of journey-sharing
- Posts would sound too polished (clearly AI-generated)
- I'd skip "obvious" steps that weren't obvious to past-me
- Structure would be all over the place
- My authentic voice would get lost in editing & review
I needed a system that could:
- Transform my raw notes/conversations into blog drafts
- Catch quality issues before publishing
- Preserve my authentic voice
- Ensure completeness (no missing steps or context)
The solution I decided to explore was letting specialized AI agents debate each other until they converge on something worth publishing.
The Multi-Agent Architecture
I ended up with four specialized agents, each with a specific job:
1. Technical Educator (The Creator & Reviser)
Job: Transform raw conversations or notes into blog post drafts AND implement revisions based on critic feedback through iterative rounds.
Based on: Real methodologies from swyx ("learn in public"), Julia Evans (debugging narratives), Josh Comeau (mental models first), Andy Matuschak (progressive disclosure), and Anne-Laure Le Cunff (ship version 1.0).
File location: .claude/agents/blog-technical-educator/AGENT.md
This agent has TWO phases:
Phase 1 - Create Drafts: Takes my messy notes or conversation transcripts and structures them into:
- Opening hook (the specific problem)
- Story arc (my debugging journey)
- Mental model (how it actually works)
- Practical solution (what to do)
- Key takeaways
Phase 2 - Implement Revisions: After receiving critic feedback (overlapping concerns, conflicting input), the agent:
- Prioritizes issues (high/medium/low priority)
- Implements targeted revisions (not complete rewrites)
- Provides complete revised posts (not just suggestions)
- Iterates with critics for 2-3 rounds until convergence
- Uses stored task_ids to maintain conversation context across rounds
The framework is grounded in actual expert approaches. Not generic "write a blog post" instructions.
2. Authenticity Guardian (Voice Critic)
Job: Ensure posts sound like me, not generic AI content.
File location: .claude/agents/blog-authenticity-guardian/AGENT.md
The YAML frontmatter explained: Each agent file starts with YAML metadata that tells Claude Code what model to use, which tools the agent has access to, and basic identification. The markdown content below defines the agent's expertise and protocols.
This agent is ruthless about voice violations:
Red flags it catches:
- Corporate speak ("leveraging," "optimizing," "in today's landscape")
- Generic transitions that add no value
- Vague generalizations where specifics would fit
- Lecturing tone instead of sharing tone
- Perfect polish without personality
Example critique:
🚨 Critical violation - Corporate speak
Problem: "In order to optimize performance, it's recommended to
leverage memoization techniques."
Authentic alternative: "I was getting way too many re-renders.
Turns out, memoization fixed it - React stopped recalculating
stuff it had already figured out."
It has veto power on voice authenticity. If it doesn't sound like me, it doesn't ship.
How veto power works: In the convergence protocol, if Authenticity Guardian scores a post below 7/10, the Technical Educator must revise before proceeding. The orchestrator enforces this - no publication happens without voice approval.
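Conceptually, the veto is just a hard threshold check before anything else counts. Here's a minimal Python sketch of that gate — the dictionary keys, threshold, and function name are my illustration, not the actual orchestrator code:

```python
VOICE_THRESHOLD = 7  # Authenticity Guardian's minimum passing score

def can_publish(scores: dict) -> bool:
    """Voice approval is a hard gate: below threshold, revision is mandatory."""
    if scores.get("authenticity-guardian", 0) < VOICE_THRESHOLD:
        return False  # veto: Technical Educator must revise first
    return all(score >= 7 for score in scores.values())

# A post that reads well but doesn't sound like Luke is blocked:
assert not can_publish({"authenticity-guardian": 6, "skeptical-reader": 9, "structure-editor": 8})
assert can_publish({"authenticity-guardian": 8, "skeptical-reader": 7, "structure-editor": 9})
```

The point of the gate ordering: structure and completeness scores never get a vote until the voice check passes.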
3. Skeptical Reader (Completeness & Authentic Framing Critic)
Job: Read from a beginner's perspective and find confusion points, missing context, AND inauthentic framing.
File location: .claude/agents/blog-skeptical-reader/AGENT.md
This agent represents my target audience: tech support engineers, junior developers, people learning in public (basically past-Luke).
What it flags:
- Missing version numbers or environment details
- Skipped steps that seem "obvious" to experts
- Logical gaps in the story ("wait, how did we get here?")
- Unanswered questions readers would have
- Cognitive overload (too much at once)
- Inauthentic framing: Prescriptive "you should" language, generic tutorial patterns, magic jumps that skip reasoning
Example critique:
🚨 Critical Gap - Missing Context
Problem: "Just run the build command"
Reader questions:
- Which command?
- In what directory?
- What should the output look like?
- How do I know if it worked?
Fix needed: Show exact command, expected output, success criteria
4. Structure Editor (Flow, Readability, & Authenticity Critic)
Job: Optimize structure, pacing, visual hierarchy for web reading, AND authenticity of openings/natural flow.
File location: .claude/agents/blog-structure-editor/AGENT.md
This agent ensures posts are designed for how people actually read on the web. Scanning, skimming, then diving deep. Also checks that openings sound like Luke (not tutorial hooks) and that the flow feels natural, not forced.
What it evaluates:
- Opening authenticity (sounds like Luke or tutorial hook?)
- Visual hierarchy (can you understand post from headings alone?)
- Pacing (mix of short/long paragraphs, visual breaks)
- Flow (smooth transitions, natural progression, not forced)
- Engagement (does each section pull you to the next?)
Example critique:
⚠️ Structure Issue - Weak Opening
Problem: "Asynchronous JavaScript is an important concept..."
Impact: Vague introduction, no hook, buried lede.
No reason to keep reading.
Fix: Start with specific error or problem:
"I kept hitting `Cannot read property 'map' of undefined`
and honestly, I was stumped for hours..."
The Agent File Structure
I want to show you what an actual agent file looks like, not just describe the structure. This is the Authenticity Guardian, which I created because I kept getting AI-generated slop that sounded like documentation instead of me.
Here's the file:
---
description: Ensures blog posts sound like Luke, not generic AI content
model: claude-sonnet-4
---
# Authenticity Guardian
You are a specialist in authentic voice detection for Luke Manning's blog. Your job is to ensure posts sound like Luke: conversational, specific, honest about confusion; not like generic AI content or tutorials.
## What You Evaluate
### 1. Personal vs. Generic
- Specific: "I spent quite a while debugging..."
- Vague: "This was challenging..."
- Generic AI phrases to flag: "Let's explore," "In today's landscape"
### 2. Conversational vs. Corporate
- Conversational: "I was getting way too many re-renders"
- Corporate: "In order to optimize performance, leverage memoization"
### 3. Humble Sharing vs. Expert Lecturing
- Journey framing: "Here's what I did"
- Instructional framing: "You should do X"
## Red Flags
If you see these, flag as CRITICAL:
- "Let's explore," "Let's dive into" (AI signature patterns)
- "It is recommended," "You should" (instructional mode)
- "Leverage," "Optimize" without specific examples
- "One of the best practices is..." (generic advice)
- Overuse of transition words: "Moreover," "Furthermore," "Additionally"
## Scoring
- 9-10: Unmistakably Luke
- 7-8: Minor voice breaks, needs polish
- Below 7: Major voice violation, needs revision
Provide specific line-by-line feedback with before/after examples.
This structure (evaluation framework, red flags with specific examples, clear scoring) made agents effective. I learned the hard way that vague instructions produce vague feedback.
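To show how concrete the red-flag list is, here's the same check expressed as a few lines of Python. This is my illustration of the idea — the agent applies these patterns through its prompt, not through code:

```python
RED_FLAGS = [
    "let's explore", "let's dive into",        # AI signature patterns
    "it is recommended", "you should",         # instructional mode
    "one of the best practices",               # generic advice
    "moreover", "furthermore", "additionally", # transition-word overuse
]

def flag_lines(draft: str) -> list[tuple[int, str]]:
    """Return (line_number, phrase) pairs for every red-flag hit."""
    hits = []
    for n, line in enumerate(draft.splitlines(), start=1):
        lowered = line.lower()
        for phrase in RED_FLAGS:
            if phrase in lowered:
                hits.append((n, phrase))
    return hits

draft = "Let's dive into memoization.\nI was getting way too many re-renders."
assert flag_lines(draft) == [(1, "let's dive into")]
```

A regex scanner like this only catches the literal phrases; the agent's value is flagging the same *patterns* in wording it has never seen before.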
How Agents Actually Communicate (This Confused Me Too)
Here's what I got wrong at first: I assumed agents would talk to each other directly, passing messages back and forth like a Slack channel.
Nope.
Agents don't communicate. They don't even know other agents exist. Each one is a completely isolated Claude session reading the same file.
Here's how it actually works:
1. Orchestrator (me or a slash command) triggers the workflow
2. Technical Educator reads the source (conversation transcript or existing post)
3. Draft gets written to disk (e.g., draft-post.md)
4. Three critic agents launch in parallel (separate Claude sessions via the Task tool)
5. Each reads the SAME file from disk
6. Each applies its own review criteria
7. Each returns structured feedback
8. IMPORTANT: the orchestrator captures a task_id from each critic for reuse in subsequent rounds
9. Orchestrator aggregates results (combines the three reviews)
10. Technical Educator gets compiled feedback (as a single prompt) + stored task_ids
11. Technical Educator revises based on feedback and provides a complete revised post
12. Critics re-review using stored task_ids (maintains conversation context across rounds)
13. Repeat steps 4-12 until all critics approve (or the 3-round cap is hit)
The "communication" is just file I/O and prompt engineering. Each agent writes its opinion, the orchestrator reads those opinions, compiles them into context for the next agent. The task_id reuse ensures critics remember their previous feedback and maintain conversation context across iterative rounds.
Why this matters: You don't need any fancy agent framework or message bus. Just:
- Use the Task tool to spawn agents with specific prompts
- Pass file paths as context
- Aggregate outputs in the orchestrator
- Feed compiled results to the next agent
In Claude Code CLI, you invoke commands like:
/review-blog-post-multi-agent @content/posts/my-post.md
The orchestrator file then uses the Task tool to spawn parallel agents:
task(blog-skeptical-reader, "Review this draft for completeness gaps")
task(blog-authenticity-guardian, "Check this for voice authenticity")
task(blog-structure-editor, "Evaluate structure and flow")
That's it. That's the whole multi-agent system.
The complexity isn't in the infrastructure. It's in the prompt design. Each agent needs clear evaluation frameworks, specific examples, and structured output formats. Get those right, and the orchestration is straightforward.
The Two Workflows
I built two separate orchestration workflows:
Workflow 1: Generate Blog Post from Conversation
Command: /generate-blog-post-multi-agent (a slash command that invokes the skill)
File: .claude/commands/generate-blog-post-multi-agent.md
How it works:
- Orchestrator analyzes last 20-40 messages in conversation
- Extracts the story arc (problem → attempts → breakthrough → solution)
- Identifies technical artifacts (error messages, code, version numbers)
- Spawns Technical Educator (Phase 1) to create initial draft
- Spawns all 3 critics in parallel to review (captures task_ids for reuse)
- Aggregates feedback (critical issues, overlapping concerns, conflicts)
- Technical Educator revises (Phase 2) based on feedback + provides complete revised post
- Critics re-review using stored task_ids (2-3 rounds until convergence)
- Delivers publication-ready markdown
What makes this work: The orchestrator maintains the core principles (learn in public, authenticity over polish, write for past-self), ensures agents stay grounded in those values, and uses task_id reuse to maintain conversation context across iterative rounds.
How Agents Actually "Run": What Happens Under the Hood
As covered above, agents don't talk to each other. Each run is an isolated Claude session reading the same file, so the interesting part is what one of those sessions actually does.
Here's what actually happens when the orchestrator triggers a review:
The orchestrator prepares context:
You are the Skeptical Reader from `.claude/agents/blog-skeptical-reader/AGENT.md`.
Your task: Review this draft blog post for completeness gaps.
[Draft content here]
Provide your review in the standard format: score, critical issues,
medium priority, low priority, what's working well.
Claude loads the agent definition:
When the orchestrator specifies task(blog-skeptical-reader, ...), Claude Code automatically loads the AGENT.md file. This file has the evaluation framework, examples of good/bad content, and output format. The orchestrator doesn't need to know what's inside—Claude Code handles it.
Agent responds with structured feedback:
## Skeptical Reader Review
**Overall Score**: 7/10
**Critical Issues (Must Fix)**:
- Missing Next.js version number (line 45)
- "Simply do X" assumes reader knowledge (line 89)
**Medium Priority (Should Fix)**:
- Vague heading "Implementation" → suggest "The Fix: useState with Null Check"
**What's Working Well**:
- Opening hook is specific and relatable
- Code examples include error messages
Here's the trick—each agent runs in parallel:
The orchestrator spawns all three critics at the exact same time. They don't know about each other, they can't see each other's feedback, and they all return results independently. This is what prevents bias contamination.
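The parallel fan-out is the standard executor pattern. A sketch with Python's thread pool — spawn_agent is again a hypothetical stand-in for the Task tool, with an invented return shape:

```python
from concurrent.futures import ThreadPoolExecutor

def spawn_agent(name: str, prompt: str) -> dict:
    """Stand-in for the Task tool (real calls spend most time in API latency)."""
    return {"agent": name, "score": 7, "feedback": f"{name} review"}

def parallel_review(draft: str, critics: list[str]) -> dict[str, dict]:
    """Each critic gets only the draft: no sibling feedback, no shared state."""
    with ThreadPoolExecutor(max_workers=len(critics)) as pool:
        futures = {name: pool.submit(spawn_agent, name, draft) for name in critics}
        return {name: f.result() for name, f in futures.items()}

reviews = parallel_review("draft text", ["skeptical-reader", "authenticity-guardian", "structure-editor"])
assert set(reviews) == {"skeptical-reader", "authenticity-guardian", "structure-editor"}
```

Because each submission carries only the draft, a critic physically cannot see another critic's score — the isolation that prevents bias contamination is structural, not a politeness convention.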
Orchestrator aggregates all three reviews: The orchestrator waits for all three parallel agent reviews, then creates a unified feedback document.
## Aggregated Feedback for Technical Educator
### Critical Issues (All 3 agents flagged):
- Missing version numbers (Skeptical Reader)
- Corporate speak in paragraph 3 (Authenticity Guardian)
- Wall of text in "Implementation" section (Structure Editor)
### Overlapping Concerns:
- Both Authenticity Guardian and Structure Editor want shorter paragraphs
- Both Skeptical Reader and Structure Editor want better headings
### Approval Status:
- Authenticity Guardian: 6/10 (needs revision)
- Skeptical Reader: 7/10 (needs revision)
- Structure Editor: 8/10 (approve after fixes)
Technical Educator revises: The orchestrator spawns Technical Educator again with the aggregated feedback. It implements fixes and documents what changed.
Critics re-review using stored task_ids:
This is the part that confused me at first. How do critics remember their previous feedback? The answer is task_id reuse. When the orchestrator first spawns each critic, it captures a task_id from the response. When resubmitting the revised draft, it reuses that same task_id. This maintains conversation context across rounds so critics remember what they flagged before.
The whole multi-agent system is just file I/O and task_id reuse. Agents write opinions to disk, orchestrator reads and compiles them, feeds results to the next agent. No fancy message bus needed.
This is how the orchestrator manages the entire multi-agent workflow, spawning agents sequentially or in parallel, aggregating their outputs, and enforcing convergence criteria.
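The aggregation step itself is mechanical. Here's a sketch of how the three independent reviews could be merged into one feedback document — the field names and scoring cutoff are illustrative assumptions:

```python
def aggregate(reviews: dict[str, dict]) -> str:
    """Merge independent critic reviews into one feedback document."""
    lines = ["## Aggregated Feedback for Technical Educator", "", "### Critical Issues:"]
    for name, review in reviews.items():
        for issue in review["critical"]:
            lines.append(f"- {issue} ({name})")
    lines += ["", "### Approval Status:"]
    for name, review in reviews.items():
        verdict = "approve after fixes" if review["score"] >= 8 else "needs revision"
        lines.append(f"- {name}: {review['score']}/10 ({verdict})")
    return "\n".join(lines)

reviews = {
    "Skeptical Reader": {"score": 7, "critical": ["Missing version numbers"]},
    "Authenticity Guardian": {"score": 6, "critical": ["Corporate speak in paragraph 3"]},
}
assert "- Missing version numbers (Skeptical Reader)" in aggregate(reviews)
assert "- Authenticity Guardian: 6/10 (needs revision)" in aggregate(reviews)
```

Attribution matters here: tagging each issue with its critic lets the Technical Educator weigh voice complaints differently from completeness complaints.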
Workflow 2: Review Existing Blog Post
Command: /review-blog-post-multi-agent @content/posts/post-slug.md
File: .claude/commands/review-blog-post-multi-agent.md
How it works:
- Read existing post (analyze structure, voice, completeness)
- Review voice baseline (check 2-3 recent posts for consistency)
- Spawn all 3 critics in parallel (captures task_ids for reuse)
- Aggregate feedback (critical/medium/low priority)
- Spawn Technical Educator with feedback + stored task_ids
- Technical Educator provides complete revised post + revision summary
- Critics re-review using stored task_ids (approve/reject/refine)
- Converge after 2-3 rounds
- Deliver actionable recommendations with before/after text
Output includes:
- Quick wins (5-10 minute fixes)
- Moderate improvements (30-60 minute rewrites)
- Major rewrites (only if fundamental issues)
- What to preserve (don't change these sections)
- Contested items (user decides)
The Debate Protocol
Here's what makes this system actually work: adversarial debate with convergence.
Round 1: Initial Review
All three critics review in parallel. Each provides:
- Overall score (X/10)
- Critical issues (must fix)
- Medium priority (should fix)
- Low priority (nice to have)
- What's working well
How critics calculate scores: Each agent evaluates its 6 dimensions and assigns a score based on:
- Authenticity Guardian: 9-10 = unmistakably Luke, 7-8 = minor voice breaks, below 7 = needs revision
- Skeptical Reader: 9-10 = no gaps or confusion, 7-8 = 1-2 missing details, below 7 = critical gaps
- Structure Editor: 9-10 = perfect flow and hierarchy, 7-8 = minor pacing issues, below 7 = structural problems
The score isn't arbitrary - it's calculated from the count and severity of issues found across all dimensions.
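One way to picture that calculation — this is my illustration of "count and severity", not the actual formula, which lives in each agent's prompt:

```python
def score_from_issues(critical: int, medium: int, low: int) -> int:
    """Start at 10 and deduct by severity; floor at 1."""
    score = 10 - 3 * critical - 1 * medium  # low-priority items don't cost points
    return max(score, 1)

assert score_from_issues(critical=0, medium=1, low=2) == 9   # minor polish
assert score_from_issues(critical=1, medium=0, low=0) == 7   # one must-fix drops below "approve"
assert score_from_issues(critical=2, medium=2, low=0) == 2   # major revision territory
```

The weights are arbitrary, but the shape is the point: a single critical issue should be enough to pull a post out of approve territory.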
Round 2: Revision
Technical Educator receives aggregated feedback + stored task_ids and:
- Addresses all critical issues
- Tackles medium priority items
- Makes judgment calls on conflicts
- Documents what was changed and why
- Provides complete revised post (full markdown, not just suggestions)
Critics re-review using stored task_ids (maintains conversation context) and either approve or escalate remaining concerns.
Round 3: Final Refinement (if needed)
For contested items or remaining gaps. By round 3, most things have converged.
Convergence Criteria
A post is ready when:
- All three critics approve
- Story arc is clear
- Mental models explained before implementation
- Content is specific and searchable
- Voice is authentic
- No technical inaccuracies
Convergence failure protocol: If agents can't agree after 3 rounds, document both perspectives and let me decide.
Example contested issue:
- Authenticity wants rambling paragraph (authentic voice)
- Structure wants visual breaks (better readability)
- Resolution: Keep the words, add paragraph breaks
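The convergence-failure branch is just a round counter plus an escape hatch. A sketch, with the data shape (a list of per-round score dicts) as my assumption:

```python
MAX_ROUNDS = 3

def resolve(rounds: list[dict[str, int]]) -> str:
    """Each round maps critic name -> score. Converge, iterate, or escalate."""
    latest = rounds[-1]
    if all(score >= 7 for score in latest.values()):
        return "publish"
    if len(rounds) >= MAX_ROUNDS:
        return "escalate: document both perspectives, user decides"
    return "revise"

assert resolve([{"voice": 6, "structure": 8}]) == "revise"
assert resolve([{"voice": 6}, {"voice": 7}]) == "publish"
assert resolve([{"voice": 6}, {"voice": 6}, {"voice": 6}]).startswith("escalate")
```

The escalation string is the key design choice: after three rounds the system stops arbitrating and hands me both positions instead of forcing a fake consensus.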
Real Example: Reviewing This Very Post
Want to see this in action? Here's what happened when I ran this post through the system:
Round 1 Feedback:
Authenticity Guardian (6.5/10): "Too much formal hedge-language. You used 'It's worth noting' 11 times. That's not Luke—that's documentation voice."
Skeptical Reader (7/10): "Where's the complete AGENT.md file? You mention agents but never show one. How do agents communicate—file I/O, API calls, what?"
Structure Editor (7/10): "Hook buried 300 words deep. Dense 400-word paragraphs. No TL;DR for a 3,600-word post."
My revisions:
- Killed all "It's worth noting" instances → replaced with direct statements
- Added complete 60-line Authenticity Guardian definition
- Added "How Agents Actually Communicate" section explaining file I/O
- Added TL;DR with jump links
- Strengthened opening hook
Round 2 Feedback:
Authenticity Guardian (7.5/10): Better, but needs more struggle journey
Skeptical Reader (9/10): APPROVE, all critical gaps fixed
Structure Editor (8/10): Major improvements, minor pacing tweaks needed
That's the system working. Multiple perspectives, specific feedback, iterative improvement.
The Journey of Building This
Starting Point: The /agent-generator Skill
I didn't write these agent definitions from scratch. I used a meta-skill I'd built previously: /agent-generator.
This skill creates well-structured agent definitions by:
- Understanding the agent's role
- Invoking
/expertiseskill to ground in real methodologies - Designing interaction protocols
- Defining output formats
The Orchestrator Pattern
The orchestrators (in .claude/commands/) don't contain agent logic. They:
- Prepare context
- Spawn agents in sequence
- Aggregate feedback
- Manage convergence
- Format final output
Key decision: Parallel review with sequential revision.
Critics review simultaneously (faster), but revisions happen sequentially (prevents chaos).
The Expert Grounding Approach
Each agent is grounded in real methodologies:
Technical Educator:
- swyx: "Learn in public" philosophy, document the journey
- Julia Evans: Debugging narratives, specific error messages
- Josh Comeau: Mental models before implementation
- Andy Matuschak: Progressive disclosure (simple → complex)
- Anne-Laure Le Cunff: Ship version 1.0, iterate
Authenticity Guardian:
- Content strategy voice analysis
- AI detection patterns
- Brand alignment frameworks
Skeptical Reader:
- Cognitive load theory
- Curse of knowledge awareness
- Technical documentation best practices
Structure Editor:
- Inverted pyramid (journalism)
- Web reading behavior (F-pattern scanning)
- Readability frameworks (Flesch-Kincaid)
This grounding prevents generic "AI helping AI" nonsense. Each agent has real frameworks to reference.
What Didn't Work (And Why)
Attempt 1: Single Agent Doing All Three Jobs
I started optimistically, one agent to rule them all. Check voice, completeness, and structure all in one pass. Seemed efficient.
What actually happened: The agent would catch voice issues but miss missing code examples. Or notice structural problems but completely gloss over authenticity breaks. It was like asking one person to be a copy editor, a fact-checker, and a voice coach simultaneously. Something always got missed.
Why it failed: Too many competing objectives. The agent couldn't specialize. Trying to hold three different evaluation frameworks at once meant it couldn't apply any of them well. I kept tweaking the prompt, thinking the issue was in how I phrased things. Spent quite a while before realizing the real problem was the architecture itself.
Attempt 2: Sequential Review (Voice → Skeptical → Structure)
Okay, split them up. Run Voice first, then Skeptical, then Structure. Each agent sees the previous agent's feedback and builds on it.
What actually happened: The Structure Editor would see "Voice score: 6/10" and unconsciously lower its own standards. Or Skeptical Reader would notice Authenticity Guardian flagged something as critical, then ignore a similar issue because "that's already being addressed."
Why it failed: Two problems:
First: Bias contamination. Agents were influenced by each other's scores instead of evaluating independently.
Second: Terrible performance. Three sequential Claude calls meant 30+ seconds of waiting for each review. I'd run a post through the system, go grab coffee, come back, and still be waiting on the third agent. Not sustainable.
I thought sequential would be better, each agent could learn from the previous one. Instead, it just created echo chambers where agents converged on "good enough" instead of pushing for better.
What Worked: The Breakthrough
Attempt 3: Parallel Review (Current System)
Launch all three critics at once, each reviewing independently. Aggregate results afterward.
Why this worked:
- No bias contamination—agents can't see each other's feedback
- Faster execution (parallel API calls)
- Agents can disagree, which surfaces interesting edge cases
- Technical Educator gets unfiltered input from all perspectives
The breakthrough was realizing that disagreement is valuable. When Authenticity Guardian wants rambling paragraphs (authentic voice) and Structure Editor wants visual breaks (readability), that tension forces the Technical Educator to find creative solutions like keeping the words but adding paragraph breaks.
What Surprised Me
Convergence Happens Faster Than Expected
Most posts converge in 2 rounds:
- Round 1: 5-10 issues flagged
- Round 2: All addressed, critics review again
- Round 3: Final adjustments and polish
Going past round 3 is rare (only for major rewrites or contested items).
The System Catches Things I'd Miss
Example from recent review:
- Missing Next.js version number
- "Simply do X" (curse of knowledge)
- Vague heading "Implementation" → Changed to "The Fix: useState with Null Check"
- Wall of text (350 words, no breaks) → Split into 3 paragraphs with code block
All things I'd probably ship without noticing.
Voice Preservation Actually Works
The Authenticity Guardian is brutal but accurate. It catches:
- Corporate buzzwords I'd unconsciously use
- Generic transition phrases ("Let's explore...")
- Expert assumptions ("Obviously you'll need to...")
- Common AI phrases ("But honestly?")
And it suggests authentic alternatives that sound like me:
- "I kept hitting this error for two hours..."
- "Turns out, the issue was..."
- "Here's what surprised me..."
The Results
All of this sounds great on paper. Does it actually work?
Honestly, I wasn't sure at first. The first few runs were slow. I kept tweaking prompts, adjusting scoring thresholds, chasing edge cases where agents would argue forever.
But after a while, the system settled in. Here's what I've observed:
Quality improvement: The agents consistently catch things I'd miss: voice breaks, missing version numbers, and "obvious" steps that aren't obvious at all
Consistency: Every post follows the same quality bar now, which wasn't true when I was reviewing manually
Learning: The critic feedback teaches me what to avoid in future writing
Not perfect data; I haven't been tracking this scientifically. But qualitatively, it's way better than my manual reviews.
These results didn't come from magic. They came from careful design. Here's what's under the hood: file structure, agent definitions, and orchestration patterns that make this work.
Agent Structure (High-Level)
Each agent follows the same pattern:
File: .claude/agents/<agent-name>/AGENT.md
Core components:
- YAML frontmatter (description, tools, permissions)
- Mission statement
- Evaluation framework (specific dimensions to check)
- Interaction protocol (how it works with other agents)
- Output format (structured feedback)
- Examples of good/bad content
What makes this work:
- Clear job description (one specialty per agent)
- Grounded in real methodologies (not "help write blog")
- Specific patterns to flag (not vague "check quality")
- Structured output (orchestrator can parse it)
The orchestrator coordinates workflow: prepare context, spawn agents, aggregate feedback, manage convergence, format output. Agents contain all evaluation logic, and their specialization is what makes the system work.
Six-Dimensional Review Frameworks
Each critic evaluates across 6 specific dimensions:
Authenticity Guardian:
- Personal vs. Generic
- Humble Sharing vs. Expert Lecturing
- Specific vs. Vague
- Conversational vs. Corporate
- Brand Alignment
- AI Detection Signals
Skeptical Reader:
- Completeness
- Context
- Gaps
- Questions
- Cognitive Load
- Curse of Knowledge
Structure Editor:
- Opening Hook
- Visual Hierarchy
- Pacing
- Flow
- Engagement
- Web Readability
Each dimension has clear examples of good/bad and specific things to flag.
What's Next
Immediate Improvements
- Internal linking agent: Automatically suggest links to related posts
- SEO optimizer: Ensure titles/descriptions hit character limits
- Code validator: Run code examples to ensure they actually work
Long-term Vision
- Feedback loop: Track which posts get "I got stuck at X" comments, feed that back to Skeptical Reader
- Style evolution: Let agents learn from high-performing posts
- Topic suggester: Analyze conversations to identify blog-worthy moments
Meta-Learning
This whole process is itself blog-worthy content. I'm using the system to review this post about building the system.
Inception: The agents are currently debating this very post you're reading.
Key Takeaways
- Multi-agent systems work when agents have real expertise - Don't just spawn "helper agents." Ground them in actual frameworks and methodologies.
- Adversarial debate makes better posts. The friction between Authenticity Guardian and Structure Editor leads to posts that are both genuine and readable.
- Convergence protocols prevent endless iteration. 2-3 rounds with clear approval criteria. After that, ship or document the trade-off.
- Orchestrators maintain principles. The orchestrator's job is reminding agents of core values: learn in public, authenticity over polish, write for past-self.
- The system teaches you. After seeing the same critiques repeatedly, I've started catching those issues myself. The agents are training me.
- Agent specialization matters - Each agent has one job, grounded in real methodologies.
- Parallel review, sequential revision. Let critics run simultaneously, but revise sequentially.
- Veto power creates accountability. Authenticity Guardian can block generic content. Skeptical Reader can block incomplete content.
- Debates need protocols - Without clear convergence criteria, agents argue forever.
- Meta-agents are powerful. Using /agent-generator and /expertise to create the system was way more effective than hand-writing everything.
The Files
Here's what the actual file structure looks like:
Agents (in .claude/agents/):
- blog-technical-educator/AGENT.md - Creator & Reviser: Transforms conversations into drafts AND implements revisions through iterative rounds
- blog-authenticity-guardian/AGENT.md - Voice critic: Ensures posts sound like Luke (catches AI patterns, corporate speak, tutorial framing)
- blog-skeptical-reader/AGENT.md - Completeness & Authentic Framing critic: Catches missing context, gaps, AND inauthentic framing from past-Luke's perspective
- blog-structure-editor/AGENT.md - Structure critic: Optimizes flow, hierarchy, AND the authenticity of openings/natural flow
Orchestrators (in .claude/commands/):
- generate-blog-post-multi-agent.md - Generate from conversation history
- review-blog-post-multi-agent.md - Review an existing markdown file
Meta-skills I used:
- /agent-generator - Creates well-structured agent definitions automatically
- /expertise - Synthesizes frameworks from domain experts (swyx, Julia Evans, etc.)
Final Thoughts
This isn't about replacing human writing. It's about having specialized reviewers who catch what I'd miss.
Think of it like:
- Authenticity Guardian = Friend who knows your voice
- Skeptical Reader = Reader emailing "I got stuck at step 3"
- Structure Editor = Copy editor focused on flow
- Technical Educator = Yourself, synthesizing feedback AND implementing revisions
All working together to ship better content through 2-3 iterative rounds, with task_id reuse maintaining conversation context so critics remember their previous feedback.
The agents don't write for me. They debate with each other until what I wrote is clear, complete, and authentically mine.
Now the system reviews itself.
Wild. This whole process, building tools to help me write, then writing about those tools, keeps looping back on itself. I'm both the creator and the subject.