luke@terminal:/blog$ ls
luke@terminal:/blog$ cat building-multi-agent-blog-review-system.md

I Built a Multi-Agent System to Review My Blog Posts (And It Actually Works)

Created: 2025-12-28 | 18 min read

My blog posts were inconsistent. Some too technical. Some lost my voice. Manual reviews weren't catching enough. I needed multiple reviewers: one checking technical accuracy, one preserving my voice, one thinking like a skeptical reader.

Note: This system was originally built with Claude Code. I've since migrated everything to Opencode, but I'm keeping original references because that's how I actually built it. The concepts transfer over. The file paths are just different now.

So I built a multi-agent review system. Four specialized AI agents that debate each draft until it's ready to publish. Not generic AI-generated slop but actual quality control that catches what I'd miss.

TL;DR

I built a 4-agent review system that catches what I'd miss:

  • Technical Educator transforms conversations → blog drafts AND implements revisions through iterative rounds
  • Authenticity Guardian ensures it sounds like me (catches AI patterns, corporate speak, tutorial framing)
  • Skeptical Reader catches missing context, skipped steps, AND inauthentic framing (from past-Luke's perspective)
  • Structure Editor optimizes flow, readability, and the authenticity of openings

The pattern is copyable. Each agent reads the same file, applies different criteria, and writes feedback to disk. The orchestrator coordinates: parallel review, feedback aggregation, revision, repeat until convergence (2-3 rounds, with task_id reuse to maintain context).

Not magic. Just file I/O and well-designed prompts. Overkill? Yes. Does it catch things I'd miss? Also yes.

Here's what I built, how it works, and what surprised me along the way.

Context: What I Built This With

I built this with Claude Code, the CLI from Anthropic where Claude can read/write files, run commands, and maintain context across your project.

I'd already been using it for a while, so I knew the directory pattern:

  • Agents in .claude/agents/<name>/AGENT.md (specialized AI personas with evaluation frameworks)
  • Skills in .claude/skills/ (single-purpose tools I invoke with slash commands)
  • Commands in .claude/commands/ (orchestrators that coordinate multiple agents)

Claude Code automatically discovers files in .claude/. A file at .claude/commands/review-blog-post.md becomes the slash command /review-blog-post. Simple pattern, but it took me quite a while to figure out how to chain agents together properly.
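The file-to-command mapping can be sketched in a couple of lines. This is just an illustration of the naming convention, not Claude Code's actual discovery code:

```python
from pathlib import PurePosixPath

def slash_command_for(path: str) -> str:
    # The file's stem becomes the slash command name.
    return "/" + PurePosixPath(path).stem

print(slash_command_for(".claude/commands/review-blog-post.md"))  # /review-blog-post
```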

I'd already built two tools before starting this project:

  • /agent-generator: Creates well-structured agent definitions automatically
  • /expertise: Synthesizes frameworks from domain experts to ground agents in real methodologies

These are my custom tools—not built-in Claude Code features. I built them using the same patterns I'm about to show you.

The Problem: Quality Control at Scale

I have two different ways I create blog posts, and both of them were creating quality issues:

Writing myself: I'll jot down ideas over days or weeks, get a messy braindump of thoughts, then ask AI to structure it into something coherent. This works great when I've been thinking about a topic for a while—but AI would often lose my voice or turn it into a tutorial.

AI-generated from conversation: Sometimes I'll have a really good conversation with Claude where I learned something through debugging. Instead of rewriting it from scratch, I'll ask AI to generate a post directly from the conversation history. These were even worse. Too polished, too generic, missing the struggle.

Both approaches produce messy drafts that need work, and that's where the quality issues creep in:

  • Some posts became more like tutorials instead of journey-sharing
  • Posts would sound too polished (clearly AI-generated)
  • I'd skip "obvious" steps that weren't obvious to past-me
  • Structure would be all over the place
  • My authentic voice would get lost in editing & review

I needed a system that could:

  1. Transform my raw notes/conversations into blog drafts
  2. Catch quality issues before publishing
  3. Preserve my authentic voice
  4. Ensure completeness (no missing steps or context)

The solution I decided to explore was letting specialized AI agents debate each other until they converge on something worth publishing.

The Multi-Agent Architecture

I ended up with four specialized agents, each with a specific job:

1. Technical Educator (The Creator & Reviser)

Job: Transform raw conversations or notes into blog post drafts AND implement revisions based on critic feedback through iterative rounds.

Based on: Real methodologies from swyx ("learn in public"), Julia Evans (debugging narratives), Josh Comeau (mental models first), Andy Matuschak (progressive disclosure), and Anne-Laure Le Cunff (ship version 1.0).

File location: .claude/agents/blog-technical-educator/AGENT.md

This agent has TWO phases:

Phase 1 - Create Drafts: Takes my messy notes or conversation transcripts and structures them into:

  • Opening hook (the specific problem)
  • Story arc (my debugging journey)
  • Mental model (how it actually works)
  • Practical solution (what to do)
  • Key takeaways

Phase 2 - Implement Revisions: After receiving critic feedback (overlapping concerns, conflicting input), the agent:

  • Prioritizes issues (high/medium/low priority)
  • Implements targeted revisions (not complete rewrites)
  • Provides complete revised posts (not just suggestions)
  • Iterates with critics for 2-3 rounds until convergence
  • Uses stored task_ids to maintain conversation context across rounds

The framework is grounded in actual expert approaches. Not generic "write a blog post" instructions.

2. Authenticity Guardian (Voice Critic)

Job: Ensure posts sound like me, not generic AI content.

File location: .claude/agents/blog-authenticity-guardian/AGENT.md

The YAML frontmatter explained: Each agent file starts with YAML metadata that tells Claude Code what model to use, which tools the agent has access to, and basic identification. The markdown content below defines the agent's expertise and protocols.

This agent is ruthless about voice violations:

Red flags it catches:

  • Corporate speak ("leveraging," "optimizing," "in today's landscape")
  • Generic transitions that add no value
  • Vague generalizations where specifics would fit
  • Lecturing tone instead of sharing tone
  • Perfect polish without personality

Example critique:

🚨 Critical violation - Corporate speak

Problem: "In order to optimize performance, it's recommended to
leverage memoization techniques."

Authentic alternative: "I was getting way too many re-renders.
Turns out, memoization fixed it - React stopped recalculating
stuff it had already figured out."

It has veto power on voice authenticity. If it doesn't sound like me, it doesn't ship.

How veto power works: In the convergence protocol, if Authenticity Guardian scores a post below 7/10, the Technical Educator must revise before proceeding. The orchestrator enforces this - no publication happens without voice approval.

3. Skeptical Reader (Completeness & Authentic Framing Critic)

Job: Read from a beginner's perspective and find confusion points, missing context, AND inauthentic framing.

File location: .claude/agents/blog-skeptical-reader/AGENT.md

This agent represents my target audience: tech support engineers, junior developers, people learning in public (basically past-Luke).

What it flags:

  • Missing version numbers or environment details
  • Skipped steps that seem "obvious" to experts
  • Logical gaps in the story ("wait, how did we get here?")
  • Unanswered questions readers would have
  • Cognitive overload (too much at once)
  • Inauthentic framing: Prescriptive "you should" language, generic tutorial patterns, magic jumps that skip reasoning

Example critique:

🚨 Critical Gap - Missing Context

Problem: "Just run the build command"

Reader questions:
- Which command?
- In what directory?
- What should the output look like?
- How do I know if it worked?

Fix needed: Show exact command, expected output, success criteria

4. Structure Editor (Flow, Readability, & Authenticity Critic)

Job: Optimize structure, pacing, visual hierarchy for web reading, AND authenticity of openings/natural flow.

File location: .claude/agents/blog-structure-editor/AGENT.md

This agent ensures posts are designed for how people actually read on the web: scanning, skimming, then diving deep. It also checks that openings sound like Luke (not tutorial hooks) and that the flow feels natural, not forced.

What it evaluates:

  • Opening authenticity (sounds like Luke or tutorial hook?)
  • Visual hierarchy (can you understand post from headings alone?)
  • Pacing (mix of short/long paragraphs, visual breaks)
  • Flow (smooth transitions, natural progression, not forced)
  • Engagement (does each section pull you to the next?)

Example critique:

⚠️ Structure Issue - Weak Opening

Problem: "Asynchronous JavaScript is an important concept..."

Impact: Vague introduction, no hook, buried lede.
No reason to keep reading.

Fix: Start with specific error or problem:
"I kept hitting `Cannot read property 'map' of undefined`
and honestly, I was stumped for hours..."

The Agent File Structure

I want to show you what an actual agent file looks like, not just describe the structure. This is the Authenticity Guardian, which I created because I kept getting AI-generated slop that sounded like documentation instead of me.

Here's the file:

---
description: Ensures blog posts sound like Luke, not generic AI content
model: claude-sonnet-4
---

# Authenticity Guardian

You are a specialist in authentic voice detection for Luke Manning's blog. Your job is to ensure posts sound like Luke: conversational, specific, honest about confusion. Not like generic AI content or tutorials.

## What You Evaluate

### 1. Personal vs. Generic
- Specific: "I spent quite a while debugging..."
- Vague: "This was challenging..."
- Generic AI phrases to flag: "Let's explore," "In today's landscape"

### 2. Conversational vs. Corporate
- Conversational: "I was getting way too many re-renders"
- Corporate: "In order to optimize performance, leverage memoization"

### 3. Humble Sharing vs. Expert Lecturing
- Journey framing: "Here's what I did"
- Instructional framing: "You should do X"

## Red Flags

If you see these, flag as CRITICAL:
- "Let's explore," "Let's dive into" (AI signature patterns)
- "It is recommended," "You should" (instructional mode)
- "Leverage," "Optimize" without specific examples
- "One of the best practices is..." (generic advice)
- Overuse of transition words: "Moreover," "Furthermore," "Additionally"

## Scoring

- 9-10: Unmistakably Luke
- 7-8: Minor voice breaks, needs polish
- Below 7: Major voice violation, needs revision

Provide specific line-by-line feedback with before/after examples.

This structure (evaluation framework, red flags with specific examples, clear scoring) made agents effective. I learned the hard way that vague instructions produce vague feedback.
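The frontmatter/body split in these agent files is simple enough to sketch. This is a hypothetical minimal parser for illustration (flat `key: value` pairs only), not what Claude Code actually runs:

```python
def split_frontmatter(text: str) -> tuple[dict, str]:
    """Split a '---'-delimited header from the markdown body.
    Minimal sketch: handles flat 'key: value' pairs only."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta, i = {}, 1
    while i < len(lines) and lines[i].strip() != "---":
        key, _, value = lines[i].partition(":")
        meta[key.strip()] = value.strip()
        i += 1
    return meta, "\n".join(lines[i + 1:])

agent = """---
description: Ensures blog posts sound like Luke
model: claude-sonnet-4
---
# Authenticity Guardian
"""
meta, body = split_frontmatter(agent)
print(meta["model"])  # claude-sonnet-4
```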

How Agents Actually Communicate (This Confused Me Too)

Here's what I got wrong at first: I assumed agents would talk to each other directly, passing messages back and forth like a Slack channel.

Nope.

Agents don't communicate. They don't even know other agents exist. Each one is a completely isolated Claude session reading the same file.

Here's how it actually works:

  1. Orchestrator (me or a slash command) triggers the workflow
  2. Technical Educator reads the source (conversation transcript or existing post)
  3. File gets written to disk (e.g., draft-post.md)
  4. Three critic agents launch in parallel (separate Claude sessions via the Task tool)
    • Each reads the SAME file from disk
    • Each applies its own review criteria
    • Each returns structured feedback
    • IMPORTANT: Orchestrator captures task_id from each critic for reuse in subsequent rounds
  5. Orchestrator aggregates results (combines the three reviews)
  6. Technical Educator gets compiled feedback (as a single prompt) + stored task_ids
  7. Technical Educator revises based on feedback, provides complete revised post
  8. Critics re-review using stored task_ids (maintains conversation context across rounds)
  9. Repeat steps 4-8 until all critics approve (or max 3 rounds hit)

The "communication" is just file I/O and prompt engineering. Each agent writes its opinion, the orchestrator reads those opinions, compiles them into context for the next agent. The task_id reuse ensures critics remember their previous feedback and maintain conversation context across iterative rounds.

Why this matters: You don't need any fancy agent framework or message bus. Just:

  • Use the Task tool to spawn agents with specific prompts
  • Pass file paths as context
  • Aggregate outputs in the orchestrator
  • Feed compiled results to the next agent

In Claude Code CLI, you invoke commands like:

/review-blog-post-multi-agent @content/posts/my-post.md

The orchestrator file then uses the Task tool to spawn parallel agents:

task(blog-skeptical-reader, "Review this draft for completeness gaps")
task(blog-authenticity-guardian, "Check this for voice authenticity")
task(blog-structure-editor, "Evaluate structure and flow")

That's it. That's the whole multi-agent system.

The complexity isn't in the infrastructure. It's in the prompt design. Each agent needs clear evaluation frameworks, specific examples, and structured output formats. Get those right, and the orchestration is straightforward.
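The whole loop fits in a few lines of Python. This is a toy simulation, not Claude Code's implementation: `run_agent` stands in for spawning a critic via the Task tool, and the scoring rule is a stub in place of real prose feedback:

```python
# Toy stand-in for spawning a critic agent; real agents return prose reviews.
def run_agent(name: str, draft: str) -> dict:
    score = 9 if "## " in draft else 6           # stub criterion
    return {"agent": name, "score": score}

critics = ["blog-skeptical-reader", "blog-authenticity-guardian",
           "blog-structure-editor"]

draft = "plain wall of text"
for round_no in range(1, 4):                     # max 3 rounds
    reviews = [run_agent(c, draft) for c in critics]  # parallel in practice
    if all(r["score"] >= 7 for r in reviews):         # convergence check
        break
    # Aggregate feedback, then the Technical Educator revises the draft.
    draft = "## Revised (round %d)\n%s" % (round_no, draft)

print(round_no, [r["score"] for r in reviews])
```

The structure is the point: spawn, score, revise, repeat, with a hard round limit so it can't loop forever.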

The Two Workflows

I built two separate orchestration workflows:

Workflow 1: Generate Blog Post from Conversation

Command: slash command (invokes skill)
File: .claude/commands/generate-blog-post-multi-agent.md

How it works:

  1. Orchestrator analyzes last 20-40 messages in conversation
  2. Extracts the story arc (problem → attempts → breakthrough → solution)
  3. Identifies technical artifacts (error messages, code, version numbers)
  4. Spawns Technical Educator (Phase 1) to create initial draft
  5. Spawns all 3 critics in parallel to review (captures task_ids for reuse)
  6. Aggregates feedback (critical issues, overlapping concerns, conflicts)
  7. Technical Educator revises (Phase 2) based on feedback + provides complete revised post
  8. Critics re-review using stored task_ids (2-3 rounds until convergence)
  9. Delivers publication-ready markdown

What makes this work: The orchestrator maintains the core principles (learn in public, authenticity over polish, write for past-self), ensures agents stay grounded in those values, and uses task_id reuse to maintain conversation context across iterative rounds.

How Agents Actually "Run": What Happens Under the Hood

Earlier I described the flow at a high level: isolated Claude sessions, file I/O, no direct messaging. This section digs into the mechanics.

Here's what actually happens when the orchestrator triggers a review:

The orchestrator prepares context:

You are the Skeptical Reader from `.claude/agents/blog-skeptical-reader/AGENT.md`.

Your task: Review this draft blog post for completeness gaps.

[Draft content here]

Provide your review in the standard format: score, critical issues,
medium priority, low priority, what's working well.

Claude loads the agent definition: When the orchestrator specifies task(blog-skeptical-reader, ...), Claude Code automatically loads the AGENT.md file. This file has the evaluation framework, examples of good/bad content, and output format. The orchestrator doesn't need to know what's inside—Claude Code handles it.

Agent responds with structured feedback:

## Skeptical Reader Review

**Overall Score**: 7/10

**Critical Issues (Must Fix)**:
- Missing Next.js version number (line 45)
- "Simply do X" assumes reader knowledge (line 89)

**Medium Priority (Should Fix)**:
- Vague heading "Implementation" → suggest "The Fix: useState with Null Check"

**What's Working Well**:
- Opening hook is specific and relatable
- Code examples include error messages

Here's the trick—each agent runs in parallel:

The orchestrator spawns all three critics at the exact same time. They don't know about each other, they can't see each other's feedback, and they all return results independently. This is what prevents bias contamination.
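The fan-out can be sketched with standard-library threading. The three review functions here are stubs standing in for real agent calls; the point is that each one runs independently and never sees the others' output:

```python
from concurrent.futures import ThreadPoolExecutor

# Stub critics: each "reviews" the same draft with no shared state.
def skeptical_reader(draft):      return {"agent": "skeptical", "score": 7}
def authenticity_guardian(draft): return {"agent": "voice", "score": 6}
def structure_editor(draft):      return {"agent": "structure", "score": 8}

draft = "..."
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(f, draft)
               for f in (skeptical_reader, authenticity_guardian, structure_editor)]
    reviews = [f.result() for f in futures]      # wait for all three

print(sorted(r["agent"] for r in reviews))
```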

Orchestrator aggregates all three reviews: The orchestrator waits for all three parallel agent reviews, then creates a unified feedback document.

## Aggregated Feedback for Technical Educator

### Critical Issues (All 3 agents flagged):
- Missing version numbers (Skeptical Reader)
- Corporate speak in paragraph 3 (Authenticity Guardian)
- Wall of text in "Implementation" section (Structure Editor)

### Overlapping Concerns:
- Both Authenticity Guardian and Structure Editor want shorter paragraphs
- Both Skeptical Reader and Structure Editor want better headings

### Approval Status:
- Authenticity Guardian: 6/10 (needs revision)
- Skeptical Reader: 7/10 (needs revision)
- Structure Editor: 8/10 (approve after fixes)

Technical Educator revises: The orchestrator spawns Technical Educator again with the aggregated feedback. It implements fixes and documents what changed.

Critics re-review using stored task_ids: This is the part that confused me at first. How do critics remember their previous feedback? The answer is task_id reuse. When the orchestrator first spawns each critic, it captures a task_id from the response. When resubmitting the revised draft, it reuses that same task_id. This maintains conversation context across rounds so critics remember what they flagged before.

The whole multi-agent system is just file I/O and task_id reuse. Agents write opinions to disk, orchestrator reads and compiles them, feeds results to the next agent. No fancy message bus needed.

This is how the orchestrator manages the entire multi-agent workflow, spawning agents sequentially or in parallel, aggregating their outputs, and enforcing convergence criteria.

Workflow 2: Review Existing Blog Post

Command: /review-blog-post-multi-agent @content/posts/post-slug.md
File: .claude/commands/review-blog-post-multi-agent.md

How it works:

  1. Read existing post (analyze structure, voice, completeness)
  2. Review voice baseline (check 2-3 recent posts for consistency)
  3. Spawn all 3 critics in parallel (captures task_ids for reuse)
  4. Aggregate feedback (critical/medium/low priority)
  5. Spawn Technical Educator with feedback + stored task_ids
  6. Technical Educator provides complete revised post + revision summary
  7. Critics re-review using stored task_ids (approve/reject/refine)
  8. Converge after 2-3 rounds
  9. Deliver actionable recommendations with before/after text

Output includes:

  • Quick wins (5-10 minute fixes)
  • Moderate improvements (30-60 minute rewrites)
  • Major rewrites (only if fundamental issues)
  • What to preserve (don't change these sections)
  • Contested items (user decides)

The Debate Protocol

Here's what makes this system actually work: adversarial debate with convergence.

Round 1: Initial Review

All three critics review in parallel. Each provides:

  • Overall score (X/10)
  • Critical issues (must fix)
  • Medium priority (should fix)
  • Low priority (nice to have)
  • What's working well

How critics calculate scores: Each agent evaluates its 6 dimensions and assigns a score based on:

  • Authenticity Guardian: 9-10 = unmistakably Luke, 7-8 = minor voice breaks, below 7 = needs revision
  • Skeptical Reader: 9-10 = no gaps or confusion, 7-8 = 1-2 missing details, below 7 = critical gaps
  • Structure Editor: 9-10 = perfect flow and hierarchy, 7-8 = minor pacing issues, below 7 = structural problems

The score isn't arbitrary - it's calculated from the count and severity of issues found across all dimensions.
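One way to make that concrete: derive the score from issue counts and severity. The exact weights below are my illustration; the thresholds (below 7 means revision) come from the post:

```python
# Hypothetical scoring sketch: weights are illustrative, thresholds are Luke's.
def score_from_issues(critical: int, medium: int, low: int) -> int:
    penalty = critical * 3 + medium * 1   # low-priority items don't block
    return max(1, 10 - penalty)

print(score_from_issues(0, 0, 2))   # clean post
print(score_from_issues(0, 2, 0))   # minor issues, needs polish
print(score_from_issues(2, 1, 0))   # critical gaps, needs revision
```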

Round 2: Revision

Technical Educator receives aggregated feedback + stored task_ids and:

  • Addresses all critical issues
  • Tackles medium priority items
  • Makes judgment calls on conflicts
  • Documents what was changed and why
  • Provides complete revised post (full markdown, not just suggestions)

Critics re-review using stored task_ids (maintains conversation context) and either approve or escalate remaining concerns.

Round 3: Final Refinement (if needed)

For contested items or remaining gaps. By round 3, most things have converged.

Convergence Criteria

A post is ready when:

  • All three critics approve
  • Story arc is clear
  • Mental models explained before implementation
  • Content is specific and searchable
  • Voice is authentic
  • No technical inaccuracies

Convergence failure protocol: If agents can't agree after 3 rounds, document both perspectives and let me decide.

Example contested issue:

  • Authenticity wants rambling paragraph (authentic voice)
  • Structure wants visual breaks (better readability)
  • Resolution: Keep the words, add paragraph breaks
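The convergence gate and the escalation fallback reduce to a small predicate. Critic names and the 7/10 bar follow the post; the code itself is my sketch:

```python
APPROVAL_BAR = 7
MAX_ROUNDS = 3

def converged(reviews: dict[str, int]) -> bool:
    return all(score >= APPROVAL_BAR for score in reviews.values())

def decide(round_no: int, reviews: dict[str, int]) -> str:
    if converged(reviews):
        return "publish"
    if round_no >= MAX_ROUNDS:
        # Document both perspectives and let the human decide.
        return "escalate"
    return "revise"

print(decide(2, {"voice": 8, "skeptical": 9, "structure": 8}))  # publish
print(decide(3, {"voice": 6, "skeptical": 9, "structure": 8}))  # escalate
```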

Real Example: Reviewing This Very Post

Want to see this in action? Here's what happened when I ran this post through the system:

Round 1 Feedback:

Authenticity Guardian (6.5/10): "Too much formal hedge-language. You used 'It's worth noting' 11 times. That's not Luke—that's documentation voice."

Skeptical Reader (7/10): "Where's the complete AGENT.md file? You mention agents but never show one. How do agents communicate—file I/O, API calls, what?"

Structure Editor (7/10): "Hook buried 300 words deep. Dense 400-word paragraphs. No TL;DR for a 3,600-word post."

My revisions:

  • Killed all "It's worth noting" instances → replaced with direct statements
  • Added complete 60-line Authenticity Guardian definition
  • Added "How Agents Actually Communicate" section explaining file I/O
  • Added TL;DR with jump links
  • Strengthened opening hook

Round 2 Feedback:

Authenticity Guardian (7.5/10): Better, but needs more struggle journey
Skeptical Reader (9/10): APPROVE—all critical gaps fixed
Structure Editor (8/10): Major improvements, minor pacing tweaks needed

That's the system working. Multiple perspectives, specific feedback, iterative improvement.

The Journey of Building This

Starting Point: The /agent-generator Skill

I didn't write these agent definitions from scratch. I used a meta-skill I'd built previously: /agent-generator.

This skill creates well-structured agent definitions by:

  1. Understanding the agent's role
  2. Invoking /expertise skill to ground in real methodologies
  3. Designing interaction protocols
  4. Defining output formats

The Orchestrator Pattern

The orchestrators (in .claude/commands/) don't contain agent logic. They:

  • Prepare context
  • Spawn agents in sequence
  • Aggregate feedback
  • Manage convergence
  • Format final output

Key decision: Parallel review with sequential revision.

Critics review simultaneously (faster), but revisions happen sequentially (prevents chaos).

The Expert Grounding Approach

Each agent is grounded in real methodologies:

Technical Educator:

  • swyx: "Learn in public" philosophy, document the journey
  • Julia Evans: Debugging narratives, specific error messages
  • Josh Comeau: Mental models before implementation
  • Andy Matuschak: Progressive disclosure (simple → complex)
  • Anne-Laure Le Cunff: Ship version 1.0, iterate

Authenticity Guardian:

  • Content strategy voice analysis
  • AI detection patterns
  • Brand alignment frameworks

Skeptical Reader:

  • Cognitive load theory
  • Curse of knowledge awareness
  • Technical documentation best practices

Structure Editor:

  • Inverted pyramid (journalism)
  • Web reading behavior (F-pattern scanning)
  • Readability frameworks (Flesch-Kincaid)

This grounding prevents generic "AI helping AI" nonsense. Each agent has real frameworks to reference.

What Didn't Work (And Why)

Attempt 1: Single Agent Doing All Three Jobs

I started optimistically: one agent to rule them all. Check voice, completeness, and structure all in one pass. Seemed efficient.

What actually happened: The agent would catch voice issues but miss missing code examples. Or notice structural problems but completely gloss over authenticity breaks. It was like asking one person to be a copy editor, a fact-checker, and a voice coach simultaneously. Something always got missed.

Why it failed: Too many competing objectives. The agent couldn't specialize. Trying to hold three different evaluation frameworks at once meant it couldn't apply any of them well. I kept tweaking the prompt, thinking the issue was in how I phrased things. Spent quite a while before realizing the real problem was the architecture itself.

Attempt 2: Sequential Review (Voice → Skeptical → Structure)

Okay, split them up. Run Voice first, then Skeptical, then Structure. Each agent sees the previous agent's feedback and builds on it.

What actually happened: The Structure Editor would see "Voice score: 6/10" and unconsciously lower its own standards. Or Skeptical Reader would notice Authenticity Guardian flagged something as critical, then ignore a similar issue because "that's already being addressed."

Why it failed: Two problems:

First: Bias contamination. Agents were influenced by each other's scores instead of evaluating independently.

Second: Terrible performance. Three sequential Claude calls meant 30+ seconds of waiting for each review. I'd run a post through the system, go grab coffee, come back, and still be waiting on the third agent. Not sustainable.

I thought sequential would be better: each agent could learn from the previous one. Instead, it just created echo chambers where agents converged on "good enough" instead of pushing for better.

What Worked: The Breakthrough

Attempt 3: Parallel Review (Current System)

Launch all three critics at once, each reviewing independently. Aggregate results afterward.

Why this worked:

  • No bias contamination—agents can't see each other's feedback
  • Faster execution (parallel API calls)
  • Agents can disagree, which surfaces interesting edge cases
  • Technical Educator gets unfiltered input from all perspectives

The breakthrough was realizing that disagreement is valuable. When Authenticity Guardian wants rambling paragraphs (authentic voice) and Structure Editor wants visual breaks (readability), that tension forces the Technical Educator to find creative solutions like keeping the words but adding paragraph breaks.

What Surprised Me

Convergence Happens Faster Than Expected

Most posts converge within two or three rounds:

  • Round 1: 5-10 issues flagged
  • Round 2: All addressed, critics review again
  • Round 3: Final adjustments and polish

Going past round 3 is rare (only for major rewrites or contested items).

The System Catches Things I'd Miss

Example from recent review:

  • Missing Next.js version number
  • "Simply do X" (curse of knowledge)
  • Vague heading "Implementation" → Changed to "The Fix: useState with Null Check"
  • Wall of text (350 words, no breaks) → Split into 3 paragraphs with code block

All things I'd probably ship without noticing.

Voice Preservation Actually Works

The Authenticity Guardian is brutal but accurate. It catches:

  • Corporate buzzwords I'd unconsciously use
  • Generic transition phrases ("Let's explore...")
  • Expert assumptions ("Obviously you'll need to...")
  • Common AI phrases ("But honestly?")

And it suggests authentic alternatives that sound like me:

  • "I kept hitting this error for two hours..."
  • "Turns out, the issue was..."
  • "Here's what surprised me..."

The Results

All of this sounds great on paper. Does it actually work?

Honestly, I wasn't sure at first. The first few runs were slow. I kept tweaking prompts, adjusting scoring thresholds, chasing edge cases where agents would argue forever.

But after a while, the system settled in. Here's what I've observed:

Quality improvement: The agents consistently catch things I'd miss: voice breaks, missing version numbers, and "obvious" steps that aren't obvious at all

Consistency: Every post follows the same quality bar now, which wasn't true when I was reviewing manually

Learning: The critic feedback teaches me what to avoid in future writing

Not perfect data; I haven't been tracking this scientifically. But qualitatively, it's way better than my manual reviews.


These results didn't come from magic. They came from careful design. Here's what's under the hood: file structure, agent definitions, and orchestration patterns that make this work.

Agent Structure (High-Level)

Each agent follows the same pattern:

File: .claude/agents/<agent-name>/AGENT.md

Core components:

  • YAML frontmatter (description, tools, permissions)
  • Mission statement
  • Evaluation framework (specific dimensions to check)
  • Interaction protocol (how it works with other agents)
  • Output format (structured feedback)
  • Examples of good/bad content

What makes this work:

  • Clear job description (one specialty per agent)
  • Grounded in real methodologies (not "help write blog")
  • Specific patterns to flag (not vague "check quality")
  • Structured output (orchestrator can parse it)

The orchestrator coordinates workflow: prepare context, spawn agents, aggregate feedback, manage convergence, format output. Agents contain all evaluation logic, and their specialization is what makes the system work.

Six-Dimensional Review Frameworks

Each critic evaluates across 6 specific dimensions:

Authenticity Guardian:

  1. Personal vs. Generic
  2. Humble Sharing vs. Expert Lecturing
  3. Specific vs. Vague
  4. Conversational vs. Corporate
  5. Brand Alignment
  6. AI Detection Signals

Skeptical Reader:

  1. Completeness
  2. Context
  3. Gaps
  4. Questions
  5. Cognitive Load
  6. Curse of Knowledge

Structure Editor:

  1. Opening Hook
  2. Visual Hierarchy
  3. Pacing
  4. Flow
  5. Engagement
  6. Web Readability

Each dimension has clear examples of good/bad and specific things to flag.

What's Next

Immediate Improvements

  1. Internal linking agent: Automatically suggest links to related posts
  2. SEO optimizer: Ensure titles/descriptions hit character limits
  3. Code validator: Run code examples to ensure they actually work

Long-term Vision

  1. Feedback loop: Track which posts get "I got stuck at X" comments, feed that back to Skeptical Reader
  2. Style evolution: Let agents learn from high-performing posts
  3. Topic suggester: Analyze conversations to identify blog-worthy moments

Meta-Learning

This whole process is itself blog-worthy content. I'm using the system to review this post about building the system.

Inception: The agents are currently debating this very post you're reading.

Key Takeaways

  • Multi-agent systems work when agents have real expertise - Don't just spawn "helper agents." Ground them in actual frameworks and methodologies.
  • Adversarial debate makes better posts. The friction between Authenticity Guardian and Structure Editor leads to posts that are both genuine and readable.
  • Convergence protocols prevent endless iteration. 2-3 rounds with clear approval criteria. After that, ship or document the trade-off.
  • Orchestrators maintain principles. The orchestrator's job is reminding agents of core values: learn in public, authenticity over polish, write for past-self.
  • The system teaches you. After seeing the same critiques repeatedly, I've started catching those issues myself. The agents are training me.
  • Agent specialization matters - Each agent has one job, grounded in real methodologies.
  • Parallel review, sequential revision. Let critics run simultaneously, but revise sequentially.
  • Veto power creates accountability. Authenticity Guardian can block generic content. Skeptical Reader can block incomplete content.
  • Debates need protocols - Without clear convergence criteria, agents argue forever.
  • Meta-agents are powerful - Using /agent-generator and /expertise to create the system was way more effective than hand-writing everything.

The Files

Here's what the actual file structure looks like:

Agents (in .claude/agents/):

  • blog-technical-educator/AGENT.md - Creator & Reviser: Transforms conversations into drafts AND implements revisions through iterative rounds
  • blog-authenticity-guardian/AGENT.md - Voice critic: Ensures posts sound like Luke (catches AI patterns, corporate speak, tutorial framing)
  • blog-skeptical-reader/AGENT.md - Completeness & Authentic Framing critic: Catches missing context, gaps, AND inauthentic framing from past-Luke's perspective
  • blog-structure-editor/AGENT.md - Structure critic: Optimizes flow, hierarchy, AND authenticity of openings/natural flow

Orchestrators (in .claude/commands/):

  • generate-blog-post-multi-agent.md - Generate from conversation history
  • review-blog-post-multi-agent.md - Review existing markdown file

Meta-skills I used:

  • /agent-generator - Creates well-structured agent definitions automatically
  • /expertise - Synthesizes frameworks from domain experts (swyx, Julia Evans, etc.)

Final Thoughts

This isn't about replacing human writing. It's about having specialized reviewers who catch what I'd miss.

Think of it like:

  • Authenticity Guardian = Friend who knows your voice
  • Skeptical Reader = Reader emailing "I got stuck at step 3"
  • Structure Editor = Copy editor focused on flow
  • Technical Educator = Yourself, synthesizing feedback AND implementing revisions

All working together to ship better content through 2-3 iterative rounds, with task_id reuse maintaining conversation context so critics remember their previous feedback.

The agents don't write for me. They debate with each other until what I wrote is clear, complete, and authentically mine.

Now the system reviews itself.

Wild. This whole process, building tools to help me write, then writing about those tools, keeps looping back on itself. I'm both the creator and the subject.