Skill · 126 repo stars · updated 2d ago

plan-swarm-review

Repository: claude-code-config

Install in Claude Code

git clone --depth 1 https://github.com/AnastasiyaW/claude-code-config /tmp/plan-swarm-review && mkdir -p ~/.claude/skills && cp -r /tmp/plan-swarm-review/skills/architecture/plan-swarm-review ~/.claude/skills/plan-swarm-review

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Plan Swarm Review

Iterative plan hardening through multisampling and focused decomposition.

**Core insight**: a single agent misses issues due to attention budget limits.
Multiple independent agents reading the same document find different problems
(stochastic diversity). Focused decomposition further improves depth per aspect.
Iterative fix-then-re-review uncovers issues previously masked by other bugs.

Source: deksden (@deksden_notes) — "Plan Swarming" technique, April 2026.
Related: Anthropic Harness Design (Generator-Evaluator), deep-review (parallel competency code review).

Research backing:
- [2502.11027] Sampling diversity in LLM inference — diverse prompts beat identical: +10.8% reasoning, +9.5% code
- [2602.09341] AgentAuditor — reasoning tree audit beats majority voting, recovers 65-82% of minority-correct findings
- [2602.17875] MultiVer — 4 parallel agents hit 82.7% recall on vulnerability detection (beats fine-tuned models)
- [2510.00317] MAVUL — multi-agent vuln detection: +600% vs single-agent
- Anthropic Code Review (Mar 2026) — parallel agents raise substantive findings from 16% to 54%

## Modes

This skill works in two modes:

**Plan mode** (default): review design docs, specs, ADRs, RFCs before implementation.

**Code mode**: review code files for bugs and vulnerabilities. Activated when the user
passes code files instead of a plan, or says "review code", "find vulnerabilities", or
"security audit". In code mode, aspects shift from plan-oriented (contracts,
completeness) to code-oriented (injection, auth bypass, race conditions, memory).
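
The mode selection rule above can be sketched as a small function. This is an illustration only, not part of the skill: `detect_mode` and the extension set are hypothetical names chosen for the example, and the real decision is made by the agent reading the request.

```python
from pathlib import Path

# Trigger phrases quoted in the mode description above.
CODE_MODE_PHRASES = ("review code", "find vulnerabilities", "security audit")

# Assumption: a small, illustrative set of extensions treated as "code files".
CODE_EXTENSIONS = {".py", ".ts", ".js", ".go", ".rs", ".java", ".c", ".cpp"}


def detect_mode(user_message: str, paths: list[str]) -> str:
    """Return "code" or "plan"; plan mode is the default."""
    message = user_message.lower()
    if any(phrase in message for phrase in CODE_MODE_PHRASES):
        return "code"
    # "Passes code files instead of a plan": every given path looks like source code.
    if paths and all(Path(p).suffix.lower() in CODE_EXTENSIONS for p in paths):
        return "code"
    return "plan"
```

For example, `detect_mode("take a look", ["src/app.py"])` selects code mode, while the same message with `["PLAN.md"]` stays in plan mode.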

---

## Step 0: Identify the target document

Ask the user which document to review if not obvious from context.

**Plan mode**: PLAN.md, ADR, spec, design doc, RFC, or any structured
document describing what will be built and how.

**Code mode**: source code files, a module, or a directory. Best for
security audits, bug hunts, or pre-release quality checks.

```
Read the target document(s) fully. Note:
- Total size (lines, sections/files)
- Key components/modules described or implemented
- Interfaces between components
- Data flows and mutations
- External dependencies and trust boundaries
```

If the target is <100 lines with 1-2 simple components, suggest a single-pass
review instead — swarming is overkill for small targets.
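
As a rough sketch, the size check above reduces to two thresholds. The function name `suggest_review_strategy` is hypothetical, and whether a component counts as "simple" is a judgment call the sketch does not model.

```python
def suggest_review_strategy(line_count: int, component_count: int) -> str:
    """Return "single-pass" for small targets, otherwise "swarm"."""
    # Thresholds from the text: under 100 lines with 1-2 components.
    # Assumption: component simplicity is not checked here.
    if line_count < 100 and component_count <= 2:
        return "single-pass"
    return "swarm"
```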

---

## Step 1: ROUND 1 — Broad Review (single agent)

**Purpose**: catch obvious issues before spending tokens on multisampling.

Launch ONE Agent with this prompt:

```
You are a senior architect reviewing a plan document before implementation.
Your goal: find issues that would cause bugs, rework, or confusion during
implementation.

## Plan to review
{paste or reference the plan document path}

Read the entire plan. Then check for:

1. CONTRACTS — are interfaces between components fully specified?
   Types, error codes, required vs optional fields, versioning.
2. DATA FLOW — is data transformation described end-to-end?
   What happens at each boundary? Backward compatibility?
3. NEGATIVE SCENARIOS — what happens when things fail?
   Timeouts, partial failures, invalid input, race conditions.
4. CONSISTENCY — do different sections contradict each other?
   Same entity described differently in two places?
5. COMPLETENESS — are there gaps? Steps that say "TBD" or "later"?
   Scenarios mentioned but not covered?
6. DEPENDENCIES — is implementation order clear?
   Are blocking dependencies identified? Circular deps?
7. AMBIGUITY — could two engineers read a section and implement
   it differently? Vague terms like "handle appropriately"?

## Output format
For EACH finding:

FINDING: {one-line description}
SECTION: {which section of the plan}
SEVERITY: HIGH | MEDIUM | LOW
EVIDENCE: {quote the problematic text, max 2 lines}
FIX: {concrete change to the plan text}

If the plan is clean — output: "NO_FINDINGS — plan review clean."
Do NOT pad with praise. Only problems.
```
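
Because the prompt fixes a line-oriented output format, the agent's reply can be turned into structured records before it is shown to the user. The parser below is a minimal sketch under that assumption; `Finding` and `parse_findings` are hypothetical names, and real agent output may deviate from the format.

```python
import re
from dataclasses import dataclass

FIELDS = ("FINDING", "SECTION", "SEVERITY", "EVIDENCE", "FIX")
FIELD_RE = re.compile(r"^(%s):\s*(.*)$" % "|".join(FIELDS))
NO_FINDINGS_MARKER = "NO_FINDINGS"


@dataclass
class Finding:
    finding: str = ""
    section: str = ""
    severity: str = ""
    evidence: str = ""
    fix: str = ""


def parse_findings(agent_output: str) -> list[Finding]:
    """Parse FINDING/SECTION/SEVERITY/EVIDENCE/FIX blocks from agent output."""
    if agent_output.strip().startswith(NO_FINDINGS_MARKER):
        return []
    findings: list[Finding] = []
    current = None
    last_field = None
    for raw_line in agent_output.splitlines():
        line = raw_line.strip()
        match = FIELD_RE.match(line)
        if match:
            key, value = match.group(1), match.group(2)
            if key == "FINDING":
                # A FINDING line starts a new record.
                current = Finding()
                findings.append(current)
            if current is not None:
                last_field = key.lower()
                setattr(current, last_field, value)
        elif current is not None and last_field and line:
            # Continuation line, e.g. the second line of EVIDENCE.
            previous = getattr(current, last_field)
            setattr(current, last_field, previous + "\n" + line)
    return findings
```

An empty list means the agent reported a clean plan; field lines that appear before the first `FINDING:` are ignored.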

### After Round 1

Collect findings. If **0 findings** → plan is clean, congratulate user, stop.

If findings exist:
1. Present findings to user grouped by severity
2. Ask: "Apply these fixes and continue to Round 2 (multisampling)?"
3. If user approves fixes → apply them to the plan document
4. If user says stop → stop
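
The decision flow after Round 1 can be sketched as follows. This is a hedged illustration: findings are represented as plain dicts with a `severity` key, and `group_by_severity` and `next_action` are hypothetical helper names.

```python
from collections import defaultdict

SEVERITY_ORDER = ("HIGH", "MEDIUM", "LOW")


def group_by_severity(findings: list[dict]) -> dict[str, list[dict]]:
    """Group findings by severity, most severe first."""
    grouped = defaultdict(list)
    for item in findings:
        grouped[item["severity"].strip().upper()].append(item)
    # Unrecognized severity labels are kept and listed after the known ones.
    extras = sorted(key for key in grouped if key not in SEVERITY_ORDER)
    return {key: grouped[key] for key in (*SEVERITY_ORDER, *extras) if key in grouped}


def next_action(findings: list[dict], user_approved: bool) -> str:
    """Mirror the Round 1 exit rules described above."""
    if not findings:
        return "stop: plan is clean"
    if user_approved:
        return "apply fixes, then Round 2"
    return "stop"
```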

---

## Step 2: ROUND 2 — Diverse Multisampling (N parallel agents, varied perspectives)

**Purpose**: stochastic diversity catches what one pass missed.

**IMPORTANT**: do NOT use identical prompts for all agents. Research [2502.11027]
shows identical prompts produce correlated errors — agents "cluster" on the same
issues and miss the same blind spots. Instead, give each agent a DIFFERENT
perspective while reviewing the same document.

Launch **3 agents in parallel** (or 5 for critical plans), each with a
**different reviewer persona** from the tables below.

**CRITICAL**: launch all agents in a SINGLE message (parallel tool calls).
Each agent has isolated context — no cross-contamination.

### Plan mode perspectives

| Agent | Persona | Focus bias |
|---|---|---|
| 1 | **Skeptical implementer** | "I have to code this tomorrow — what's unclear, contradictory, or impossible?" |
| 2 | **Security auditor** | "Where are the trust boundaries? What happens with malicious input?" |
| 3 | **QA engineer** | "How do I test this? What edge cases aren't covered? What breaks at scale?" |
| 4 | **New team member** | "I just joined — what terms are undefined? What implicit knowledge is required?" |
| 5 | **Ops/SRE** | "What fails at 3am? What's the rollback plan? What's unmonitored?" |
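
To make the "different prompt per agent, launched together" idea concrete, here is a minimal Python sketch. Several parts are assumptions: in Claude Code the agents are launched as parallel tool calls in one message, not from Python; `run_agent` is a hypothetical stand-in for that launch; the focus questions are abridged from the table above; taking the first 3 rows for a 3-agent run and reusing the Round 1 output format are guesses the skill text does not confirm.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Personas from the plan mode table above; focus questions abridged.
PLAN_PERSONAS = [
    ("Skeptical implementer", "What's unclear, contradictory, or impossible?"),
    ("Security auditor", "Where are the trust boundaries? What happens with malicious input?"),
    ("QA engineer", "How do I test this? What edge cases aren't covered?"),
    ("New team member", "What terms are undefined? What implicit knowledge is required?"),
    ("Ops/SRE", "What fails at 3am? What's the rollback plan? What's unmonitored?"),
]


def build_prompts(plan_path: str, agent_count: int = 3) -> list[str]:
    """Build one distinct prompt per persona (3 agents, or 5 for critical plans)."""
    if agent_count not in (3, 5):
        raise ValueError("expected 3 agents, or 5 for critical plans")
    prompts = []
    # Assumption: a 3-agent run uses the first three rows of the table.
    for persona, focus in PLAN_PERSONAS[:agent_count]:
        prompts.append(
            f"Reviewer persona: {persona}\n"
            f"Focus: {focus}\n"
            f"Plan to review: {plan_path}\n"
            # Assumption: Round 2 reuses the Round 1 output format.
            "Report each issue as FINDING/SECTION/SEVERITY/EVIDENCE/FIX."
        )
    return prompts


def run_round_two(prompts: list[str], run_agent: Callable[[str], str]) -> list[str]:
    """Dispatch all prompts concurrently; results keep prompt order."""
    # Each call receives only its own prompt, mirroring the isolated-context rule.
    with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
        return list(pool.map(run_agent, prompts))
```

Keeping prompt construction separate from dispatch makes it easy to check that no two agents receive the same prompt before any tokens are spent.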

### Code mode perspectives

| Agent | Persona | Focus bias |
|---|---|---|
| 1 | **Attacker** | "How do I exploit this? Injection, auth bypass, privilege escalation?" |
| 2 | **Concurrency specialist** | "What races, deadlocks, or ordering issues exist?" |
| 3 | **Performance engineer** | "What's O(n^2)? What allocates unbounded memory? What blocks the event loop?" |
| 4 | **Error recovery auditor** | "What happens when X fails? Is cleanup correct? Are resources leaked?" |
| 5 | **In