plan-swarm-review
Install in Claude Code
`git clone --depth 1 https://github.com/AnastasiyaW/claude-code-config /tmp/plan-swarm-review && cp -r /tmp/plan-swarm-review/skills/architecture/plan-swarm-review ~/.claude/skills/plan-swarm-review`
Then open a new Claude Code session; the skill loads automatically.
Definition
SKILL.md
# Plan Swarm Review
Iterative plan hardening through multisampling and focused decomposition.
**Core insight**: a single agent misses issues due to attention budget limits.
Multiple independent agents reading the same document find different problems
(stochastic diversity). Focused decomposition further improves depth per aspect.
Iterative fix-then-re-review uncovers issues previously masked by other bugs.
Source: deksden (@deksden_notes) — "Plan Swarming" technique, April 2026.
Related: Anthropic Harness Design (Generator-Evaluator), deep-review (parallel competency code review).
Research backing:
- [2502.11027] Sampling diversity in LLM inference — diverse prompts beat identical: +10.8% reasoning, +9.5% code
- [2602.09341] AgentAuditor — reasoning tree audit beats majority voting, recovers 65-82% of minority-correct findings
- [2602.17875] MultiVer — 4 parallel agents hit 82.7% recall on vulnerability detection (beats fine-tuned models)
- [2510.00317] MAVUL — multi-agent vuln detection: +600% vs single-agent
- Anthropic Code Review (Mar 2026) — parallel agents raise substantive findings from 16% to 54%
## Modes
This skill works in two modes:
**Plan mode** (default): review design docs, specs, ADRs, RFCs before implementation.
**Code mode**: review code files for bugs and vulnerabilities. Activated when the
user passes code files instead of a plan, or says "review code", "find
vulnerabilities", or "security audit". In code mode, the review aspects shift from
plan-oriented (contracts, completeness) to code-oriented (injection, auth bypass,
race conditions, memory).
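The mode switch above can be sketched as a small function. This is only an illustration: `detect_mode` and `CODE_EXTENSIONS` are hypothetical names, and the extension list is an assumption, since the skill itself leaves "code files" to the agent's judgment.

```python
from pathlib import Path

# Trigger phrases quoted from the mode description above.
CODE_MODE_PHRASES = ("review code", "find vulnerabilities", "security audit")

# Assumption: a hypothetical set of extensions treated as "code files".
CODE_EXTENSIONS = {".py", ".ts", ".js", ".go", ".rs", ".java", ".c", ".cpp"}


def detect_mode(user_message: str, paths: list[str]) -> str:
    """Return "code" when a trigger phrase or only code files are given, else "plan"."""
    text = user_message.lower()
    if any(phrase in text for phrase in CODE_MODE_PHRASES):
        return "code"
    # Every supplied path must look like source code to leave the default mode.
    if paths and all(Path(p).suffix.lower() in CODE_EXTENSIONS for p in paths):
        return "code"
    return "plan"
```

For example, `detect_mode("check this", ["PLAN.md"])` stays in plan mode, while passing only `src/app.py` switches to code mode.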
---
## Step 0: Identify the target document
Ask the user which document to review if not obvious from context.
**Plan mode**: PLAN.md, ADR, spec, design doc, RFC, or any structured
document describing what will be built and how.
**Code mode**: source code files, a module, or a directory. Best for
security audits, bug hunts, or pre-release quality checks.
```
Read the target document(s) fully. Note:
- Total size (lines, sections/files)
- Key components/modules described or implemented
- Interfaces between components
- Data flows and mutations
- External dependencies and trust boundaries
```
If the target is <100 lines with 1-2 simple components, suggest a single-pass
review instead — swarming is overkill for small targets.
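As a rough sketch, the size check can be written as a predicate. The function name and the `components_are_simple` flag are hypothetical; judging whether a component is "simple" is left to the reviewer.

```python
def should_swarm(line_count: int, component_count: int,
                 components_are_simple: bool = True) -> bool:
    """Return False for small targets, where a single-pass review is suggested."""
    is_small = (
        line_count < 100              # "<100 lines"
        and component_count <= 2      # "1-2 ... components"
        and components_are_simple     # reviewer's judgment call
    )
    return not is_small
```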
---
## Step 1: ROUND 1 — Broad Review (single agent)
**Purpose**: catch obvious issues before spending tokens on multisampling.
Launch ONE Agent with this prompt:
```
You are a senior architect reviewing a plan document before implementation.
Your goal: find issues that would cause bugs, rework, or confusion during
implementation.
## Plan to review
{paste or reference the plan document path}
Read the entire plan. Then check for:
1. CONTRACTS — are interfaces between components fully specified?
Types, error codes, required vs optional fields, versioning.
2. DATA FLOW — is data transformation described end-to-end?
What happens at each boundary? Backward compatibility?
3. NEGATIVE SCENARIOS — what happens when things fail?
Timeouts, partial failures, invalid input, race conditions.
4. CONSISTENCY — do different sections contradict each other?
Same entity described differently in two places?
5. COMPLETENESS — are there gaps? Steps that say "TBD" or "later"?
Scenarios mentioned but not covered?
6. DEPENDENCIES — is implementation order clear?
Are blocking dependencies identified? Circular deps?
7. AMBIGUITY — could two engineers read a section and implement
it differently? Vague terms like "handle appropriately"?
## Output format
For EACH finding:
FINDING: {one-line description}
SECTION: {which section of the plan}
SEVERITY: HIGH | MEDIUM | LOW
EVIDENCE: {quote the problematic text, max 2 lines}
FIX: {concrete change to the plan text}
If the plan is clean — output: "NO_FINDINGS — plan review clean."
Do NOT pad with praise. Only problems.
```
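Because the prompt fixes a line-oriented output format, the agent's reply can be parsed mechanically. The following is a minimal sketch, assuming the agent follows the format exactly; `Finding` and `parse_findings` are hypothetical names, not part of the skill.

```python
import re
from dataclasses import dataclass

# Field labels taken from the output format in the prompt above.
FIELD_RE = re.compile(r"^(FINDING|SECTION|SEVERITY|EVIDENCE|FIX):\s*(.*)$")


@dataclass
class Finding:
    finding: str = ""
    section: str = ""
    severity: str = ""
    evidence: str = ""
    fix: str = ""


def parse_findings(output: str) -> list[Finding]:
    """Parse an agent reply into Finding records; empty list if the plan is clean."""
    if output.strip().startswith("NO_FINDINGS"):
        return []
    findings = []
    current = None
    last_field = None
    for raw in output.splitlines():
        line = raw.strip()
        match = FIELD_RE.match(line)
        if match:
            name, value = match.group(1).lower(), match.group(2)
            if name == "finding":
                # Each FINDING line starts a new record.
                current = Finding()
                findings.append(current)
            if current is None:
                continue  # field seen before any FINDING line; ignore it
            setattr(current, name, value)
            last_field = name
        elif line and current is not None and last_field:
            # Continuation line, e.g. the second line of a two-line EVIDENCE quote.
            previous = getattr(current, last_field)
            setattr(current, last_field, previous + "\n" + line)
    return findings
```

Matching on `NO_FINDINGS` as a prefix keeps the clean-plan check independent of the exact wording that follows it.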
### After Round 1
Collect findings. If **0 findings** → plan is clean, congratulate user, stop.
If findings exist:
1. Present findings to user grouped by severity
2. Ask: "Apply these fixes and continue to Round 2 (multisampling)?"
3. If user approves fixes → apply them to the plan document
4. If user says stop → stop
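The presentation step could look like the sketch below. It assumes findings are plain dicts with `finding`, `section`, and `severity` keys (a hypothetical shape), and it only builds the text shown to the user; applying fixes still requires explicit approval.

```python
SEVERITY_ORDER = ("HIGH", "MEDIUM", "LOW")


def group_by_severity(findings: list[dict]) -> dict[str, list[dict]]:
    """Bucket findings by severity, keeping HIGH, MEDIUM, LOW order."""
    groups: dict[str, list[dict]] = {level: [] for level in SEVERITY_ORDER}
    for item in findings:
        level = item.get("severity", "").strip().upper() or "UNKNOWN"
        # Unexpected severity labels get their own bucket after the known ones.
        groups.setdefault(level, []).append(item)
    return groups


def render_report(findings: list[dict]) -> str:
    """Build the message shown to the user after Round 1."""
    if not findings:
        return "Plan is clean. Stopping."
    lines = []
    for level, items in group_by_severity(findings).items():
        if not items:
            continue
        lines.append(f"{level} ({len(items)})")
        for item in items:
            lines.append(f"  - [{item.get('section', '?')}] {item.get('finding', '')}")
    lines.append("Apply these fixes and continue to Round 2 (multisampling)?")
    return "\n".join(lines)
```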
---
## Step 2: ROUND 2 — Diverse Multisampling (N parallel agents, varied perspectives)
**Purpose**: stochastic diversity catches what one pass missed.
**IMPORTANT**: do NOT use identical prompts for all agents. Research [2502.11027]
shows identical prompts produce correlated errors — agents "cluster" on the same
issues and miss the same blind spots. Instead, give each agent a DIFFERENT
perspective while reviewing the same document.
Launch **3 agents in parallel** (or 5 for critical plans), each with a
**different reviewer persona** taken from the tables below.
**CRITICAL**: launch all agents in a SINGLE message (parallel tool calls).
Each agent has isolated context, so there is no cross-contamination.
### Plan mode perspectives
| Agent | Persona | Focus bias |
|---|---|---|
| 1 | **Skeptical implementer** | "I have to code this tomorrow — what's unclear, contradictory, or impossible?" |
| 2 | **Security auditor** | "Where are the trust boundaries? What happens with malicious input?" |
| 3 | **QA engineer** | "How do I test this? What edge cases aren't covered? What breaks at scale?" |
| 4 | **New team member** | "I just joined — what terms are undefined? What implicit knowledge is required?" |
| 5 | **Ops/SRE** | "What fails at 3am? What's the rollback plan? What's unmonitored?" |
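To illustrate the idea of one prompt per persona, here is a hedged sketch that builds distinct prompts from the table above and fans them out concurrently. Everything in it is an assumption for illustration: `build_prompts`, `run_round_two`, and the `run_agent` callable are hypothetical, the focus text is paraphrased from the table, and the thread pool merely stands in for Claude Code's parallel tool calls in a single message.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Personas and focus questions paraphrased from the plan-mode table.
PLAN_PERSONAS = [
    ("Skeptical implementer",
     "I have to code this tomorrow. What's unclear, contradictory, or impossible?"),
    ("Security auditor",
     "Where are the trust boundaries? What happens with malicious input?"),
    ("QA engineer",
     "How do I test this? What edge cases aren't covered? What breaks at scale?"),
    ("New team member",
     "I just joined. What terms are undefined? What implicit knowledge is required?"),
    ("Ops/SRE",
     "What fails at 3am? What's the rollback plan? What's unmonitored?"),
]


def build_prompts(plan_path: str, n_agents: int = 3) -> list[str]:
    """Return one distinct prompt per agent; never the same prompt twice."""
    if not 1 <= n_agents <= len(PLAN_PERSONAS):
        raise ValueError(f"n_agents must be between 1 and {len(PLAN_PERSONAS)}")
    return [
        f"You are reviewing as: {persona}\n"
        f"Focus: {focus}\n"
        f"Plan to review: {plan_path}"
        for persona, focus in PLAN_PERSONAS[:n_agents]
    ]


def run_round_two(plan_path: str, run_agent: Callable[[str], str],
                  n_agents: int = 3) -> list[str]:
    """Run every persona prompt concurrently; each call sees only its own prompt."""
    prompts = build_prompts(plan_path, n_agents)
    with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
        # pool.map preserves prompt order in the returned results.
        return list(pool.map(run_agent, prompts))
```

Passing each agent only its own prompt string mirrors the isolated-context requirement: no agent can see another's findings.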
### Code mode perspectives
| Agent | Persona | Focus bias |
|---|---|---|
| 1 | **Attacker** | "How do I exploit this? Injection, auth bypass, privilege escalation?" |
| 2 | **Concurrency specialist** | "What races, deadlocks, or ordering issues exist?" |
| 3 | **Performance engineer** | "What's O(n^2)? What allocates unbounded memory? What blocks the event loop?" |
| 4 | **Error recovery auditor** | "What happens when X fails? Is cleanup correct? Are resources leaked?" |
| 5 | **In