Skill606 estrellas del repoactualizado today
evolve
The evolve skill runs iterative improvement cycles on a target rubric, building a belief model across multiple experiments to identify cross-skill patterns and transferable insights. Use it when you need sustained autonomous quality advancement that discovers underlying mechanisms and compounds learning across cycles, rather than single-pass scoring or targeted axis attacks like the improve skill provides.
Instalar en Claude Code
Copiargit clone --depth 1 https://github.com/SethGammon/Citadel /tmp/evolve && cp -r /tmp/evolve/skills/evolve ~/.claude/skills/evolveDespués abre una sesión nueva de Claude Code; el skill carga automáticamente.
Definición
SKILL.md
# /evolve — Improvement Director
## Orientation
**Use when:** You want sustained autonomous quality advancement — the director
forms hypotheses, scouts before attacking, and builds a belief model that
compounds across cycles. Runs until a natural ceiling, budget exhaustion, or you
say stop.
**Don't use when:** You want a single scored loop (`/improve`), a known axis
attacked directly (`/improve --axis`), or a one-time audit (`/improve --score-only`).
**Key difference from `/improve`:** `/improve` follows the rubric mechanically.
`/evolve` asks *why* scores are where they are, validates those theories before
spending fleet budget, and extracts cross-skill patterns that propagate to skills
never directly attacked.
## Invocation
```
/evolve {target} # run until ceiling, velocity drop, or budget
/evolve {target} --n={N} # exactly N director cycles then stop
/evolve {target} --budget=${X} # run until cumulative spend reaches $X
/evolve {target} --continue # resume from saved director state
/evolve {target} --status # show belief model, velocity, spend — no attack
/evolve {target} --axis={name} # focus director on one axis (scout + attack only)
```
`target` maps to `.planning/rubrics/{target}.md`.
If no rubric exists, run `/improve {target}` Phase 0 first — `/evolve` requires
an approved rubric and will not auto-generate one.
## Campaign Artifacts
All findings are externalized incrementally — written after every phase, not
only at cycle end. A crashed or compacted session resumes with full context.
| Artifact | Path | Contents |
|---|---|---|
| Director state | `.planning/evolve/{target}/director-state.json` | cycle count, spend, velocity history, current phase, halt status |
| Belief model | `.planning/evolve/{target}/belief-model.jsonl` | one record per (axis, skill) per cycle: score, hypothesis, evidence, confidence |
| Experiment log | `.planning/evolve/{target}/experiment-log.jsonl` | every experiment: hypothesis → prediction → actual delta → mechanism confirmed |
| Pattern library | `.planning/evolve/{target}/pattern-library.md` | transferable patterns: what change to what axis class caused what delta in which skills |
| Cycle digest | `.planning/evolve/{target}/cycle-{n}-digest.md` | human-readable per-cycle summary for review |
| Global patterns | `.planning/research/patterns.md` | cross-target patterns written outside campaign scope; available to future sessions and other targets |
| Knowledge wiki | `.planning/wiki/` | compiled wiki pages from `/learn`; integrates evolve discoveries across sessions |
Create `.planning/evolve/{target}/` on first invocation. Create `.planning/research/` if absent.
**Cycle digest contents:** scores table (axis, prior, this cycle, delta), hypotheses
table (id, axis, hypothesis, scout result, confidence), what was attacked (axis,
skill, delta, mechanism confirmed), patterns discovered this cycle, belief model
updates, and the spend/velocity line. Full template: docs/QUALITY_LOOPS.md#cycle-digest-format.
## Director Cycle Protocol
### Phase 1: Survey
Run `/improve {target} --score-only`. Record scores to belief model with delta
from prior cycle (empty on cycle 1). Flag any axis that dropped since last cycle
as `regression-watch` — these are checked first in Phase 2.
### Phase 2: Hypothesize
For every axis below 8.0, generate one primary hypothesis in this form:
```
HYPOTHESIS: {axis} scores {n}/10 because {specific mechanism},
not because {common misread}.
PREDICTION: Fixing {mechanism} will raise score ≥ {delta} across {N} skills.
FALSIFICATION: If we apply {change} and score does not rise > 0.5, hypothesis rejected.
```
Draw hypotheses from: evaluator justifications in Phase 1, prior evidence in
the belief model, and programmatic check failures. Do not hypothesize from score
alone — the number is the symptom.
Write each hypothesis to the experiment log as `{ id, status: "pending", ... }`.
Skip hypothesis generation for an axis if the belief model already has a
`confidence >= 0.8` confirmed hypothesis for it that has not yet been attacked.
### Phase 3: Scout
For axes below 7.0, or axes with unconfirmed hypotheses: dispatch one scout
agent per hypothesis. Scouts read — they do not modify files.
Each scout returns `{ "hypothesis_id", "confirmed", "evidence", "confidence" }`
(schema example: docs/QUALITY_LOOPS.md#scout-result-schema).
**Scout confidence protocol**: Scouts read relevant files only — no edits, no test runs. Assign `confidence`:
- **0.9+**: mechanism is directly observable (explicit absence, missing section, wrong value in file)
- **0.7–0.89**: strong indirect evidence from 2+ corroborating observations
- **0.4–0.69**: single observation that supports the hypothesis; alternative explanations plausible
- **< 0.4**: no direct evidence found; hypothesis is speculative from this file set
Run scouts in parallel. Update experiment log:
- `confidence >= 0.7` → `confirmed`
- `confidence 0.4–0.69` → `needs-evidence` (do not attack; add to next cycle)
- `confidence < 0.4` → `rejected`
Skip Phase 3 for any hypothesis already `confirmed` at `confidence >= 0.8` in
the belief model from a prior cycle.
### Phase 4: Prioritize
For each confirmed hypothesis compute:
```
EV = (delta_estimate × axis_weight × confidence) / (effort_tier × collision_multiplier)
```
- `effort_tier`: low=1.0, medium=1.5, high=2.5
- `collision_multiplier`: 2.0 if axis shares primary files with another attack in this cycle
Select top K axes where K = min(confirmed count, 4). Document selection rationale
in cycle digest. If `--axis` was set, skip ranking — attack only that axis.
### Phase 5: Fleet Attack
Dispatch one agent per selected axis in an isolated worktree
(Agent tool, `isolation: "worktree"`). Each agent receives:
- The confirmed hypothesis and its falsification criterion
- The specific files to modify
- Verification oracle: `node scripts/run-with-timeout.js 300 node scripts/test-all.js`