ClaudeWave
Skill · 389 repo stars · updated today

evolve

The evolve skill runs autonomous improvement loops that iteratively measure code quality against a rubric, identify the highest-priority issue, execute fixes via the rpi skill, and repeat until stopping conditions are met. Use it when you need continuous, hands-off code refinement that prioritizes work by impact, such as closing coverage gaps, reducing complexity, fixing failing tests, addressing security vulnerabilities, or resolving directive mismatches across a codebase.

Install in Claude Code
git clone --depth 1 https://github.com/boshu2/agentops /tmp/evolve && cp -r /tmp/evolve/images/gemini/skills/evolve ~/.claude/skills/evolve
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# /evolve — Goal-Driven Compounding Loop

> **Cross-vendor analog:** Anthropic Managed Agents Outcomes (May 2026). Both close the loop "agent runs → grader scores against a rubric → agent retries"; AgentOps does it locally against any model.

> Measure what's wrong. Fix the worst thing. Measure again. Compound.

**The loop runs as this skill (skills-are-the-runtime).** The `evolve` skill selects work
and invokes complete `/rpi --auto` cycles; that *is* the loop. The `evolve` CLI command and
`ao rpi loop --supervisor` are terminal-native **wrapper commands** for humans or
non-skill runtimes, not the default expression of the loop; they reuse the same v2
RPI loop engine. The substrate dispatches the whole `evolve` skill loop as one
unit; it never drives the loop's insides. The `evolve`/`ao rpi` CLI wrappers are
being retired (ag-iowf).

**Operator cadence:** post-mortem finished work, analyze the current repo state,
select or create the next highest-value work item, let `rpi` handle research,
planning, pre-mortem, implementation, and validation, then harvest follow-ups
and repeat until a kill switch, max-cycle cap, regression breaker, or real
dormancy stops the run.
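
The four stop conditions named above can be sketched as a single check run between cycles. This is a minimal illustration, not the skill's actual implementation: the function name `should_stop`, its argument order, and the idea of a kill-switch *file* are all assumptions made for the example.

```bash
# Hypothetical sketch of the between-cycle stop check.
# Args: cycles_done  max_cycles (0 = unlimited)  kill_switch_path  regressed (0/1)  dormant (0/1)
# Prints the stop reason and returns 0 when the run should end; returns 1 otherwise.
should_stop() {
  cycles=$1; max=$2; kill_file=$3; regressed=$4; dormant=$5

  # Kill switch: assumed here to be the mere existence of a file.
  if [ -e "$kill_file" ]; then echo "kill-switch"; return 0; fi

  # Max-cycle cap: only enforced when a positive cap was given.
  if [ "$max" -gt 0 ] && [ "$cycles" -ge "$max" ]; then echo "max-cycles"; return 0; fi

  # Regression breaker: the caller decides what counts as a regression.
  if [ "$regressed" -eq 1 ]; then echo "regression-breaker"; return 0; fi

  # Real dormancy: the caller has already confirmed every layer is empty.
  if [ "$dormant" -eq 1 ]; then echo "dormancy"; return 0; fi

  return 1
}
```

Checking the kill switch first reflects one reasonable ordering: an operator's explicit stop request wins over every automatic condition.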

Always-on autonomous loop over `rpi`. Work selection order:
1. **Harvested `.agents/rpi/next-work.jsonl` work** (freshest concrete follow-up)
2. **Open ready beads work** (`bd ready`)
3. **Failing goals and directive gaps** (`ao goals measure`)
4. **Testing improvements** (missing/thin coverage, missing regression tests)
5. **Validation tightening and bug-hunt passes** (gates, audits, bug sweeps)
6. **Complexity / TODO / FIXME / drift / dead code / stale docs / stale research mining**
7. **Concrete feature suggestions** derived from repo purpose when no sharper work exists
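
The selection order above behaves like a priority ladder: take work from the first layer that has any. A minimal sketch, assuming each layer has already been reduced to an item count; the short layer names and the `layer=count` argument format are hypothetical, chosen only for this example.

```bash
# Hypothetical layer names, listed in the same order as the ladder above.
LADDER="harvested beads goals testing validation mining features"

# select_layer takes "layer=count" pairs in any order and prints the
# highest-priority layer whose count is greater than zero.
# Returns 1 when every supplied layer is empty.
select_layer() {
  for layer in $LADDER; do
    for pair in "$@"; do
      # ${pair%%=*} is the name before "="; ${pair#*=} is the count after it.
      if [ "${pair%%=*}" = "$layer" ] && [ "${pair#*=}" -gt 0 ]; then
        printf '%s\n' "$layer"
        return 0
      fi
    done
  done
  return 1
}
```

For example, `select_layer harvested=0 beads=2 goals=5` prints `beads`, because ready beads work outranks failing goals even though fewer items are waiting.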

**Work generators** that feed the selection ladder (auto-invoked, skip with `--no-lifecycle`):
- `Skill(skill="test", args="coverage")` → files with <40% coverage become queue items (Step 3.4)
- `Skill(skill="refactor", args="--sweep all --dry-run")` → functions with CC > 20 become queue items (Step 3.6)
- `Skill(skill="deps", args="audit")` → deps with CVSS >= 7.0 or 2+ major versions behind become queue items (Step 3.5)
- `Skill(skill="perf", args="profile --quick")` → perf findings become queue items when hot paths detected (Step 3.5)
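
The numeric thresholds in the generator list can be expressed as small predicates. This is a sketch of the stated cut-offs only; the function names are hypothetical, and how the real generators obtain coverage, complexity, or CVSS values is not shown in this document. The perf generator is omitted because the text gives no numeric threshold for it.

```bash
# Coverage below 40% qualifies a file for the queue.
# awk handles the comparison so fractional percentages work.
needs_coverage_work() {
  awk -v p="$1" 'BEGIN { exit !(p < 40) }'
}

# Cyclomatic complexity strictly greater than 20 qualifies a function.
needs_refactor() {
  [ "$1" -gt 20 ]
}

# A dependency qualifies when CVSS >= 7.0 or it is 2+ major versions behind.
# Args: cvss_score  major_versions_behind
needs_dep_work() {
  awk -v s="$1" -v m="$2" 'BEGIN { exit !(s >= 7.0 || m >= 2) }'
}
```

Each predicate returns exit status 0 when the item should be queued, so it composes with ordinary shell conditionals such as `if needs_refactor "$cc"; then ...; fi`.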

**Dormancy is last resort.** Empty current queues mean "run the generator layers", not "stop". Only go dormant after the queue layers and generator layers come up empty across multiple consecutive passes.
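
One way to picture the dormancy rule is a counter of consecutive empty passes that resets whenever any layer produces work. The threshold of 3 below is an assumption; the text says only "multiple consecutive passes". The names `record_pass` and `DORMANCY_THRESHOLD` are likewise hypothetical.

```bash
# Assumed value: the source does not state how many empty passes are required.
DORMANCY_THRESHOLD=3
empty_passes=0

# record_pass takes the number of work items found across the queue layers
# and generator layers in one pass. It returns 0 (go dormant) only after
# DORMANCY_THRESHOLD consecutive passes found nothing.
record_pass() {
  if [ "$1" -gt 0 ]; then
    empty_passes=0                      # any work at all resets the streak
  else
    empty_passes=$((empty_passes + 1))
  fi
  [ "$empty_passes" -ge "$DORMANCY_THRESHOLD" ]
}
```

Call `record_pass` directly rather than inside `$(...)`, since a subshell would discard the counter update.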

**Live skill edit immune system:** if an evolve cycle edits
`skills/<slug>/SKILL.md`, run
`ao skills edit seal --skill <slug> --actor "${AGENT_NAME:-agent}"` before the
cycle hands off. The seal creates the rollback commit and records the
`Skill-Edit` trailers used by the daily digest. Critical skills listed in
`docs/contracts/critical-skills.txt` reject unattended edits; use
`--allow-critical` only when Bo is supervising that critical edit.
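
A cycle could find which skills need sealing by scanning its changed paths. The sketch below only prints the seal commands rather than running them; the helper names are hypothetical, and the critical-skills check against `docs/contracts/critical-skills.txt` is deliberately left out.

```bash
# edited_skill_slugs reads changed file paths on stdin and prints the unique
# slug of every path shaped exactly like skills/<slug>/SKILL.md.
edited_skill_slugs() {
  sed -n 's|^skills/\([^/]*\)/SKILL\.md$|\1|p' | sort -u
}

# seal_commands prints one seal command per edited skill (dry run only).
# The command line itself is the one quoted in the paragraph above.
seal_commands() {
  edited_skill_slugs | while read -r slug; do
    printf 'ao skills edit seal --skill %s --actor "%s"\n' "$slug" "${AGENT_NAME:-agent}"
  done
}
```

A possible use is `git diff --name-only HEAD | seal_commands`, reviewing the printed commands before executing any of them.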

```bash
/evolve                      # Run until kill switch, max-cycles, or real dormancy
/evolve --max-cycles=5       # Cap at 5 cycles
/evolve --dry-run            # Show what would be worked on, don't execute
/evolve --beads-only         # Skip goals measurement, work beads backlog only
/evolve --quality            # Quality-first mode: prioritize post-mortem findings
/evolve --quality --max-cycles=10  # Quality mode with cycle cap
/evolve --compile            # Mine → Defrag warmup before first cycle
/evolve --compile --max-cycles=5 # Warm knowledge base then run 5 cycles
/evolve --test-first         # Default strict-quality /rpi execution path
/evolve --no-test-first      # Explicit opt-out from test-first mode
```

## Delineation vs Nightly Knowledge Compounding

| Lane | Runs | Mutates code? | Mutates corpus? | Outer loop? | Budget |
|------|------|---------------|-----------------|-------------|--------|
| `$curate --mode=dream` | nightly, private local | **No** | **Yes (heavy)** | **Yes (convergence)** | wall-clock + plateau |
| `evolve` | daytime, operator-driven | Yes (via `rpi`) | Yes (light) | Yes | cycle cap |

**The old dream skill is retired**; out-of-session compounding moved to Gas City and the current skill surface is `$curate --mode=dream`. `/evolve` owns the live daytime code-compounding lane. Both still share the fitness-measurement substrate via `corpus.Compute` / `ao goals measure`.

## Flags

| Flag | Default | Description |
|------|---------|-------------|
| `--max-cycles=N` | unlimited | Stop after `N` completed cycles |
| `--dry-run` | off | Show planned cycle actions without executing |
| `--beads-only` | off | Skip goal measurement and run backlog-only selection |
| `--skip-baseline` | off | Skip first-run baseline snapshot |
| `--quality` | off | Prioritize harvested post-mortem findings |
| `--compile` | off | Run `ao mine` + `ao defrag` warmup before cycle 1 |
| `--test-first` | on | Pass strict-quality defaults through to `rpi` |
| `--no-test-first` | off | Explicitly disable test-first passthrough to `rpi` |
| `--no-lifecycle` | off | Skip lifecycle work generators in Steps 3.4-3.6 (/test, /deps, /perf, /refactor). Falls back to manual scanning. |
| `--mode=burst\|loop` | burst | Operator-loop; STOP refused. [loop-mode.md](references/loop-mode.md). |

## Execution Steps

**YOU MUST EXECUTE THIS WORKFLOW. Do not just describe it.**

**FULLY AUTONOMOUS.** Read `references/autonomous-execution.md`. Every `rpi` invocation uses `--auto`. Do NOT ask the user anything. Each cycle is one complete 3-phase `rpi` run.

For broad AgentOps 3.0 domain evolution across skills, CLI, hooks, docs, tests,
beads, and knowledge, first read
[references/domain-evolution-bootstrap.md](references/domain-evolution-bootstrap.md).
It supplies the BDD/DDD/Hexagonal/TDD/XP control surface and the clean-room
skill-factory guardrails.

### Step 0: Setup

**Stale-checkout survey guard (run FIRST).** Before any tree-reading survey: `git fetch origin && git status -sb`. If the checkout is behind/diverged AND it is a t