ClaudeWave
Skill · 2.3k repo stars · updated 4d ago

skill-optimizer

Skill Optimizer trains existing SKILL.md files through iterative improvement cycles inspired by machine learning optimization, analyzing accumulated learn-rule corrections to propose bounded patches that are validated against past user feedback before acceptance. Use when a skill has gathered eight or more learning trajectories and needs consolidation or refinement rather than expansion, with available API budget and offline processing time.

Install in Claude Code
git clone --depth 1 https://github.com/rohitg00/pro-workflow /tmp/skill-optimizer && cp -r /tmp/skill-optimizer/skills/skill-optimizer ~/.claude/skills/skill-optimizer
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# Skill Optimizer

Train an existing SKILL.md the way a deep-learning optimizer trains weights: via rollouts, gradient-like reflections, and validation-gated acceptance. No model is retrained; only the skill markdown changes.

## When to use

Use this skill when:
- A pro-workflow skill has accumulated 8+ learn-rule rows
- The user reports the skill is "getting bloated" or "rules keep being repeated"
- The user wants offline, budget-capped improvement over multiple sessions

Do not use when:
- Skill has fewer than 8 trajectories (nothing to learn from)
- The user wants real-time edits (this is offline, single-shot)
- No `ANTHROPIC_API_KEY` (or equivalent provider key) is available
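
The two lists above amount to a small preflight check. The following is a minimal sketch of that check, not code from pro-workflow: the `preflight` helper and its return shape are hypothetical, and it only looks for `ANTHROPIC_API_KEY`, whereas the skill also accepts an equivalent provider key.

```python
import os
from typing import Mapping, Optional, Tuple

MIN_TRAJECTORIES = 8  # threshold stated in "When to use"


def preflight(
    trajectory_count: int, env: Optional[Mapping[str, str]] = None
) -> Tuple[bool, str]:
    """Return (ok, reason). Hypothetical helper, not part of pro-workflow."""
    env = os.environ if env is None else env
    if trajectory_count < MIN_TRAJECTORIES:
        return False, f"only {trajectory_count} trajectories; need {MIN_TRAJECTORIES}+"
    # Assumption: only the Anthropic key is checked in this sketch.
    if not env.get("ANTHROPIC_API_KEY"):
        return False, "no ANTHROPIC_API_KEY set"
    return True, "ok"
```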

## Architecture (mirrors SkillOpt's six-stage loop, plus gate and slow update)

```text
rollout      pull recent learnings from SQLite (existing learn-rule rows)
reflect      optimizer LLM analyzes a minibatch, proposes add/delete/replace patches
aggregate    vote-merge patches across minibatches
select       clip by LR budget (default: 3 adds, 2 deletes, 3 replaces per step)
update       apply selected patches to a candidate skill content
evaluate     evaluator LLM scores candidate against held-out validation items
gate         accept candidate only if weighted score >= current + acceptThreshold
slow update  at epoch boundary, consolidate accepted edits into a coherent rewrite
```

Failed candidates are stored in a rejection buffer and fed back to the next reflect step so the optimizer doesn't propose the same patch twice.
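
To make the `select` and `gate` stages concrete, here is a rough sketch under stated assumptions. The `Patch` shape, the vote-based ranking, and the set-based rejection filter are all hypothetical; in the skill itself the rejection buffer is fed to the reflect prompt, which this sketch approximates with a plain filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    op: str          # "add", "delete", or "replace"
    anchor: str      # text in the skill that the patch targets
    text: str = ""   # new text for add/replace
    votes: int = 1   # assumed output of the aggregate stage


# Defaults from --max-adds, --max-deletes, --max-replaces
LR_BUDGET = {"add": 3, "delete": 2, "replace": 3}


def select(patches, rejected, budget=LR_BUDGET):
    """Skip previously rejected patches, then keep the highest-voted
    patches of each op type up to that type's budget."""
    kept, used = [], {op: 0 for op in budget}
    for p in sorted(patches, key=lambda p: -p.votes):
        if (p.op, p.anchor, p.text) in rejected or p.op not in budget:
            continue
        if used[p.op] < budget[p.op]:
            kept.append(p)
            used[p.op] += 1
    return kept


def gate(candidate_score, current_score, accept_threshold=0.0):
    """Accept only if the candidate meets current + threshold."""
    return candidate_score >= current_score + accept_threshold
```

With the default `--accept-threshold 0.0`, a candidate that merely ties the current score passes the gate; raising the threshold demands a strict improvement.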

## Run it

```bash
/skill-optimize <slug> [options]
```

Options (all optional; sensible defaults shown):

| Flag | Default | Notes |
|---|---|---|
| `--epochs N` | 3 | Outer loop count |
| `--batch-size N` | 8 | Trajectories per minibatch |
| `--minibatches N` | 2 | Minibatches per epoch |
| `--holdout N` | 6 | Validation items reserved (max ~25% of trajectories) |
| `--budget-usd X` | 0.50 | Hard cap; loop aborts when spent |
| `--optimizer-model M` | `claude-sonnet-4-6` | Reflect + slow-update model |
| `--evaluator-model M` | `claude-haiku-4-5-20251001` | Gate model (cheaper) |
| `--max-adds N` | 3 | LR budget: max add patches per step |
| `--max-deletes N` | 2 | LR budget: max delete patches per step |
| `--max-replaces N` | 3 | LR budget: max replace patches per step |
| `--accept-threshold X` | 0.0 | Minimum score delta to accept candidate |
| `--max-skill-tokens N` | 2000 | Hard cap on candidate length |
| `--slow-every N` | 2 | Epochs between consolidation passes |
| `--json` | off | Machine-readable output |

Kill switch: `touch ~/.pro-workflow/STOP` aborts the loop between steps.
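
The budget cap and the kill switch are both checked between steps. A minimal sketch of such a check might look like the following; the `stop_reason` helper and the `"kill switch"` label are hypothetical, while `"budget exhausted"` is the reason string the skill reports.

```python
from pathlib import Path
from typing import Optional

STOP_FILE = Path.home() / ".pro-workflow" / "STOP"


def stop_reason(
    spent_usd: float, budget_usd: float = 0.50, stop_file: Path = STOP_FILE
) -> Optional[str]:
    """Return a reason to end the loop, or None to continue."""
    if stop_file.exists():
        return "kill switch"  # hypothetical label; the source does not name one
    if spent_usd >= budget_usd:
        return "budget exhausted"
    return None
```

Because the check runs only at step boundaries, a step that is already in flight finishes before the loop stops.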

## Output

- Candidate accepted → SKILL.md overwritten, hash stamp appended in HTML comment
- Run details persist in `optimization_runs`, `optimization_candidates`, `optimization_patches`, `optimization_rejections`
- Validation set persists in `optimization_validation` (reusable across runs)

Inspect after:

```bash
sqlite3 ~/.pro-workflow/data.db "SELECT id, skill_slug, initial_score, best_score, accepted_steps, rejected_steps, spent_usd FROM optimization_runs ORDER BY id DESC LIMIT 5"
```
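
The same columns can be summarized from Python. This is a sketch only: `summarize_runs` is a hypothetical helper, and the derived `gain` and `accept_rate` fields are illustrative, not values the skill stores.

```python
import sqlite3

QUERY = """
SELECT id, skill_slug, initial_score, best_score,
       accepted_steps, rejected_steps, spent_usd
FROM optimization_runs ORDER BY id DESC LIMIT ?
"""


def summarize_runs(conn: sqlite3.Connection, limit: int = 5) -> list:
    """Return one dict per recent run with score gain and accept rate."""
    conn.row_factory = sqlite3.Row  # allows access by column name
    out = []
    for r in conn.execute(QUERY, (limit,)):
        steps = r["accepted_steps"] + r["rejected_steps"]
        out.append({
            "id": r["id"],
            "skill": r["skill_slug"],
            "gain": r["best_score"] - r["initial_score"],
            "accept_rate": r["accepted_steps"] / steps if steps else 0.0,
            "spent_usd": r["spent_usd"],
        })
    return out
```

To run it against the real database, open it with `sqlite3.connect(os.path.expanduser("~/.pro-workflow/data.db"))` and pass the connection in.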

## Rules

- Validation set is frozen at run start. Never re-derive from new corrections mid-run.
- One candidate per step. No parallel branches.
- Slow-update output is itself a candidate; it must pass the gate to replace the best.
- The optimizer LLM and evaluator LLM may be different models. Mixing a strong optimizer with a cheap evaluator is the SkillOpt-recommended config.
- If `spent_usd >= budget_usd` at any step boundary, the loop ends with `stopped_reason="budget exhausted"`.
- Patches whose anchor is no longer present in the skill (because a prior patch in the same step removed it) are recorded as rejected with reason `anchor_missing`.
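
The `anchor_missing` rule can be illustrated with a small sketch. The tuple patch format and the exact edit semantics (an add inserts its text on a new line after the anchor) are assumptions for illustration; only the `anchor_missing` reason comes from the rule above.

```python
def apply_patches(skill, patches):
    """Apply (op, anchor, text) patches in order.

    Returns (new_skill, rejected), where rejected is a list of
    (patch, reason) pairs for patches that could not be applied.
    """
    rejected = []
    for op, anchor, text in patches:
        if anchor not in skill:
            # An earlier patch in this step may have removed the anchor.
            rejected.append(((op, anchor, text), "anchor_missing"))
            continue
        if op == "add":
            skill = skill.replace(anchor, anchor + "\n" + text, 1)
        elif op == "delete":
            skill = skill.replace(anchor, "", 1)
        elif op == "replace":
            skill = skill.replace(anchor, text, 1)
        else:
            rejected.append(((op, anchor, text), "unknown_op"))  # hypothetical reason
    return skill, rejected
```

For example, if one patch deletes `- rule B` and a later patch in the same step tries to replace `- rule B`, the second patch is recorded as rejected instead of being applied.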

## Provenance

Inspired by Microsoft SkillOpt (arXiv:2605.23904). The six-stage rollout/reflect/aggregate/select/update/evaluate pipeline, LR budget, rejection buffer, and slow / meta update mechanics are adapted to pro-workflow's existing SQLite + learn-rule data plane. No SkillOpt code is reused. "ReflACT" is not a SkillOpt term and is not used here; the loop is referred to by stage names only.