Skill2.7k repo starsupdated today

skill-optimizer

Skill Optimizer trains existing SKILL.md files through iterative improvement cycles inspired by machine learning optimization, analyzing accumulated learn-rule corrections to propose bounded patches that are validated against past user feedback before acceptance. Use when a skill has gathered eight or more learning trajectories and needs consolidation or refinement rather than expansion, with available API budget and offline processing time.

View source Repository: pro-workflow

Install in Claude Code

Copy

git clone --depth 1 https://github.com/rohitg00/pro-workflow /tmp/skill-optimizer && cp -r /tmp/skill-optimizer/skills/skill-optimizer ~/.claude/skills/skill-optimizer

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Skill Optimizer

Train an existing SKILL.md the way a deep-learning optimizer trains weights: via rollouts, gradient-like reflections, validation-gated acceptance. No model retraining; only the skill markdown changes.

## When to use

Use this skill when:
- A pro-workflow skill has accumulated 8+ learn-rule rows for it
- The user reports the skill is "getting bloated" or "rules keep being repeated"
- The user wants offline, budget-capped improvement over multiple sessions

Do not use when:
- Skill has fewer than 8 trajectories (nothing to learn from)
- The user wants real-time edits (this is offline, single-shot)
- No `ANTHROPIC_API_KEY` (or equivalent provider key) is available

## Architecture (mirrors SkillOpt's six-stage loop)

```text
rollout      pull recent learnings from SQLite (existing learn-rule rows)
reflect      optimizer LLM analyzes a minibatch, proposes add/delete/replace patches
aggregate    vote-merge patches across minibatches
select       clip by LR budget (default: 3 adds, 2 deletes, 3 replaces per step)
update       apply selected patches to a candidate skill content
evaluate     evaluator LLM scores candidate against held-out validation items
gate         accept candidate only if weighted score >= current + acceptThreshold
slow update  at epoch boundary, consolidate accepted edits into a coherent rewrite
```

Failed candidates are stored in a rejection buffer and fed back to the next reflect step so the optimizer doesn't propose the same patch twice.

## Run it

```bash
/skill-optimize <slug> [options]
```

Options (all optional; sensible defaults shown):

| Flag | Default | Notes |
|---|---|---|
| `--epochs N` | 3 | Outer loop count |
| `--batch-size N` | 8 | Trajectories per minibatch |
| `--minibatches N` | 2 | Minibatches per epoch |
| `--holdout N` | 6 | Validation items reserved (max ~25% of trajectories) |
| `--budget-usd X` | 0.50 | Hard cap; loop aborts when spent |
| `--optimizer-model M` | `claude-sonnet-4-6` | Reflect + slow-update model |
| `--evaluator-model M` | `claude-haiku-4-5-20251001` | Gate model (cheaper) |
| `--max-adds N` | 3 | LR budget per step |
| `--max-deletes N` | 2 | |
| `--max-replaces N` | 3 | |
| `--accept-threshold X` | 0.0 | Minimum score delta to accept candidate |
| `--max-skill-tokens N` | 2000 | Hard cap on candidate length |
| `--slow-every N` | 2 | Epochs between consolidation passes |
| `--json` | off | Machine-readable output |

Kill switch: `touch ~/.pro-workflow/STOP` aborts the loop between steps.

## Output

- Candidate accepted → SKILL.md overwritten, hash stamp appended in HTML comment
- Run details persist in `optimization_runs`, `optimization_candidates`, `optimization_patches`, `optimization_rejections`
- Validation set persists in `optimization_validation` (reusable across runs)

Inspect after:

```bash
sqlite3 ~/.pro-workflow/data.db "SELECT id, skill_slug, initial_score, best_score, accepted_steps, rejected_steps, spent_usd FROM optimization_runs ORDER BY id DESC LIMIT 5"
```

## Rules

- Validation set is frozen at run start. Never re-derive from new corrections mid-run.
- One candidate per step. No parallel branches.
- Slow-update output is itself a candidate; it must pass the gate to replace the best.
- The optimizer LLM and evaluator LLM may be different models. Mixing a strong optimizer with a cheap evaluator is the SkillOpt-recommended config.
- If `spent_usd >= budget_usd` at any step boundary, the loop ends with `stopped_reason="budget exhausted"`.
- Patches whose anchor is no longer present in the skill (because a prior patch in the same step removed it) are recorded as rejected with reason `anchor_missing`.

## Provenance

Inspired by Microsoft SkillOpt (arXiv:2605.23904). The six-stage rollout/reflect/aggregate/select/update/evaluate pipeline, LR budget, rejection buffer, and slow / meta update mechanics are adapted to pro-workflow's existing SQLite + learn-rule data plane. No SkillOpt code is reused. "ReflACT" is not a SkillOpt term and is not used here; the loop is referred to by stage names only.