skill-optimizer
Skill Optimizer trains existing SKILL.md files through iterative improvement cycles inspired by machine learning optimization, analyzing accumulated learn-rule corrections to propose bounded patches that are validated against past user feedback before acceptance. Use when a skill has gathered eight or more learning trajectories and needs consolidation or refinement rather than expansion, with available API budget and offline processing time.
git clone --depth 1 https://github.com/rohitg00/pro-workflow /tmp/skill-optimizer && cp -r /tmp/skill-optimizer/skills/skill-optimizer ~/.claude/skills/skill-optimizer

SKILL.md
# Skill Optimizer

Train an existing SKILL.md the way a deep-learning optimizer trains weights: via rollouts, gradient-like reflections, validation-gated acceptance. No model retraining; only the skill markdown changes.

## When to use

Use this skill when:

- A pro-workflow skill has accumulated 8+ learn-rule rows for it
- The user reports the skill is "getting bloated" or "rules keep being repeated"
- The user wants offline, budget-capped improvement over multiple sessions

Do not use when:

- Skill has fewer than 8 trajectories (nothing to learn from)
- The user wants real-time edits (this is offline, single-shot)
- No `ANTHROPIC_API_KEY` (or equivalent provider key) is available

## Architecture (mirrors SkillOpt's six-stage loop)

```text
rollout      pull recent learnings from SQLite (existing learn-rule rows)
reflect      optimizer LLM analyzes a minibatch, proposes add/delete/replace patches
aggregate    vote-merge patches across minibatches
select       clip by LR budget (default: 3 adds, 2 deletes, 3 replaces per step)
update       apply selected patches to a candidate skill content
evaluate     evaluator LLM scores candidate against held-out validation items
gate         accept candidate only if weighted score >= current + acceptThreshold
slow update  at epoch boundary, consolidate accepted edits into a coherent rewrite
```

Failed candidates are stored in a rejection buffer and fed back to the next reflect step so the optimizer doesn't propose the same patch twice.
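To make the aggregate and select stages concrete, here is a minimal Python sketch. It is not pro-workflow's code: the `Patch` shape, the function names, and the "one vote per minibatch" rule are assumptions made for illustration. Only the per-step caps (3 adds, 2 deletes, 3 replaces) come from the stage list above.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical patch shape; the real schema may differ.
@dataclass(frozen=True)
class Patch:
    op: str         # "add", "delete", or "replace"
    anchor: str     # text in the skill that the patch targets
    text: str = ""  # new content (empty for deletes)


# Per-step caps, taken from the defaults listed for the select stage.
DEFAULT_BUDGET = {"add": 3, "delete": 2, "replace": 3}


def aggregate(minibatch_patches):
    """Vote-merge: count how many minibatches proposed each identical patch."""
    votes = Counter()
    for patches in minibatch_patches:
        # dict.fromkeys de-duplicates while keeping proposal order,
        # so each minibatch casts at most one vote per patch.
        for patch in dict.fromkeys(patches):
            votes[patch] += 1
    return votes


def select(votes, budget=DEFAULT_BUDGET):
    """Keep the most-voted patches of each op type, up to the budget."""
    remaining = dict(budget)
    chosen = []
    for patch, _count in votes.most_common():
        if remaining.get(patch.op, 0) > 0:
            chosen.append(patch)
            remaining[patch.op] -= 1
    return chosen
```

Clipping per operation type keeps any single step from rewriting too much of the skill at once, which is the role a learning rate plays in the analogy.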
## Run it

```bash
/skill-optimize <slug> [options]
```

Options (all optional; sensible defaults shown):

| Flag | Default | Notes |
|---|---|---|
| `--epochs N` | 3 | Outer loop count |
| `--batch-size N` | 8 | Trajectories per minibatch |
| `--minibatches N` | 2 | Minibatches per epoch |
| `--holdout N` | 6 | Validation items reserved (max ~25% of trajectories) |
| `--budget-usd X` | 0.50 | Hard cap; loop aborts when spent |
| `--optimizer-model M` | `claude-sonnet-4-6` | Reflect + slow-update model |
| `--evaluator-model M` | `claude-haiku-4-5-20251001` | Gate model (cheaper) |
| `--max-adds N` | 3 | LR budget per step |
| `--max-deletes N` | 2 | |
| `--max-replaces N` | 3 | |
| `--accept-threshold X` | 0.0 | Minimum score delta to accept candidate |
| `--max-skill-tokens N` | 2000 | Hard cap on candidate length |
| `--slow-every N` | 2 | Epochs between consolidation passes |
| `--json` | off | Machine-readable output |

Kill switch: `touch ~/.pro-workflow/STOP` aborts the loop between steps.

## Output

- Candidate accepted → SKILL.md overwritten, hash stamp appended in HTML comment
- Run details persist in `optimization_runs`, `optimization_candidates`, `optimization_patches`, `optimization_rejections`
- Validation set persists in `optimization_validation` (reusable across runs)

Inspect after:

```bash
sqlite3 ~/.pro-workflow/data.db "SELECT id, skill_slug, initial_score, best_score, accepted_steps, rejected_steps, spent_usd FROM optimization_runs ORDER BY id DESC LIMIT 5"
```

## Rules

- Validation set is frozen at run start. Never re-derive from new corrections mid-run.
- One candidate per step. No parallel branches.
- Slow-update output is itself a candidate; it must pass the gate to replace the best.
- The optimizer LLM and evaluator LLM may be different models. Mixing a strong optimizer with a cheap evaluator is the SkillOpt-recommended config.
- If `spent_usd >= budget_usd` at any step boundary, the loop ends with `stopped_reason="budget exhausted"`.
- Patches whose anchor is no longer present in the skill (because a prior patch in the same step removed it) are recorded as rejected with reason `anchor_missing`.

## Provenance

Inspired by Microsoft SkillOpt (arXiv:2605.23904). The six-stage rollout/reflect/aggregate/select/update/evaluate pipeline, LR budget, rejection buffer, and slow / meta update mechanics are adapted to pro-workflow's existing SQLite + learn-rule data plane. No SkillOpt code is reused. "ReflACT" is not a SkillOpt term and is not used here; the loop is referred to by stage names only.
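The gate, budget, and `anchor_missing` rules listed under Rules can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the skill's actual implementation: the function names, the `(op, anchor, text)` tuple shape, and the choice to append adds at the end of the skill are all hypothetical. The comparisons and the two reason strings follow the text above.

```python
def passes_gate(candidate_score, current_score, accept_threshold=0.0):
    """Accept only if the candidate reaches current score plus the threshold."""
    return candidate_score >= current_score + accept_threshold


def stopped_reason(spent_usd, budget_usd):
    """Checked at each step boundary; None means the loop may continue."""
    return "budget exhausted" if spent_usd >= budget_usd else None


def apply_patches(skill_text, patches):
    """Apply (op, anchor, text) patches in order.

    Returns the candidate text and a list of rejected patches.
    Assumption: adds append a line at the end and need no anchor.
    """
    rejected = []
    for op, anchor, text in patches:
        if op == "add":
            skill_text = skill_text.rstrip("\n") + "\n" + text + "\n"
        elif op not in ("delete", "replace"):
            raise ValueError(f"unknown patch op: {op}")
        elif anchor not in skill_text:
            # An earlier patch in this step may have removed the anchor.
            rejected.append((op, anchor, "anchor_missing"))
        elif op == "delete":
            skill_text = skill_text.replace(anchor, "", 1)
        else:  # replace
            skill_text = skill_text.replace(anchor, text, 1)
    return skill_text, rejected
```

With the default `--accept-threshold` of 0.0, a candidate that merely ties the current score still passes this comparison; raising the threshold would require a strict improvement.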