eval-improve
The eval-improve skill converts validated findings from Mnemon evaluation runs into concrete improvements to project components, evaluation assets, documentation, or loop policies. Use this skill when stable evaluation data supports a specific change and you need structured guidance on curating eval assets, isolating scope, and applying promotion criteria before making updates canonical.
git clone --depth 1 https://github.com/mnemon-dev/mnemon /tmp/eval-improve && cp -r /tmp/eval-improve/harness/loops/eval/skills/eval-improve ~/.claude/skills/eval-improve
SKILL.md
# Eval Improve

Use this skill to turn stable eval findings into project changes.

## Procedure

1. Confirm the finding is backed by a report or repeated observation.
2. Pick one improvement target. Avoid mixing loop policy changes, runner changes, docs changes, and scenario promotion in one patch unless they are tightly coupled.
3. For eval asset changes:
   - keep exploratory ideas in scratch
   - add candidate assets under runtime candidates
   - promote canonical repo assets only after curation
4. For code or harness changes, run the narrowest relevant eval or validation.
5. Summarize what changed, which evidence motivated it, and what remains unproven.

## Promotion Checklist

Before making an eval asset canonical, verify:

- It has a clear target and hypothesis.
- It has an explicit rubric.
- It produces reviewable artifacts.
- It is not duplicative.
- It is stable enough for its intended suite.
- It does not reward weak or unsafe behavior.
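The promotion checklist can be treated as an all-or-nothing gate: an asset is promoted only when every item holds. The following is a minimal sketch of that idea, not part of Mnemon itself. The `PromotionCheck` class, its field names, and the helper functions are hypothetical names chosen for illustration, and a reviewer is assumed to set each flag by hand after inspecting the asset.

```python
from dataclasses import dataclass, fields


@dataclass
class PromotionCheck:
    """One flag per checklist item. All names are hypothetical, not a Mnemon API."""
    has_target_and_hypothesis: bool = False
    has_explicit_rubric: bool = False
    produces_reviewable_artifacts: bool = False
    is_not_duplicative: bool = False
    is_stable_for_suite: bool = False
    does_not_reward_weak_or_unsafe_behavior: bool = False


def unmet_items(check: PromotionCheck) -> list[str]:
    """Return the names of checklist items that are not yet satisfied."""
    return [f.name for f in fields(check) if not getattr(check, f.name)]


def can_promote(check: PromotionCheck) -> bool:
    """An asset is promotable only when every checklist item holds."""
    return not unmet_items(check)


# Example: a candidate asset that passes everything except stability.
candidate = PromotionCheck(
    has_target_and_hypothesis=True,
    has_explicit_rubric=True,
    produces_reviewable_artifacts=True,
    is_not_duplicative=True,
    is_stable_for_suite=False,  # assumed: results still vary across repeated runs
    does_not_reward_weak_or_unsafe_behavior=True,
)

print(can_promote(candidate))   # False
print(unmet_items(candidate))   # ['is_stable_for_suite']
```

Listing the unmet items, rather than returning only a boolean, fits step 5 of the procedure: the summary can state exactly what remains unproven.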
Analyze Mnemon harness eval reports, classify outcomes, and extract improvement evidence.
Design a scenario-driven Mnemon harness eval with target, hypothesis, HostAgent, loop configuration, evidence, and rubric.
Execute or supervise a planned Mnemon harness eval run in an isolated HostAgent workspace.
Manage project-scoped Mnemon goal state, evidence, verification, completion, blockers, and host goal links.
Recall long-term memory from Mnemon when GUIDE.md indicates that prior memory may help the current task.
Maintain prompt-facing working memory by editing MEMORY.md when GUIDE.md indicates that durable information should be kept.
Draft or revise high-quality SKILL.md content for approved or proposed Mnemon skill changes.
Start a low-frequency review of skill evidence and canonical skill lifecycle state.