ClaudeWave
Skill · 341 repo stars · updated 2d ago

eval-analyze

The eval-analyze skill processes Mnemon harness evaluation reports: it compares observed Claude behavior against predefined rubrics, classifies each outcome as pass, weak, fail, or invalid, and identifies the likely improvement target (memory, skill, eval, host adapter, setup, docs, or scenario or rubric). Use it right after an eval run to decide whether a behavior change meets expectations and to guide the next round of refinement with extracted evidence and a likely cause.

Install in Claude Code
git clone --depth 1 https://github.com/mnemon-dev/mnemon /tmp/eval-analyze && cp -r /tmp/eval-analyze/harness/loops/eval/skills/eval-analyze ~/.claude/skills/eval-analyze
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# Eval Analyze

Use this skill after an eval run to judge behavior and extract improvement
evidence.

## Procedure

1. Read the report, relevant artifact summaries, and the selected rubric.
2. Compare observed behavior to the hypothesis.
3. Classify the outcome:
   - `pass`: behavior meets the rubric.
   - `weak`: partially useful but missing expected evidence or consistency.
   - `fail`: behavior contradicts the target expectation.
   - `invalid`: setup or scenario issue prevents judgement.
4. Identify the likely improvement target:
   - memory
   - skill
   - eval
   - host adapter
   - setup
   - docs
   - scenario or rubric
5. If a new eval asset is warranted, create a candidate summary instead of
   editing canonical assets immediately.
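
The classification in steps 3 and 4 can be sketched as a small data model. This is a minimal illustration, not part of the skill itself: the `Outcome` and `Target` enums mirror the labels listed above, while the `classify` helper, its three boolean inputs, and the order in which it checks them are assumptions made for the example. In practice the judgement comes from reading the report against the rubric.

```python
from enum import Enum


class Outcome(Enum):
    """Outcome labels from step 3 of the procedure."""
    PASS = "pass"
    WEAK = "weak"
    FAIL = "fail"
    INVALID = "invalid"


class Target(Enum):
    """Improvement targets from step 4 of the procedure."""
    MEMORY = "memory"
    SKILL = "skill"
    EVAL = "eval"
    HOST_ADAPTER = "host adapter"
    SETUP = "setup"
    DOCS = "docs"
    SCENARIO_OR_RUBRIC = "scenario or rubric"


def classify(judgeable: bool, meets_rubric: bool, contradicts_expectation: bool) -> Outcome:
    """Hypothetical helper: map three reviewer judgements to an outcome.

    The check order (invalid, then fail, then pass, else weak) is an
    assumption of this sketch, not something the skill prescribes.
    """
    if not judgeable:
        # A setup or scenario issue prevents judgement.
        return Outcome.INVALID
    if contradicts_expectation:
        return Outcome.FAIL
    if meets_rubric:
        return Outcome.PASS
    # Partially useful, but missing expected evidence or consistency.
    return Outcome.WEAK
```

Checking `judgeable` first keeps a broken setup from being recorded as a behavioral `fail`, which matches the intent of the `invalid` label.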

## Output

Write a concise analysis with:

- outcome
- evidence
- likely cause
- recommended next action
- candidate eval asset path, if any
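
One way to keep the analysis consistent across runs is to fill a small record and render it as a bullet list. The sketch below is an assumption about shape only: the `Analysis` class, its field names, and the `render` layout are hypothetical and are not defined by the skill, which asks only for the five items listed above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Analysis:
    """Hypothetical container for the five output items."""
    outcome: str                                 # pass, weak, fail, or invalid
    evidence: List[str]                          # observations drawn from the report
    likely_cause: str
    next_action: str
    candidate_asset_path: Optional[str] = None   # only set if a new asset is proposed

    def render(self) -> str:
        """Render the analysis as a concise markdown bullet list."""
        lines = [f"- outcome: {self.outcome}", "- evidence:"]
        lines += [f"  - {item}" for item in self.evidence]
        lines.append(f"- likely cause: {self.likely_cause}")
        lines.append(f"- recommended next action: {self.next_action}")
        if self.candidate_asset_path:
            # Omitted entirely when no candidate asset is warranted.
            lines.append(f"- candidate eval asset path: {self.candidate_asset_path}")
        return "\n".join(lines)
```

Leaving `candidate_asset_path` unset drops the last bullet, matching the "if any" qualifier in the list above.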