ClaudeWave
Skill · 341 repo stars · updated 2d ago

eval-analyze

The eval-analyze skill processes Mnemon harness evaluation reports: it compares observed Claude behavior against a predefined rubric, classifies the outcome as pass, weak, fail, or invalid, and identifies the likely improvement target (memory, skill, eval, host adapter, setup, docs, or scenario or rubric). Use it immediately after an eval run to decide whether a behavior change meets expectations and to guide the next round of refinement with structured evidence and a likely cause.

Install in Claude Code
git clone --depth 1 https://github.com/mnemon-dev/mnemon /tmp/eval-analyze && cp -r /tmp/eval-analyze/harness/loops/eval/skills/eval-analyze ~/.claude/skills/eval-analyze
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# Eval Analyze

Use this skill after an eval run to judge behavior and extract improvement
evidence.

## Procedure

1. Read the report, relevant artifact summaries, and the selected rubric.
2. Compare observed behavior to the hypothesis.
3. Classify the outcome:
   - `pass`: behavior meets the rubric.
   - `weak`: partially useful but missing expected evidence or consistency.
   - `fail`: behavior contradicts the target expectation.
   - `invalid`: setup or scenario issue prevents judgement.
4. Identify the likely improvement target:
   - memory
   - skill
   - eval
   - host adapter
   - setup
   - docs
   - scenario or rubric
5. If a new eval asset is warranted, create a candidate summary instead of
   editing canonical assets immediately.
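
The classification in steps 3 and 4 can be sketched in Python as two enums plus a small decision helper. This is only an illustration: the names `Outcome`, `Target`, and `classify`, the three boolean inputs, and the precedence order (setup problems first, then contradiction, then missing evidence) are assumptions, not part of the skill or the Mnemon harness.

```python
from enum import Enum


class Outcome(Enum):
    """The four outcome labels from step 3."""
    PASS = "pass"
    WEAK = "weak"
    FAIL = "fail"
    INVALID = "invalid"


class Target(Enum):
    """The improvement targets from step 4."""
    MEMORY = "memory"
    SKILL = "skill"
    EVAL = "eval"
    HOST_ADAPTER = "host adapter"
    SETUP = "setup"
    DOCS = "docs"
    SCENARIO_OR_RUBRIC = "scenario or rubric"


def classify(setup_ok: bool,
             contradicts_expectation: bool,
             evidence_complete: bool) -> Outcome:
    """Hypothetical helper: map three judgements to an outcome.

    Assumed precedence: an unusable setup makes the run invalid
    before any behavior is judged.
    """
    if not setup_ok:
        return Outcome.INVALID
    if contradicts_expectation:
        return Outcome.FAIL
    if not evidence_complete:
        return Outcome.WEAK
    return Outcome.PASS
```

Checking `setup_ok` first reflects the definition of `invalid`: when a setup or scenario issue prevents judgement, the other two judgements carry no weight.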

## Output

Write a concise analysis with:

- outcome
- evidence
- likely cause
- recommended next action
- candidate eval asset path, if any
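
One way to keep the analysis concise and consistent is to collect the five fields in a small record and render them as a bullet list. The sketch below is a hypothetical helper: the `Analysis` class, its field names, and the `render` function are illustrative assumptions, not a format the skill prescribes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Analysis:
    """Hypothetical container for the fields listed above."""
    outcome: str                      # "pass", "weak", "fail", or "invalid"
    evidence: List[str]               # observations taken from the report
    likely_cause: str
    next_action: str
    candidate_asset_path: Optional[str] = None  # only set if a candidate exists


def render(a: Analysis) -> str:
    """Render the analysis as a Markdown bullet list."""
    lines = [f"- outcome: {a.outcome}", "- evidence:"]
    lines += [f"  - {item}" for item in a.evidence]
    lines.append(f"- likely cause: {a.likely_cause}")
    lines.append(f"- recommended next action: {a.next_action}")
    # The asset path is optional, so it is omitted when absent.
    if a.candidate_asset_path:
        lines.append(f"- candidate eval asset path: {a.candidate_asset_path}")
    return "\n".join(lines)
```

Leaving `candidate_asset_path` unset mirrors the "if any" qualifier: the final bullet appears only when a candidate summary was actually created.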