Skill · 391 repo stars · updated today

eval-analyze

The eval-analyze skill processes Mnemon harness eval reports. It compares observed Claude behavior against the selected rubric, classifies the outcome as pass, weak, fail, or invalid, and identifies the likely improvement target: memory, skill, eval, host adapter, setup, docs, or scenario or rubric. Use it right after an eval run to judge whether a behavior change meets expectations and to extract the evidence and likely cause that guide the next refinement.

Repository: mnemon

Install in Claude Code

git clone --depth 1 https://github.com/mnemon-dev/mnemon /tmp/eval-analyze && mkdir -p ~/.claude/skills && cp -r /tmp/eval-analyze/harness/loops/eval/skills/eval-analyze ~/.claude/skills/eval-analyze

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Eval Analyze

Use this skill after an eval run to judge behavior and extract improvement
evidence.

## Procedure

1. Read the report, relevant artifact summaries, and the selected rubric.
2. Compare observed behavior to the hypothesis.
3. Classify the outcome:
   - `pass`: behavior meets the rubric.
   - `weak`: partially useful but missing expected evidence or consistency.
   - `fail`: behavior contradicts the target expectation.
   - `invalid`: setup or scenario issue prevents judgement.
4. Identify the likely improvement target:
   - memory
   - skill
   - eval
   - host adapter
   - setup
   - docs
   - scenario or rubric
5. If a new eval asset is warranted, create a candidate summary instead of
   editing canonical assets immediately.
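
The classification in steps 3 and 4 can be sketched in Python as below. This is a minimal illustration, not part of the Mnemon harness: the names `Outcome`, `ImprovementTarget`, and `classify`, the three boolean inputs, and the order in which the checks run are all assumptions made for the example. In practice the judgement comes from reading the report against the rubric.

```python
from enum import Enum


class Outcome(Enum):
    """The four outcome labels from step 3."""
    PASS = "pass"
    WEAK = "weak"
    FAIL = "fail"
    INVALID = "invalid"


class ImprovementTarget(Enum):
    """The improvement targets listed in step 4."""
    MEMORY = "memory"
    SKILL = "skill"
    EVAL = "eval"
    HOST_ADAPTER = "host adapter"
    SETUP = "setup"
    DOCS = "docs"
    SCENARIO_OR_RUBRIC = "scenario or rubric"


def classify(judgeable: bool, contradicts_target: bool, meets_rubric: bool) -> Outcome:
    """Map three hypothetical yes/no judgements to an outcome label.

    Assumed precedence: an unjudgeable run is `invalid` regardless of
    anything else, a contradiction is `fail`, a met rubric is `pass`,
    and everything left over is `weak`.
    """
    if not judgeable:
        # A setup or scenario issue prevents judgement.
        return Outcome.INVALID
    if contradicts_target:
        # Behavior contradicts the target expectation.
        return Outcome.FAIL
    if meets_rubric:
        return Outcome.PASS
    # Partially useful, but missing expected evidence or consistency.
    return Outcome.WEAK
```

Checking `judgeable` first reflects the idea that a broken setup should not be reported as a behavioral failure; that ordering is a design choice of this sketch rather than something the skill specifies.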

## Output

Write a concise analysis with:

- outcome
- evidence
- likely cause
- recommended next action
- candidate eval asset path, if any
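
One way to keep the analysis consistent across runs is to fill a small record and render it in the order listed above. The sketch below is illustrative only: the `Analysis` class, its field names, and the rendered layout are assumptions, not a format defined by the skill.

```python
from dataclasses import dataclass
from typing import Optional

# Outcome labels taken from the Procedure section.
VALID_OUTCOMES = ("pass", "weak", "fail", "invalid")


@dataclass
class Analysis:
    """Hypothetical container for the five output items."""
    outcome: str
    evidence: str
    likely_cause: str
    next_action: str
    candidate_asset_path: Optional[str] = None  # only set when an asset is proposed

    def __post_init__(self) -> None:
        # Reject labels outside the four defined outcomes.
        if self.outcome not in VALID_OUTCOMES:
            raise ValueError(f"unknown outcome: {self.outcome!r}")

    def render(self) -> str:
        """Return the analysis as a short bullet list."""
        lines = [
            f"- outcome: {self.outcome}",
            f"- evidence: {self.evidence}",
            f"- likely cause: {self.likely_cause}",
            f"- recommended next action: {self.next_action}",
        ]
        if self.candidate_asset_path:
            lines.append(f"- candidate eval asset path: {self.candidate_asset_path}")
        return "\n".join(lines)
```

Calling `Analysis(outcome="weak", evidence="...", likely_cause="...", next_action="...").render()` yields four bullets; the fifth appears only when a candidate path is supplied.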