Skill · 391 repository stars · updated yesterday

eval-analyze

The eval-analyze skill processes Mnemon harness evaluation reports: it compares observed Claude behavior against the selected rubric, classifies the outcome as pass, weak, fail, or invalid, and identifies the likely improvement target (memory, skill, eval, host adapter, setup, docs, or scenario or rubric). Use it immediately after an eval run to decide whether a behavior change meets expectations and to guide the next round of refinement with extracted evidence and a likely cause.

Repository: mnemon

Install in Claude Code

git clone --depth 1 https://github.com/mnemon-dev/mnemon /tmp/eval-analyze && cp -r /tmp/eval-analyze/harness/loops/eval/skills/eval-analyze ~/.claude/skills/eval-analyze

Then open a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Eval Analyze

Use this skill after an eval run to judge behavior and extract improvement
evidence.

## Procedure

1. Read the report, relevant artifact summaries, and the selected rubric.
2. Compare observed behavior to the hypothesis.
3. Classify the outcome:
   - `pass`: behavior meets the rubric.
   - `weak`: partially useful but missing expected evidence or consistency.
   - `fail`: behavior contradicts the target expectation.
   - `invalid`: setup or scenario issue prevents judgement.
4. Identify the likely improvement target:
   - memory
   - skill
   - eval
   - host adapter
   - setup
   - docs
   - scenario or rubric
5. If a new eval asset is warranted, create a candidate summary instead of
   editing canonical assets immediately.
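
The outcome labels in step 3 and the improvement targets in step 4 are two small closed vocabularies. The following is a minimal Python sketch of how an analysis could be kept inside them. The names `Outcome`, `Target`, and `parse_outcome` are hypothetical and are not part of Mnemon or its harness.

```python
from enum import Enum


class Outcome(Enum):
    """Outcome labels from step 3 of the procedure."""
    PASS = "pass"        # behavior meets the rubric
    WEAK = "weak"        # partially useful, missing evidence or consistency
    FAIL = "fail"        # behavior contradicts the target expectation
    INVALID = "invalid"  # setup or scenario issue prevents judgement


class Target(Enum):
    """Improvement targets from step 4 of the procedure."""
    MEMORY = "memory"
    SKILL = "skill"
    EVAL = "eval"
    HOST_ADAPTER = "host adapter"
    SETUP = "setup"
    DOCS = "docs"
    SCENARIO_OR_RUBRIC = "scenario or rubric"


def parse_outcome(label: str) -> Outcome:
    """Hypothetical helper: normalize a label and reject unknown values.

    Raises ValueError if the label is not one of the four outcomes.
    """
    return Outcome(label.strip().lower())
```

Rejecting unknown labels early makes it harder for an analysis to drift into ad hoc categories such as "mostly pass", which the procedure does not define.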

## Output

Write a concise analysis with:

- outcome
- evidence
- likely cause
- recommended next action
- candidate eval asset path, if any
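
As an illustration only, the five output fields above could be carried in a small record and rendered as a bullet list. This is a sketch under assumptions: the `Analysis` class and `render` function are hypothetical names, and the skill itself does not prescribe any particular data structure or file format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Analysis:
    """Hypothetical container for the fields listed under Output."""
    outcome: str                       # pass, weak, fail, or invalid
    evidence: List[str]                # observations taken from the report
    likely_cause: str
    next_action: str
    candidate_asset_path: Optional[str] = None  # only if an asset is proposed


def render(analysis: Analysis) -> str:
    """Render the analysis as a concise markdown bullet list."""
    lines = [f"- outcome: {analysis.outcome}", "- evidence:"]
    lines += [f"  - {item}" for item in analysis.evidence]
    lines.append(f"- likely cause: {analysis.likely_cause}")
    lines.append(f"- recommended next action: {analysis.next_action}")
    # The asset path is optional, so it is emitted only when present.
    if analysis.candidate_asset_path:
        lines.append(
            f"- candidate eval asset path: {analysis.candidate_asset_path}"
        )
    return "\n".join(lines)
```

Keeping the candidate asset path optional mirrors the "if any" qualifier in the list: most analyses will not propose a new eval asset.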

From the same repository

eval-improve (Skill)

Turn stable Mnemon harness eval findings into scoped project, loop, adapter, docs, or eval asset improvements.

eval-plan (Skill)

Design a scenario-driven Mnemon harness eval with target, hypothesis, HostAgent, loop configuration, evidence, and rubric.

eval-run (Skill)

Execute or supervise a planned Mnemon harness eval run in an isolated HostAgent workspace.

mnemon-goal (Skill)

Manage project-scoped Mnemon goal state, evidence, verification, completion, blockers, and host goal links.

memory-get (Skill)

Read scoped memory from Local Mnemon when GUIDE.md indicates that prior memory may help the current task.

memory-set (Skill)

Submit durable memory candidates to Local Mnemon when GUIDE.md indicates that a stable fact, preference, decision, or continuity item should be kept.

skill-author (Skill)

Draft or revise high-quality SKILL.md content for approved or proposed Mnemon skill changes.

skill-curate (Skill)

Start a low-frequency review of skill evidence and canonical skill lifecycle state.