eval-analyze
The eval-analyze skill processes Mnemon harness evaluation reports. It compares observed Claude behavior against the selected rubric, classifies the outcome as pass, weak, fail, or invalid, and identifies the likely improvement target: memory, skill, eval, host adapter, setup, docs, or scenario/rubric. Use it right after an eval run to decide whether a behavior change meets expectations, and to collect the evidence and likely cause that should guide the next refinement.
git clone --depth 1 https://github.com/mnemon-dev/mnemon /tmp/eval-analyze && cp -r /tmp/eval-analyze/harness/loops/eval/skills/eval-analyze ~/.claude/skills/eval-analyze

SKILL.md
# Eval Analyze

Use this skill after an eval run to judge behavior and extract improvement evidence.

## Procedure

1. Read the report, relevant artifact summaries, and the selected rubric.
2. Compare observed behavior to the hypothesis.
3. Classify the outcome:
   - `pass`: behavior meets the rubric.
   - `weak`: partially useful but missing expected evidence or consistency.
   - `fail`: behavior contradicts the target expectation.
   - `invalid`: a setup or scenario issue prevents judgement.
4. Identify the likely improvement target:
   - memory
   - skill
   - eval
   - host adapter
   - setup
   - docs
   - scenario or rubric
5. If a new eval asset is warranted, create a candidate summary instead of editing canonical assets immediately.

## Output

Write a concise analysis with:

- outcome
- evidence
- likely cause
- recommended next action
- candidate eval asset path, if any
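The classification step and output format above can be sketched as a small Python data model. This is a hypothetical illustration, not part of Mnemon: the `Outcome`, `Target`, and `Analysis` types, the `classify` helper and its boolean inputs, and the precedence order (invalid, then fail, then weak, then pass) are assumptions about how a reviewer might encode the rubric.

```python
"""Hypothetical sketch of the eval-analyze rubric and output; not Mnemon's API."""
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Outcome(Enum):
    PASS = "pass"
    WEAK = "weak"
    FAIL = "fail"
    INVALID = "invalid"


class Target(Enum):
    # Improvement targets listed in step 4 of the procedure.
    MEMORY = "memory"
    SKILL = "skill"
    EVAL = "eval"
    HOST_ADAPTER = "host adapter"
    SETUP = "setup"
    DOCS = "docs"
    SCENARIO_OR_RUBRIC = "scenario or rubric"


def classify(setup_ok: bool, contradicts_expectation: bool,
             evidence_complete: bool, consistent: bool) -> Outcome:
    """Map hypothetical judgements about a run onto the four outcomes."""
    if not setup_ok:
        # A setup or scenario issue prevents judgement, so nothing else applies.
        return Outcome.INVALID
    if contradicts_expectation:
        return Outcome.FAIL
    if not (evidence_complete and consistent):
        # Partially useful, but missing expected evidence or consistency.
        return Outcome.WEAK
    return Outcome.PASS


@dataclass
class Analysis:
    outcome: Outcome
    evidence: List[str]
    likely_cause: str
    target: Target
    next_action: str
    candidate_asset: Optional[str] = None  # only when a new eval asset is warranted

    def to_markdown(self) -> str:
        """Render the fields named in the Output section as a bullet list."""
        lines = [f"- outcome: {self.outcome.value}", "- evidence:"]
        lines += [f"  - {item}" for item in self.evidence]
        lines.append(f"- likely cause: {self.likely_cause} (target: {self.target.value})")
        lines.append(f"- recommended next action: {self.next_action}")
        if self.candidate_asset:
            lines.append(f"- candidate eval asset path: {self.candidate_asset}")
        return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative values only; a real run would take these from the report.
    analysis = Analysis(
        outcome=classify(setup_ok=True, contradicts_expectation=False,
                         evidence_complete=False, consistent=True),
        evidence=["memory recall ran once", "no MEMORY.md update was observed"],
        likely_cause="skill trigger wording is ambiguous",
        target=Target.SKILL,
        next_action="revise the skill description and rerun the scenario",
    )
    print(analysis.to_markdown())
```

Checking `invalid` first mirrors the rubric's intent: if the setup or scenario is broken, a `fail` label would blame memory or a skill for a problem the eval itself caused. The candidate asset path stays optional, in line with step 5's advice to draft a candidate summary rather than edit canonical assets directly.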
Turn stable Mnemon harness eval findings into scoped project, loop, adapter, docs, or eval asset improvements.
Design a scenario-driven Mnemon harness eval with target, hypothesis, HostAgent, loop configuration, evidence, and rubric.
Execute or supervise a planned Mnemon harness eval run in an isolated HostAgent workspace.
Manage project-scoped Mnemon goal state, evidence, verification, completion, blockers, and host goal links.
Recall long-term memory from Mnemon when GUIDE.md indicates that prior memory may help the current task.
Maintain prompt-facing working memory by editing MEMORY.md when GUIDE.md indicates that durable information should be kept.
Draft or revise high-quality SKILL.md content for approved or proposed Mnemon skill changes.
Start a low-frequency review of skill evidence and canonical skill lifecycle state.