Skill, 341 repo stars, updated 2d ago

eval-plan

The eval-plan skill structures scenario-driven evaluations for Mnemon HostAgent systems. It defines a target component, selects or drafts a test scenario, states an observable hypothesis, specifies the evidence to collect, and attaches a success rubric. Use it before running a HostAgent evaluation of loop behavior, setup workflows, host projections, or documentation processes, so that the target, scenario, suite, evidence types, and expected report output are documented up front.

Install in Claude Code
git clone --depth 1 https://github.com/mnemon-dev/mnemon /tmp/eval-plan && mkdir -p ~/.claude/skills && cp -r /tmp/eval-plan/harness/loops/eval/skills/eval-plan ~/.claude/skills/eval-plan
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# Eval Plan

Use this skill to design a scenario-driven eval before running a HostAgent.

## Procedure

1. Identify the target: a loop, setup behavior, a host projection, a docs
   workflow, or the eval itself.
2. Choose an existing scenario and suite when one fits.
3. If no scenario fits, draft an ephemeral plan first. Do not promote it during
   the same step.
4. State the hypothesis in observable terms.
5. Select the HostAgent and loop combination. Codex app server is the default
   HostAgent for current Mnemon evals.
6. Define the evidence to collect:
   - transcript or response reference
   - git diff
   - `.mnemon` state changes
   - projected host surface
   - report path
   - logs or timeout reason
7. Attach a rubric or mark the run exploratory.
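
Steps 6 and 7 can be pictured as a small checklist. The Python sketch below is only an illustration: `EvidenceKind` and `classify_run` are hypothetical names, not part of Mnemon, and rejecting an empty evidence set is an assumption added here rather than a rule stated by the skill.

```python
from __future__ import annotations

from enum import Enum
from typing import Optional


class EvidenceKind(Enum):
    """Evidence types from step 6 (identifiers are illustrative)."""

    TRANSCRIPT = "transcript or response reference"
    GIT_DIFF = "git diff"
    MNEMON_STATE = ".mnemon state changes"
    HOST_SURFACE = "projected host surface"
    REPORT_PATH = "report path"
    LOGS = "logs or timeout reason"


def classify_run(evidence: set[EvidenceKind], rubric: Optional[str]) -> str:
    """Return "scored" when a rubric is attached, otherwise "exploratory".

    Assumption: a plan that selects no evidence is rejected, because there
    would be nothing to assess against the hypothesis.
    """
    if not evidence:
        raise ValueError("select at least one evidence kind")
    return "scored" if rubric else "exploratory"
```

A plan that collects a git diff but has no rubric yet would be labeled exploratory: `classify_run({EvidenceKind.GIT_DIFF}, None)`.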

## Output

Return a short eval plan with:

- target
- scenario
- suite
- host
- loops
- hypothesis
- evidence
- expected report path
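
One way to keep the returned plan consistent is to model the eight fields as a record and render them in the order listed above. This is a minimal sketch, assuming plain-text output; the `EvalPlan` class and its `render` method are hypothetical and not defined by Mnemon.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class EvalPlan:
    """Hypothetical record holding the fields of a short eval plan."""

    target: str
    scenario: str
    suite: str
    host: str
    loops: list[str]
    hypothesis: str
    evidence: list[str]
    expected_report_path: str

    def render(self) -> str:
        """Render one "- name: value" line per field, in declaration order."""
        lines = []
        for field in fields(self):
            value = getattr(self, field.name)
            if isinstance(value, list):
                value = ", ".join(value)
            lines.append(f"- {field.name.replace('_', ' ')}: {value}")
        return "\n".join(lines)
```

Calling `render()` on a filled-in `EvalPlan` yields eight lines, from `- target: ...` through `- expected report path: ...`. Apart from "Codex app server", which the procedure names as the default HostAgent, any values passed in are up to the plan author.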