ClaudeWave
Skill · 341 repo stars · updated 2d ago

eval-plan

The eval-plan skill structures scenario-driven evaluations of Mnemon HostAgent systems. A plan names the target component, selects an existing test scenario or drafts a new one, states an observable hypothesis, specifies the evidence to collect, and attaches a success rubric. Use it before running a HostAgent evaluation so that loop behavior, setup workflows, host projections, and documentation workflows are assessed against a documented target, scenario, suite, evidence set, and expected report path.

Install in Claude Code
git clone --depth 1 https://github.com/mnemon-dev/mnemon /tmp/eval-plan && cp -r /tmp/eval-plan/harness/loops/eval/skills/eval-plan ~/.claude/skills/eval-plan
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# Eval Plan

Use this skill to design a scenario-driven eval before running a HostAgent.

## Procedure

1. Identify the target: loop, setup behavior, host projection, docs workflow, or
   the eval itself.
2. Choose an existing scenario and suite when one fits.
3. If no scenario fits, draft an ephemeral plan first. Do not promote it during
   the same step.
4. State the hypothesis in observable terms.
5. Select the HostAgent and loop combination. The Codex app server is the
   default HostAgent for current Mnemon evals.
6. Define the evidence to collect:
   - transcript or response reference
   - git diff
   - `.mnemon` state changes
   - projected host surface
   - report path
   - logs or timeout reason
7. Attach a rubric or mark the run exploratory.
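
The checks implied by steps 1, 4, 6, and 7 can be sketched as a small draft validator. This is a minimal illustration, not part of Mnemon: the function names, the dictionary keys, and the short labels used for targets and evidence kinds are all assumptions made for this example.

```python
# Hypothetical labels for the targets in step 1 and the evidence kinds in step 6.
TARGETS = {"loop", "setup behavior", "host projection", "docs workflow", "eval"}
EVIDENCE = {
    "transcript", "git diff", ".mnemon state", "host surface", "report path", "logs",
}


def check_draft(draft: dict) -> list:
    """Return a list of problems; an empty list means the draft passes."""
    problems = []
    if draft.get("target") not in TARGETS:
        problems.append("target must be one of: " + ", ".join(sorted(TARGETS)))
    if not draft.get("hypothesis", "").strip():
        problems.append("hypothesis is missing")
    evidence = draft.get("evidence", [])
    if not evidence:
        problems.append("no evidence selected")
    unknown = sorted(set(evidence) - EVIDENCE)
    if unknown:
        problems.append("unknown evidence: " + ", ".join(unknown))
    return problems


def run_mode(draft: dict) -> str:
    """Step 7: a draft without a rubric is marked exploratory."""
    return "rubric" if draft.get("rubric") else "exploratory"


# Illustrative draft; the hypothesis text is invented for the example.
draft = {
    "target": "loop",
    "hypothesis": "The loop writes a report file before the timeout.",
    "evidence": ["git diff", "report path"],
}
print(check_draft(draft))  # prints []
print(run_mode(draft))     # prints exploratory
```

Returning a list of problems instead of raising on the first one lets a planner report every gap in a draft at once.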

## Output

Return a short eval plan with:

- target
- scenario
- suite
- host
- loops
- hypothesis
- evidence
- expected report path
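
One way to hold the fields listed above and print them as a short plan is sketched below. The `EvalPlan` class, its `render` method, and every example value except the Codex app server host are hypothetical placeholders, not Mnemon identifiers or real scenario names.

```python
from dataclasses import dataclass


@dataclass
class EvalPlan:
    """Hypothetical container for the eight fields of the eval plan output."""
    target: str
    scenario: str
    suite: str
    host: str
    loops: list
    hypothesis: str
    evidence: list
    expected_report_path: str

    def render(self) -> str:
        """Render the plan as a bullet list in the order given above."""
        rows = [
            ("target", self.target),
            ("scenario", self.scenario),
            ("suite", self.suite),
            ("host", self.host),
            ("loops", ", ".join(self.loops)),
            ("hypothesis", self.hypothesis),
            ("evidence", ", ".join(self.evidence)),
            ("expected report path", self.expected_report_path),
        ]
        return "\n".join(f"- {name}: {value}" for name, value in rows)


# Placeholder values for illustration only.
plan = EvalPlan(
    target="loop",
    scenario="example-scenario",
    suite="example-suite",
    host="Codex app server",
    loops=["example-loop"],
    hypothesis="The loop writes a report file before the timeout.",
    evidence=["git diff", "report path"],
    expected_report_path="reports/example.md",
)
print(plan.render())
```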