eval-plan
The eval-plan skill structures scenario-driven evaluations for Mnemon HostAgent systems. A plan defines a target component, selects or creates a test scenario, states an observable hypothesis, specifies the evidence to collect, and sets a success rubric. Use it before running a HostAgent evaluation of loop behavior, setup workflows, host projections, or documentation processes, so that the target, scenario, suite, evidence types, and expected report output are documented up front.
git clone --depth 1 https://github.com/mnemon-dev/mnemon /tmp/eval-plan && cp -r /tmp/eval-plan/harness/loops/eval/skills/eval-plan ~/.claude/skills/eval-plan
SKILL.md
# Eval Plan

Use this skill to design a scenario-driven eval before running a HostAgent.

## Procedure

1. Identify the target: loop, setup behavior, host projection, docs workflow, or eval itself.
2. Choose an existing scenario and suite when one fits.
3. If no scenario fits, draft an ephemeral plan first. Do not promote it during the same step.
4. State the hypothesis in observable terms.
5. Select the HostAgent and loop combination. Codex app server is the default HostAgent for current Mnemon evals.
6. Define the evidence to collect:
   - transcript or response reference
   - git diff
   - `.mnemon` state changes
   - projected host surface
   - report path
   - logs or timeout reason
7. Attach a rubric or mark the run exploratory.

## Output

Return a short eval plan with:

- target
- scenario
- suite
- host
- loops
- hypothesis
- evidence
- expected report path
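
To make the output shape concrete, the plan could be held in a small record like the one below. This is only a sketch, not part of the Mnemon harness: the field names follow the Output list and the target and evidence values follow the Procedure, but the class name `EvalPlan`, the short evidence labels, the `problems` check, and all values in the usage example (scenario, suite, loop, and report path) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Target kinds listed in step 1 of the procedure.
TARGETS = {"loop", "setup behavior", "host projection", "docs workflow", "eval"}

# Evidence kinds listed in step 6. The short labels are hypothetical.
EVIDENCE_KINDS = {
    "transcript",     # transcript or response reference
    "git_diff",       # git diff
    "mnemon_state",   # .mnemon state changes
    "host_surface",   # projected host surface
    "report_path",    # report path
    "logs",           # logs or timeout reason
}


@dataclass
class EvalPlan:
    """Hypothetical container mirroring the fields of the Output list."""
    target: str
    scenario: str
    suite: str
    loops: List[str]
    hypothesis: str
    evidence: List[str]
    expected_report_path: str
    host: str = "Codex app server"  # default HostAgent named in step 5
    rubric: Optional[str] = None    # None marks the run exploratory (step 7)

    @property
    def exploratory(self) -> bool:
        return self.rubric is None

    def problems(self) -> List[str]:
        """Return human-readable issues; an empty list means the plan is complete."""
        issues = []
        if self.target not in TARGETS:
            issues.append(f"unknown target: {self.target}")
        if not self.hypothesis.strip():
            issues.append("hypothesis is empty")
        if not self.evidence:
            issues.append("no evidence selected")
        unknown = sorted(set(self.evidence) - EVIDENCE_KINDS)
        if unknown:
            issues.append(f"unknown evidence: {', '.join(unknown)}")
        return issues


# Usage: every value here is a placeholder, not a real Mnemon scenario or path.
plan = EvalPlan(
    target="loop",
    scenario="example-scenario",
    suite="example-suite",
    loops=["example-loop"],
    hypothesis="The agent writes one new entry under .mnemon after the task.",
    evidence=["transcript", "git_diff", "mnemon_state"],
    expected_report_path="reports/example.md",
)
print(plan.problems())
print(plan.exploratory)
```

Keeping the rubric optional reflects step 7: a plan without a rubric is still valid, but it is flagged as exploratory rather than silently treated as a scored run.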
Analyze Mnemon harness eval reports, classify outcomes, and extract improvement evidence.
Turn stable Mnemon harness eval findings into scoped project, loop, adapter, docs, or eval asset improvements.
Execute or supervise a planned Mnemon harness eval run in an isolated HostAgent workspace.
Manage project-scoped Mnemon goal state, evidence, verification, completion, blockers, and host goal links.
Recall long-term memory from Mnemon when GUIDE.md indicates that prior memory may help the current task.
Maintain prompt-facing working memory by editing MEMORY.md when GUIDE.md indicates that durable information should be kept.
Draft or revise high-quality SKILL.md content for approved or proposed Mnemon skill changes.
Start a low-frequency review of skill evidence and canonical skill lifecycle state.