Skill · 391 repo stars · updated yesterday

eval-plan

The eval-plan skill structures scenario-driven evaluations for Mnemon HostAgent systems: it defines a target component, selects or drafts a test scenario, states an observable hypothesis, specifies the evidence to collect, and attaches a success rubric. Use it before executing a HostAgent evaluation so that loop behavior, setup workflows, host projections, and documentation processes are assessed against a documented target, scenario, suite, evidence list, and expected report output.

Repository: mnemon

Install in Claude Code

git clone --depth 1 https://github.com/mnemon-dev/mnemon /tmp/eval-plan && cp -r /tmp/eval-plan/harness/loops/eval/skills/eval-plan ~/.claude/skills/eval-plan

Then open a new Claude Code session; the skill loads automatically.
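Before opening the new session, it can help to confirm that the copy landed where expected. The following is a minimal sketch, assuming the copied directory has `SKILL.md` at its top level (as the definition listing on this page suggests); the function name `skill_installed` is hypothetical and not part of Mnemon or Claude Code.

```python
from pathlib import Path


def skill_installed(skills_root: Path, name: str = "eval-plan") -> bool:
    """Return True if <skills_root>/<name>/SKILL.md exists as a regular file.

    Assumption: the skill directory holds SKILL.md directly at its top level.
    """
    return (skills_root / name / "SKILL.md").is_file()


if __name__ == "__main__":
    # ~/.claude/skills is the destination used by the install command above.
    root = Path.home() / ".claude" / "skills"
    print("eval-plan installed:", skill_installed(root))
```

If the check reports `False` after running the install command, one possible cause is that `~/.claude/skills/eval-plan` already existed, in which case `cp -r` copies the source directory inside it rather than over it.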

Definition

SKILL.md

# Eval Plan

Use this skill to design a scenario-driven eval before running a HostAgent.

## Procedure

1. Identify the target: loop, setup behavior, host projection, docs workflow, or
   eval itself.
2. Choose an existing scenario and suite when one fits.
3. If no scenario fits, draft an ephemeral plan first. Do not promote it during
   the same step.
4. State the hypothesis in observable terms.
5. Select the HostAgent and loop combination. Codex app server is the default
   HostAgent for current Mnemon evals.
6. Define the evidence to collect:
   - transcript or response reference
   - git diff
   - `.mnemon` state changes
   - projected host surface
   - report path
   - logs or timeout reason
7. Attach a rubric or mark the run exploratory.
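The procedure above can be sketched as a small pre-flight check on a draft plan. This is an illustration only, not part of the Mnemon harness: the `PlanDraft` class, the `check_draft` function, and the short labels in `TARGETS` and `EVIDENCE_KINDS` are hypothetical names chosen to mirror steps 1, 4, 6, and 7.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical labels for the targets in step 1 and the evidence in step 6.
TARGETS = {"loop", "setup", "host_projection", "docs_workflow", "eval"}
EVIDENCE_KINDS = {
    "transcript", "git_diff", "mnemon_state",
    "host_surface", "report_path", "logs",
}


@dataclass
class PlanDraft:
    target: str
    hypothesis: str
    evidence: list[str] = field(default_factory=list)
    rubric: Optional[str] = None
    exploratory: bool = False


def check_draft(draft: PlanDraft) -> list[str]:
    """Return the problems found in a draft; an empty list means it passes."""
    problems = []
    if draft.target not in TARGETS:
        problems.append(f"unknown target: {draft.target}")
    if not draft.hypothesis.strip():
        problems.append("hypothesis is empty")
    if not draft.evidence:
        problems.append("no evidence selected")
    unknown = sorted(set(draft.evidence) - EVIDENCE_KINDS)
    if unknown:
        problems.append("unknown evidence: " + ", ".join(unknown))
    # Step 7: a run needs a rubric unless it is explicitly exploratory.
    if draft.rubric is None and not draft.exploratory:
        problems.append("attach a rubric or mark the run exploratory")
    return problems
```

The check does not judge whether a hypothesis is actually observable; that remains a human call. It only catches drafts that skip a step outright.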

## Output

Return a short eval plan with:

- target
- scenario
- suite
- host
- loops
- hypothesis
- evidence
- expected report path
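One way to keep the returned plan short and uniform is to hold the eight fields in a record and render them in the order listed above. This is a sketch under assumptions: the `EvalPlan` class and `render` function are hypothetical, and the skill itself does not prescribe any serialization format.

```python
from dataclasses import dataclass


@dataclass
class EvalPlan:
    # Field order follows the Output list above.
    target: str
    scenario: str
    suite: str
    host: str
    loops: list[str]
    hypothesis: str
    evidence: list[str]
    expected_report_path: str


def render(plan: EvalPlan) -> str:
    """Render the plan as one bullet per field, in the documented order."""
    lines = [
        f"- target: {plan.target}",
        f"- scenario: {plan.scenario}",
        f"- suite: {plan.suite}",
        f"- host: {plan.host}",
        f"- loops: {', '.join(plan.loops)}",
        f"- hypothesis: {plan.hypothesis}",
        f"- evidence: {', '.join(plan.evidence)}",
        f"- expected report path: {plan.expected_report_path}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # All values below are placeholders, not real Mnemon scenarios or paths.
    example = EvalPlan(
        target="loop",
        scenario="example-scenario",
        suite="example-suite",
        host="Codex app server",
        loops=["example-loop"],
        hypothesis="The run leaves exactly one new report file.",
        evidence=["git diff", "report path"],
        expected_report_path="reports/example.md",
    )
    print(render(example))
```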

From the same repository

eval-analyze (Skill)

Analyze Mnemon harness eval reports, classify outcomes, and extract improvement evidence.

eval-improve (Skill)

Turn stable Mnemon harness eval findings into scoped project, loop, adapter, docs, or eval asset improvements.

eval-run (Skill)

Execute or supervise a planned Mnemon harness eval run in an isolated HostAgent workspace.

mnemon-goal (Skill)

Manage project-scoped Mnemon goal state, evidence, verification, completion, blockers, and host goal links.

memory-get (Skill)

Read scoped memory from Local Mnemon when GUIDE.md indicates that prior memory may help the current task.

memory-set (Skill)

Submit durable memory candidates to Local Mnemon when GUIDE.md indicates that a stable fact, preference, decision, or continuity item should be kept.

skill-author (Skill)

Draft or revise high-quality SKILL.md content for approved or proposed Mnemon skill changes.

skill-curate (Skill)

Start a low-frequency review of skill evidence and canonical skill lifecycle state.