Skill · 391 repo stars · updated yesterday

eval-run

The eval-run skill executes or supervises Mnemon harness evaluation runs within isolated HostAgent workspaces. Use it to run planned evaluation scenarios and suites, install the required loop templates, collect artifacts and logs, and record failures as evidence rather than silent skips. While a run is in progress, it leaves canonical scenarios, suites, and rubrics unchanged and preserves the artifacts needed for report review.

Repository: mnemon

Install in Claude Code

git clone --depth 1 https://github.com/mnemon-dev/mnemon /tmp/eval-run && cp -r /tmp/eval-run/harness/loops/eval/skills/eval-run ~/.claude/skills/eval-run

Then open a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Eval Run

Use this skill to execute or supervise a planned eval run.

## Procedure

1. Confirm the plan names a host, suite or scenario, and evidence targets.
2. Create or use an isolated workspace. Do not run scenario state in the
   developer's active workspace unless the eval explicitly requires it.
3. Install the requested loop templates with `harness/ops`.
4. For Codex app-server evals, use the project runner when available:

   ```bash
   python3 scripts/codex_app_server_eval.py --suite
   ```

   Use a specific suite option when the scenario requires it.
5. Collect artifacts and logs before cleanup.
6. Record timeouts, setup failures, and HostAgent readiness failures as eval
   evidence, not as silent skips.
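
Steps 2, 5, and 6 above can be sketched as a small Python wrapper. This is only an illustration under stated assumptions: `run_isolated`, the `evidence.json` file name, and the status strings are hypothetical and are not part of the Mnemon harness or its runner. It runs a command in a throwaway workspace, copies logs and workspace files out before cleanup, and records timeouts and launch failures as evidence instead of dropping them.

```python
import json
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def _as_text(data):
    """Normalize captured output, which may be None, bytes, or str."""
    if data is None:
        return ""
    return data.decode(errors="replace") if isinstance(data, bytes) else data


def run_isolated(command, artifact_dir, timeout_s=600):
    """Run `command` in a temporary workspace and record the outcome.

    Hypothetical helper: the name, file layout, and status values are
    assumptions made for this sketch, not harness conventions.
    """
    artifact_dir = Path(artifact_dir)
    artifact_dir.mkdir(parents=True, exist_ok=True)
    workspace = Path(tempfile.mkdtemp(prefix="eval-run-"))
    evidence = {"command": list(command), "status": None, "detail": ""}
    stdout = stderr = ""
    try:
        try:
            proc = subprocess.run(
                command, cwd=workspace, capture_output=True,
                text=True, timeout=timeout_s,
            )
            evidence["status"] = "completed" if proc.returncode == 0 else "failed"
            evidence["detail"] = f"exit code {proc.returncode}"
            stdout, stderr = proc.stdout, proc.stderr
        except subprocess.TimeoutExpired as exc:
            # A timeout is evidence, not a skip.
            evidence["status"] = "timeout"
            evidence["detail"] = f"exceeded {timeout_s}s"
            stdout, stderr = _as_text(exc.stdout), _as_text(exc.stderr)
        except OSError as exc:
            # The command could not be started at all.
            evidence["status"] = "setup_failure"
            evidence["detail"] = str(exc)

        # Collect logs and workspace files before cleanup.
        (artifact_dir / "stdout.log").write_text(stdout)
        (artifact_dir / "stderr.log").write_text(stderr)
        shutil.copytree(workspace, artifact_dir / "workspace", dirs_exist_ok=True)
        (artifact_dir / "evidence.json").write_text(json.dumps(evidence, indent=2))
    finally:
        shutil.rmtree(workspace, ignore_errors=True)
    return evidence
```

A caller could pass the runner command from step 4 as a list and then check `evidence["status"]`. The point of the design is that every exit path writes an evidence record before the workspace is removed.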

## Boundaries

- Do not change canonical scenarios, suites, or rubrics while running an eval.
- Do not delete artifacts needed for report review.
- Do not treat an exploratory run as a regression result.
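
One way to check the first boundary is to hash the canonical files before and after a run and compare the results. The sketch below is an assumption-laden illustration: `snapshot` and `changed_paths` are hypothetical helpers, and which directories hold canonical scenarios, suites, or rubrics is left to the caller.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each file under `root` (by relative path) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_paths(before, after):
    """Return paths that were added, removed, or modified between snapshots."""
    return sorted(
        path
        for path in before.keys() | after.keys()
        if before.get(path) != after.get(path)
    )
```

Taking one snapshot before the run and one after, a non-empty `changed_paths` result indicates that a canonical file was touched during the eval and should be reported alongside the run's evidence.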

From the same repository

eval-analyze (Skill)

Analyze Mnemon harness eval reports, classify outcomes, and extract improvement evidence.

eval-improve (Skill)

Turn stable Mnemon harness eval findings into scoped project, loop, adapter, docs, or eval asset improvements.

eval-plan (Skill)

Design a scenario-driven Mnemon harness eval with target, hypothesis, HostAgent, loop configuration, evidence, and rubric.

mnemon-goal (Skill)

Manage project-scoped Mnemon goal state, evidence, verification, completion, blockers, and host goal links.

memory-get (Skill)

Read scoped memory from Local Mnemon when GUIDE.md indicates that prior memory may help the current task.

memory-set (Skill)

Submit durable memory candidates to Local Mnemon when GUIDE.md indicates that a stable fact, preference, decision, or continuity item should be kept.

skill-author (Skill)

Draft or revise high-quality SKILL.md content for approved or proposed Mnemon skill changes.

skill-curate (Skill)

Start a low-frequency review of skill evidence and canonical skill lifecycle state.