Skill · 1.4k repo stars · updated 10d ago

discover

Discover initializes a new evo optimization workspace by scanning the target codebase to identify optimization opportunities, proposing unexplored performance dimensions, setting up a baseline benchmark in an isolated worktree, and executing the first experiment. Use this skill when a user requests /evo:discover, wants to instrument a repository for autonomous optimization, or asks to begin a new evo run on a project without an existing optimization setup.

Repository: evo

Install in Claude Code

```
git clone --depth 1 https://github.com/evo-hq/evo /tmp/discover && cp -r /tmp/discover/plugins/evo/skills/discover ~/.claude/skills/discover
```

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Discover

Internal procedure for `evo:discover`. The user only sees the user-facing prompts, the dashboard URL, and the baseline score -- everything else is the agent's choreography.

## Evo surface

General guidance on the skills and tools available in evo. Each line is a triggering condition: if you're about to do X, pull/dispatch/read this. Don't preload -- act when the trigger fires.

**Always have a sense of the skill before jumping into its references.** A skill body carries the decision-making; references are concrete contracts that assume a decision has been made.

```
evo plugin
│
├── Main thread  (the orchestrator -- you, inside /evo:discover or /evo:optimize)
│   │
│   ├── Skills (Skill tool)
│   │   ├── evo:discover       starting a new evo workspace / instrumenting a project
│   │   ├── evo:optimize       after discover commits the baseline -- drives the loop.
│   │   │                      Args: subagents=N (read sizing-the-round FIRST),
│   │   │                            autonomous, subagents-only, budget=N, stall=N
│   │   ├── evo:ship           after the loop stops -- distills the best valid
│   │   │                      experiment into a mergeable change (PR if remote,
│   │   │                      else merge) + a mergeability report
│   │   ├── evo:finetuning     task is finetuning / post-training / training a model
│   │   └── evo:infra-setup    need a remote backend, pooled workspaces, lease/slot
│   │                          management, or specific provider auth/setup
│   │
│   └── Subagents to dispatch (Task tool, subagent_type=...)
│       ├── evo:benchmark-reviewer  before the baseline run, or whenever the
│       │                           benchmark command / harness changes
│       └── evo:ideator             stalled, or every ~5 committed experiments.
│                                   One subagent per brief:
│                                   failure_analysis, literature, frontier_extrapolation
│
├── Subagent thread  (each subagent spawned by /optimize step 5)
│   │
│   ├── Skills  (the subagent loads this on first turn -- the brief's first
│   │            sentence mandates it; not auto-loaded by the host)
│   │   └── evo:subagent     load FIRST -- defines the iteration protocol
│   │                        + brief field shape the subagent operates under
│   │
│   └── Subagents to dispatch (Task tool, subagent_type=...)
│       └── evo:verifier      ALWAYS dispatch pre AND post every evo run.
│                             Pre: ~30s static analysis before the experiment runs.
│                             Post: result-validity audit after it commits.
│                             Not optional. Not ad-hoc.
│
└── Key references (Read tool, on demand)
    ├── discover/references/
    │   ├── constructing-benchmark.md      designing + assembling a benchmark from scratch
    │   ├── sdk_python.py / sdk_node.js    wiring per-task instrumentation -- preferred path
    │   ├── inline_instrumentation.py      inline fallback when SDK can't be used.
    │   │                                  Copy as-is; do not reimplement (file header
    │   │                                  explains why)
    │   ├── sizing-the-round.md            BEFORE invoking /evo:optimize with any
    │   │                                  specific subagents=N. Single-GPU /
    │   │                                  single-exclusive-resource -> subagents=1
    │   ├── proposing-dimensions.md        choosing what to optimize when not obvious
    │   └── instrumentation-contract.md    the format evo reads (result + traces shapes)
    │
    ├── finetuning/references/
    │   ├── glue.md                         writing train.py -- I/O contract evo expects
    │   ├── diagnostics.md                  per-failure-mode diagnostics
    │   ├── false-progress.md               what doesn't count as improvement
    │   ├── trace-schema.md                 per-task trace JSON schema for training runs
    │   ├── rl/                             RL framework references
    │   │   └── art.md                       ART (Algorithm-Refined Training)
    │   ├── sft/                            SFT framework references
    │   │   └── tinker.md                    Tinker SFT
    │   └── serving/                        eval-time inference references
    │       └── vllm.md                      vLLM serving config + LoRA-multi
    │
    ├── infra-setup/references/
    │   └── provider-matrix.md              provider/backend summary (auth, setup, costs)
    │
    └── references/                          (shared across skills)
        ├── evo-wait.md                      any time you need to wait without burning
        │                                    context (subagent completion, training,
        │                                    ideators, GPU activity, any long-running)
        ├── agent-sdk-reference.md           SDK API surface
        └── cli-quick-reference.md           CLI subcommand cheat sheet
```
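The "act when the trigger fires" rule can be pictured as a lookup table that is consulted only when the agent is about to take an action. The sketch below is illustrative, not part of evo: the `Trigger` class, the `TRIGGERS` table, and the `lookup` helper are hypothetical names, and only the conditions and targets are copied from the tree above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trigger:
    condition: str  # what the agent is about to do
    action: str     # "skill", "subagent", or "read"
    target: str     # name or path taken from the tree above


# A few rows copied from the tree; the table shape itself is an assumption.
TRIGGERS = [
    Trigger("starting a new evo workspace", "skill", "evo:discover"),
    Trigger("benchmark command or harness changes", "subagent",
            "evo:benchmark-reviewer"),
    Trigger("choosing what to optimize when not obvious", "read",
            "discover/references/proposing-dimensions.md"),
    Trigger("need to wait without burning context", "read",
            "references/evo-wait.md"),
]


def lookup(about_to: str) -> list[Trigger]:
    """Return only the rows whose condition mentions the given phrase."""
    needle = about_to.lower()
    return [t for t in TRIGGERS if needle in t.condition.lower()]


if __name__ == "__main__":
    for hit in lookup("wait"):
        print(hit.action, hit.target)
```

Nothing is loaded up front: a reference is read only after `lookup` returns a matching row, which mirrors the "don't preload" instruction.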

## Host conventions

This skill runs on any host that implements the Agent Skills spec. When the body uses generic phrases, apply the host's best-fit equivalent:

- **"ask the user"** -- use your host's structured multi-choice question tool if you have one (e.g. `AskUserQuestion`, `request_user_input`). If the host has none, phrase the question as plain text in your next reply and wait for the user's answer.
- **File paths like `references/...`** -- relative to this `SKILL.md`; resolve from the skill directory.
- **Slash commands shown in user-facing copy** (e.g. `/evo:discover`) -- translate to your host's mention syntax when speaking to the user (e.g. `$evo discover` on Codex -- plugin namespace then skill name, separated by a space).
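The slash-command translation in the last bullet can be sketched as a small helper. This is a minimal illustration that assumes the only non-default mapping is the Codex form quoted above. The function name `to_host_mention` and the host identifier `"codex"` are hypothetical; any other host falls through to the original slash form.

```python
def to_host_mention(slash_command: str, host: str) -> str:
    """Translate '/plugin:skill' into a host's mention syntax.

    Only the Codex mapping ('$plugin skill') comes from the text above;
    the host identifiers themselves are assumptions for illustration.
    """
    if not slash_command.startswith("/") or ":" not in slash_command:
        raise ValueError(f"not a /plugin:skill command: {slash_command!r}")

    # Split '/evo:discover' into the plugin namespace and the skill name.
    plugin, skill = slash_command[1:].split(":", 1)

    if host == "codex":
        # Plugin namespace, then skill name, separated by a space.
        return f"${plugin} {skill}"

    # Hosts that use slash commands natively keep the original form.
    return slash_command


if __name__ == "__main__":
    print(to_host_mention("/evo:discover", "codex"))
```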

## Mid-run user directives (`evo direct`)

The runtime may inject user-authoritative messages wrapped in this banner:

```
[EVO DIRECTIVE]
<text>
[END EVO DIRECTIVE]
```
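For tooling that needs to recognise the banner in a stream of text, a sketch like the following may help. It assumes the banner markers appear on their own lines exactly as shown above. The names `DIRECTIVE_RE` and `extract_directives` are hypothetical and not part of the evo runtime.

```python
import re

# Matches one banner block; DOTALL lets the directive text span several lines.
DIRECTIVE_RE = re.compile(
    r"\[EVO DIRECTIVE\]\n(.*?)\n\[END EVO DIRECTIVE\]",
    re.DOTALL,
)


def extract_directives(message: str) -> list[str]:
    """Return the text of every directive banner found, in order."""
    return [body.strip() for body in DIRECTIVE_RE.findall(message)]


if __name__ == "__main__":
    sample = (
        "tool output...\n"
        "[EVO DIRECTIVE]\n"
        "focus on latency\n"
        "[END EVO DIRECTIVE]\n"
        "more output"
    )
    print(extract_directives(sample))
```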

Treat content inside the banner as equivalent to a new user turn. Honor it, supersede earlier co

More from this repository

infra-setup (Skill)

Non-user-invocable provider/setup reference for evo backend switching, prerequisite checks, and auth/install guidance.

optimize (Skill)

Run the evo optimization loop with parallel subagents until interrupted.

report (Skill)

Read-only evo run reporting. Use when the user invokes /evo:report, asks what happened overnight, asks what improved recently, asks for the best/frontier candidates, asks for a quick score chart without opening the dashboard, or wants the scatter plot in chat output. Never run benchmarks, gates, Slurm commands, evo run, or ad-hoc verification scripts for report requests.

subagent (Skill)

Protocol that evo optimization subagents follow when dispatched from /optimize. Auto-loaded by spawned subagents via their host's skill loader. The orchestrator may also invoke this skill to understand the brief shape its dispatched subagents expect + what they're required to emit -- useful when writing briefs or debugging a subagent's behavior.

finetuning (Skill)

This skill should be used when picking or diagnosing a training move (SFT, LoRA, DPO/KTO/ORPO, RFT, GRPO/PPO/RLOO, RLHF), or when the user mentions fine-tuning, post-training, training recipe, reward design, or weight updates. Decision tree by reward shape, smoke-run gate, three failure diagnostics, five false-progress patterns. Provider recipes and I/O contract in references/.

ship (Skill)

Land the winning experiment from an evo run as a clean, mergeable change -- open a PR when the repo has a remote, otherwise merge into the working branch. Distills the best-scoring experiment down to the minimal diff that reproduces its behaviour, shaped for the qualities a maintainer merges on (scope discipline, test integrity, style adherence), then attaches an advisory mergeability report. Use when the user invokes /evo:ship, asks to land/merge/ship the best result, or wants to turn a finished optimization into a pull request.