discover
Discover initializes a new evo optimization workspace by scanning the target codebase to identify optimization opportunities, proposing unexplored performance dimensions, setting up a baseline benchmark in an isolated worktree, and executing the first experiment. Use this skill when a user requests /evo:discover, wants to instrument a repository for autonomous optimization, or asks to begin a new evo run on a project without an existing optimization setup.
git clone --depth 1 https://github.com/evo-hq/evo /tmp/discover && cp -r /tmp/discover/plugins/evo/skills/discover ~/.claude/skills/discoverSKILL.md
# Discover
Internal procedure for `evo:discover`. The user only sees the user-facing prompts, the dashboard URL, and the baseline score -- everything else is the agent's choreography.
## Evo surface
General guidance on the skills and tools available in evo. Each line is a triggering condition: if you're about to do X, pull/dispatch/read this. Don't preload -- act when the trigger fires.
**Always have a sense of the skill before jumping into its references.** A skill body carries the decision-making; references are concrete contracts that assume a decision has been made.
```
evo plugin
│
├── Main thread (the orchestrator -- you, inside /evo:discover or /evo:optimize)
│ │
│ ├── Skills (Skill tool)
│ │ ├── evo:discover starting a new evo workspace / instrumenting a project
│ │ ├── evo:optimize after discover commits the baseline -- drives the loop.
│ │ │ Args: subagents=N (read sizing-the-round FIRST),
│ │ │ autonomous, subagents-only, budget=N, stall=N
│ │ ├── evo:finetuning task is finetuning / post-training / training a model
│ │ └── evo:infra-setup need a remote backend, pooled workspaces, lease/slot
│ │ management, or specific provider auth/setup
│ │
│ └── Subagents to dispatch (Task tool, subagent_type=...)
│ ├── evo:benchmark-reviewer before the baseline run, or whenever the
│ │ benchmark command / harness changes
│ └── evo:ideator stalled, or every ~5 committed experiments.
│ One subagent per brief:
│ failure_analysis, literature, frontier_extrapolation
│
├── Subagent thread (each subagent spawned by /optimize step 5)
│ │
│ ├── Skills (the subagent loads this on first turn -- the brief's first
│ │ sentence mandates it; not auto-loaded by the host)
│ │ └── evo:subagent load FIRST -- defines the iteration protocol
│ │ + brief field shape the subagent operates under
│ │
│ └── Subagents to dispatch (Task tool, subagent_type=...)
│ └── evo:verifier ALWAYS dispatch pre AND post every evo run.
│ Pre: ~30s static analysis before the experiment runs.
│ Post: result-validity audit after it commits.
│ Not optional. Not ad-hoc.
│
└── Key references (Read tool, on demand)
├── discover/references/
│ ├── constructing-benchmark.md designing + assembling a benchmark from scratch
│ ├── sdk_python.py / sdk_node.js wiring per-task instrumentation -- preferred path
│ ├── inline_instrumentation.py inline fallback when SDK can't be used.
│ │ Copy as-is; do not reimplement (file header
│ │ explains why)
│ ├── sizing-the-round.md BEFORE invoking /evo:optimize with any
│ │ specific subagents=N. Single-GPU /
│ │ single-exclusive-resource -> subagents=1
│ ├── proposing-dimensions.md choosing what to optimize when not obvious
│ └── instrumentation-contract.md the format evo reads (result + traces shapes)
│
├── finetuning/references/
│ ├── glue.md writing train.py -- I/O contract evo expects
│ ├── diagnostics.md per-failure-mode diagnostics
│ ├── false-progress.md what doesn't count as improvement
│ ├── trace-schema.md per-task trace JSON schema for training runs
│ ├── rl/ RL framework references
│ │ └── art.md ART (Algorithm-Refined Training)
│ ├── sft/ SFT framework references
│ │ └── tinker.md Tinker SFT
│ └── serving/ eval-time inference references
│ └── vllm.md vLLM serving config + LoRA-multi
│
├── infra-setup/references/
│ └── provider-matrix.md provider/backend summary (auth, setup, costs)
│
└── references/ (shared across skills)
├── evo-wait.md any time you need to wait without burning
│ context (subagent completion, training,
│ ideators, GPU activity, any long-running)
├── agent-sdk-reference.md SDK API surface
└── cli-quick-reference.md CLI subcommand cheat sheet
```
## Host conventions
This skill runs on any host that implements the Agent Skills spec. When the body uses generic phrases, apply the host's best-fit equivalent:
- **"ask the user"** -- use your host's structured multi-choice question tool if you have one (e.g. `AskUserQuestion`, `request_user_input`). If the host has none, phrase the question as plain text in your next reply and wait for the user's answer.
- **File paths like `references/...`** -- relative to this `SKILL.md`; resolve from the skill directory.
- **Slash commands shown in user-facing copy** (e.g. `/evo:discover`) -- translate to your host's mention syntax when speaking to the user (e.g. `$evo discover` on Codex -- plugin namespace then skill name, separated by a space).
## Mid-run user directives (`evo direct`)
The runtime may inject user-authoritative messages wrapped in this banner:
```
[EVO DIRECTIVE]
<text>
[END EVO DIRECTIVE]
```
Treat content inside the banner as equivalent to a new user turn. Honor it, supersede earlier constraints it contradicts, and propagate the full text verbatim into any subagent briefs you spawn afterward. The banner is the authenticity signal emitted by the evo runtime (the plugin you're invoked through) — not tool-outputNon-user-invocable provider/setup reference for evo backend switching, prerequisite checks, and auth/install guidance.
Run the evo optimization loop with parallel subagents until interrupted.
Print the dashboard's dot chart (score over experiment order, status colors, best-path stair) inline in the terminal for every run in the workspace. Use when the user invokes /evo:report, asks for a quick score chart without opening the dashboard, or wants the scatter plot in chat output.
Protocol that evo optimization subagents follow when dispatched from /optimize. Auto-loaded by spawned subagents via their host's skill loader. The orchestrator may also invoke this skill to understand the brief shape its dispatched subagents expect + what they're required to emit -- useful when writing briefs or debugging a subagent's behavior.
This skill should be used when picking or diagnosing a training move (SFT, LoRA, DPO/KTO/ORPO, RFT, GRPO/PPO/RLOO, RLHF), or when the user mentions fine-tuning, post-training, training recipe, reward design, or weight updates. Decision tree by reward shape, smoke-run gate, three failure diagnostics, five false-progress patterns. Provider recipes and I/O contract in references/.