Skill · 851 repository stars · updated yesterday

coral-new-task

The coral-new-task skill provides a complete end-to-end guide for creating a new CORAL task, covering the three required components: task.yaml configuration, seed/ starter code directory, and grader/ packaged Python module with TaskGrader implementation. Use this when adding a new task to CORAL, porting an existing benchmark, or migrating legacy eval/grader.py examples to the packaged grader format, including common pitfalls like incorrect repo_path references, reversed score directions, and missing run() function signatures.

View source · Repository: CORAL

Install in Claude Code

git clone --depth 1 https://github.com/Human-Agent-Society/CORAL /tmp/coral-new-task && cp -r /tmp/coral-new-task/.claude/skills/coral-new-task ~/.claude/skills/coral-new-task

Then open a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Creating a new CORAL task

A CORAL task is **three things** that must line up:

```
examples/<task>/
├── task.yaml      # config: name, description, grader entrypoint, agent count
├── seed/          # starter code agents see when they begin (the repo_path)
│   └── solution.py
└── grader/        # standalone Python package
    ├── pyproject.toml
    └── src/<task>_grader/
        ├── __init__.py
        └── grader.py     # class Grader(TaskGrader): ...
```

The packaged form is the only supported form. The package gives the grader its own venv and ships everything the eval needs — grader code, helper modules, and hidden data (see "Hidden data" below).
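Before wiring anything up, it can help to confirm that the three pieces from the tree above actually exist. The following is a minimal sketch, not part of CORAL: `missing_task_parts` is a hypothetical helper, and it assumes the grader package is named `<task>_grader` after the task directory, as in the layout shown.

```python
from pathlib import Path


def missing_task_parts(task_dir: Path) -> list[str]:
    """Return the expected relative paths that are absent under examples/<task>/.

    Hypothetical helper: assumes the package is named <task>_grader,
    where <task> is the name of the task directory.
    """
    task = task_dir.name
    expected = [
        "task.yaml",
        "seed",
        "grader/pyproject.toml",
        f"grader/src/{task}_grader/__init__.py",
        f"grader/src/{task}_grader/grader.py",
    ]
    return [rel for rel in expected if not (task_dir / rel).exists()]


if __name__ == "__main__":
    # Point this at a real task directory; an empty list means nothing is missing.
    print(missing_task_parts(Path("examples") / "my_task"))
```

An empty list only says the files are present; it does not check that `task.yaml` points at the right grader entrypoint.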

## Reference implementations

Look at these before writing anything new — copy the closest one and edit:

| Reference | When to copy it |
|---|---|
| [examples/erdos/](examples/erdos/) | Minimal packaged grader, single grader file, numpy-only deps |
| [examples/dna_design/](examples/dna_design/) | Packaged grader with bundled data files (`importlib.resources`) and `[ml]` optional-deps for heavy libs |
| [examples/swebench-verified/](examples/swebench-verified/) | Tiered eval (different instance counts per tier), private answer keys, harbor integration |
| [examples/circle_packing/](examples/circle_packing/) | Smallest packaged task end-to-end — single solution file, single grader file |
| [examples/mnist/](examples/mnist/) | Packaged grader with a hidden answer key (note: secret data belongs under `grader.private`, not a readable packaged `taskdata/`) |

## 1. The seed

Whatever lives in `seed/` is what the agent sees on first checkout — it's the working directory the grader will later score. The contract between `seed/` and the grader is the **program file**: a Python file with a function the grader imports and calls.

The convention across examples is:
- `solution.py` (or `initial_program.py`) defining a top-level `run()` function.
- The grader passes `program_file: "solution.py"` via `grader.args`.
- `run()`'s signature is whatever the grader expects — usually `() -> result` or `(input_path) -> result`.

Put a real, runnable baseline here. Agents should be able to `coral eval` immediately and get a non-zero score, so they have a starting point to improve. A no-op skeleton that crashes is not a good baseline.
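As an illustration of a runnable baseline, here is a sketch of a `seed/solution.py` for a made-up task (placing points inside the unit square). The task, the default argument, and the return type are all assumptions for the example; the only part taken from the convention above is the top-level `run()` function that can be called with no arguments.

```python
# seed/solution.py (hypothetical task: place n points inside the unit square)
import math


def run(n: int = 4) -> list[tuple[float, float]]:
    """Baseline: put n points on a regular grid strictly inside the unit square."""
    side = math.ceil(math.sqrt(n))       # smallest grid that holds n points
    step = 1.0 / (side + 1)              # spacing that keeps points off the border
    points = [
        ((col + 1) * step, (row + 1) * step)
        for row in range(side)
        for col in range(side)
    ]
    return points[:n]


if __name__ == "__main__":
    print(run())
```

A grid is rarely the best answer, but it is valid, deterministic, and scores above zero on most placement metrics, which is what a starting point needs.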

If the task needs data files at runtime (training data, fixtures), put them under `seed/data/` and reference them by relative path from `solution.py`. The grader will see them at `<codebase_path>/data/...`.

## 2. The grader

### Packaged grader — the recommended path

```
grader/
├── pyproject.toml
└── src/<task>_grader/
    ├── __init__.py
    └── grader.py
```

`pyproject.toml` is a thin Hatchling package. Crib from [examples/erdos/grader/pyproject.toml](examples/erdos/grader/pyproject.toml):

```toml
[project]
name = "<task>-grader"
version = "0.1.0"
description = "CORAL grader for the <task> task."
requires-python = ">=3.11"
dependencies = ["coral", "numpy"]   # Whatever the grader actually imports.

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build.targets.wheel]
packages = ["src/<task>_grader"]
```

Subclass `TaskGrader` and implement `evaluate()`:

```python
# grader/src/<task>_grader/grader.py
from coral.grader import TaskGrader
from coral.types import ScoreBundle


class Grader(TaskGrader):
    def evaluate(self) -> float | ScoreBundle:
        program_file = self.args.get("program_file", "solution.py")
        # self.codebase_path  — the agent's commit checked out detached
        # self.private_dir    — .coral/private/ (your hidden answer keys live here)
        # self.args           — dict from task.yaml grader.args
        # self.timeout        — grader.timeout in seconds (or None)
        # self.eval_logs_dir  — write subprocess logs / artifacts the agent should see post-grade

        try:
            # Placeholder: replace with your task-specific run-and-score logic.
            result = run_program_and_score(...)
        except TimeoutError:
            return self.fail(f"Evaluation timed out after {self.timeout}s")
        except Exception as e:
            return self.fail(f"Evaluation failed: {e}")

        return self.score(result, explanation=f"score={result:.4f}")
```
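The `run_program_and_score(...)` call in the skeleton is a placeholder. One possible shape for it, assuming the grader imports the program file in-process and that `run()` returns the score directly, is sketched below. Both function names and the "result is the score" rule are illustrative, not CORAL API.

```python
import importlib.util
from pathlib import Path
from types import ModuleType


def load_program(codebase_path: Path, program_file: str) -> ModuleType:
    """Import <codebase_path>/<program_file> as a throwaway module."""
    path = Path(codebase_path) / program_file
    spec = importlib.util.spec_from_file_location("agent_program", path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # executes the agent's top-level code
    return module


def run_program_and_score(codebase_path: Path, program_file: str) -> float:
    """Call the program's run() and turn its result into a float score."""
    module = load_program(codebase_path, program_file)
    run = getattr(module, "run", None)
    if not callable(run):
        raise AttributeError(f"{program_file} must define a top-level run() function")
    # Assumption for this sketch: run() returns a number that is the score.
    return float(run())
```

Importing in-process runs the agent's code inside the grader's own environment, so it only suits programs whose dependencies the grader also has. When the program needs task-specific dependencies, running it as a subprocess is the safer choice.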

What you have available on `self`:

| Attribute / method | Use it for |
|---|---|
| `self.codebase_path` | Path to the commit being graded (detached worktree). Treat it as read-only: anything written here is discarded after the eval. |
| `self.private_dir` | `.coral/private/`. Your answer keys, hidden test data, anything from `grader.private` lives here. |
| `self.args` | `dict` from `task.yaml::grader.args`. Use `self.args.get("program_file", "solution.py")` etc. |
| `self.timeout` | Eval timeout in seconds (or `None` if `grader.timeout: 0`). |
| `self.eval_logs_dir` | Per-attempt directory for logs/artifacts that should outlive the grader. Symlinked into each agent worktree as `<shared_dir>/eval_logs/<hash>/`. |
| `self.score(value, explanation=...)` | Build a single-task `ScoreBundle` from a numeric score. |
| `self.fail(reason)` | Return a fail `ScoreBundle` with `reason` as feedback. |
| `self.get_python_command()` | Command list for invoking the `python` binary inside the codebase's env (uses `uv run` if a `pyproject.toml` is present). Always use this instead of `sys.executable` so task-specific deps are visible. |
| `self.run_program(filename, *args)` | Convenience: runs `<codebase_path>/<filename>` as a subprocess via `get_python_command()`. |

### Bundling data files with the grader

If the grader needs reference files (model weights, ground-truth answers, scoring fixtures), ship them inside the package and load via `importlib.resources`:

```python
import importlib.resources
scorer_dir = str(importlib.resources.files("<task>_grader.scorers"))
```
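For a single bundled file, the same `importlib.resources.files` call can read the content directly instead of converting the directory to a string path. This is a sketch: `load_bundled_json` and the file name in the usage line are hypothetical, and it assumes the file is included in the built wheel.

```python
import importlib.resources
import json


def load_bundled_json(package: str, filename: str) -> dict:
    """Read a JSON file that ships inside an importable package."""
    resource = importlib.resources.files(package).joinpath(filename)
    return json.loads(resource.read_text(encoding="utf-8"))
```

A grader might call it as `load_bundled_json("<task>_grader.scorers", "thresholds.json")`. Reading through the resource object works without knowing where the package is installed, whereas the `str(...)` form is convenient when a helper library insists on a filesystem path.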

[examples/dna_design/grader/src/dna_design_grader/grader.py](examples/dna_design/grader/src/dna_design_grader/grader.py) is the canonical pattern — note the `scorers/` subpackage. Add the directory to `[tool.hatch.build.targets.wheel]` if it has non-Py
