ClaudeWave
Skill · 722 repo stars · updated today

coral-extend

coral-extend is a Claude Code skill for adding new components to the CORAL framework itself, such as agent runtimes under `coral/agent/builtin/`, CLI commands in `coral/cli/`, bundled skill templates, hooks, configuration fields, or grader infrastructure extensions. Use this skill when modifying the CORAL package architecture rather than debugging existing code or creating new example tasks.

Install in Claude Code

git clone --depth 1 https://github.com/Human-Agent-Society/CORAL /tmp/coral-extend && cp -r /tmp/coral-extend/.claude/skills/coral-extend ~/.claude/skills/coral-extend

Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# Extending the CORAL framework

For day-to-day debug / reproduce loops see the sibling `coral-debug` skill. For creating a new `examples/<task>/` (seed + task.yaml + grader package) see `coral-new-task`. This skill covers *adding new components to the CORAL package itself*.

## Extending the grader infrastructure

If you're writing a grader for a specific task, use `coral-new-task`. This section is only for changes to the grader **framework** under `coral/grader/`:

- New helpers on `TaskGrader` (`coral/grader/task_grader.py`) — make sure they're useful to multiple existing example graders before adding.
- New `GraderInterface` implementations (`coral/grader/protocol.py` / `base.py`) — the bar is high; the existing protocol covers everything we currently need.
- Daemon-side changes (`coral/grader/daemon.py`) — concurrency, queue caps, worktree isolation, retry policy. Cover with `tests/test_grader_daemon.py`.
- Built-in graders under `coral/grader/builtin/` — `function_grader.py` is the only one today; not wired through `task.yaml`. New built-ins should justify why a `TaskGrader` subclass per task isn't enough.

## A new agent runtime

Adding a new runtime (e.g. another coding-agent CLI) takes a runtime module, a registry entry, and a smoke test, plus per-agent config plumbing if the runtime needs it.

1. Create `coral/agent/builtin/<name>.py` and subclass `AgentRuntime` (`coral/agent/runtime.py`). Existing runtimes are the canonical reference — `claude_code.py` is the most complete; `codex.py` and `cursor_agent.py` are smaller and easier to mimic.
2. Register the runtime in `coral/agent/registry.py`:
   ```python
   _RUNTIMES["my_runtime"] = MyRuntime
   _ALIASES["mine"] = "my_runtime"
   _DEFAULT_MODELS["my_runtime"] = "default-model-id"
   ```
3. Decide the runtime's native shared-state directory name (`.claude` for Claude Code, `.codex` for Codex, etc.). The worktree symlink uses this; pass it through `shared_dir` so `generate_coral_md(...)` renders the right paths.
4. If the runtime needs special config plumbing (e.g. `cursor_agent.json`, `opencode.json`, gateway port), follow the `opencode` pattern: emit a per-agent config file inside the worktree at startup.
5. Add a smoke test in `tests/test_<runtime>.py` modeled on `tests/test_cursor_agent.py`.

Reference recent additions: PR #79 (cursor_agent), commit `f6f266e` (codex web_search config fix).
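
Steps 1 and 2 can be sketched roughly as follows. This is a minimal, self-contained illustration, not CORAL's actual code: the `AgentRuntime` stand-in, its `build_command` method, the `my-agent-cli` executable, and the `resolve_runtime` helper are all hypothetical. Only the three registry dict names come from the snippet above. Check `claude_code.py` for the real base-class interface before copying anything.

```python
from abc import ABC, abstractmethod


class AgentRuntime(ABC):
    """Stand-in for coral.agent.runtime.AgentRuntime (real interface differs)."""

    shared_dir: str  # hypothetical: the runtime's native shared-state dir name

    @abstractmethod
    def build_command(self, prompt: str, model: str) -> list[str]:
        """Hypothetical method: return the argv used to launch the agent CLI."""


class MyRuntime(AgentRuntime):
    shared_dir = ".my_runtime"  # assumption, by analogy with .claude / .codex

    def build_command(self, prompt: str, model: str) -> list[str]:
        # "my-agent-cli" is a made-up executable name for illustration.
        return ["my-agent-cli", "--model", model, "--prompt", prompt]


# Registry dicts named as in coral/agent/registry.py; contents are illustrative.
_RUNTIMES: dict[str, type[AgentRuntime]] = {}
_ALIASES: dict[str, str] = {}
_DEFAULT_MODELS: dict[str, str] = {}

_RUNTIMES["my_runtime"] = MyRuntime
_ALIASES["mine"] = "my_runtime"
_DEFAULT_MODELS["my_runtime"] = "default-model-id"


def resolve_runtime(name: str) -> tuple[type[AgentRuntime], str]:
    """Hypothetical lookup: map a name or alias to (runtime class, default model)."""
    canonical = _ALIASES.get(name, name)
    try:
        return _RUNTIMES[canonical], _DEFAULT_MODELS[canonical]
    except KeyError:
        raise ValueError(f"unknown runtime: {name!r}") from None
```

With this shape, `resolve_runtime("mine")` and `resolve_runtime("my_runtime")` return the same class and default model, which is the behavior the alias entry is meant to provide.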

## A new CLI command

The CLI is an old-school, single-file argparse dispatcher.

1. Add a parser block in `coral/cli/__init__.py::main()`. Match the existing style — `_HelpOnErrorParser`, an epilog with `Examples:`, `_CommandHelpFormatter`. Add the new command name to `_VISIBLE_COMMANDS` so "did you mean?" suggestions work.
2. Implement `cmd_<name>(args: argparse.Namespace) -> None` in the most-fitting module under `coral/cli/`:
   - `start.py` — agent lifecycle (start/resume/stop/status)
   - `query.py` — read-only inspection (log/show/notes/skills/runs)
   - `eval.py` — agent-side commands that mutate the worktree (eval/wait/diff/revert/checkout)
   - `heartbeat.py` — heartbeat configuration
   - `ui.py` — dashboard
   - `author.py` — `init` / `validate`
   Create a new module if none of those fit; keep imports lazy so `coral --help` stays fast.
3. Wire the function into the `commands = {...}` dict at the bottom of `main()`.
4. If your command operates on a specific run, accept `--task` / `--run` via `_add_run_args(parser)` and resolve with `coral.cli._helpers.find_coral_dir`.
5. Add an example to `CLAUDE.md`'s Commands section.

## A new bundled skill or subagent template

These ship inside the package and are seeded into every run's `.coral/public/skills/` (or `agents/`) by `coral/workspace/project.py`.

- **Skill** — create `coral/template/skills/<name>/SKILL.md` with frontmatter `name` and `description`. Include `scripts/` and `references/` subdirs as needed; existing examples are `deep-research`, `organize-files`, `skill-creator`.
- **Subagent** — create `coral/template/agents/<name>.md` (single markdown file). Existing examples are `deep-researcher` and `librarian`.
- Add a test in `tests/test_template.py` if the rendering pulls in new template variables.

The seed copy is one-shot per run (`if not dst.exists()`), so iterating on template content during development means deleting `<run_dir>/.coral/public/skills/<name>/` and re-running `coral start`, or just editing the destination directly for that run.
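
To make the one-shot behavior concrete, here is a small sketch of a seed step guarded by `if not dst.exists()`. The function name `seed_skills` and its signature are assumptions for illustration; the real logic lives in `coral/workspace/project.py` and may differ in detail.

```python
import shutil
from pathlib import Path


def seed_skills(template_root: Path, public_skills: Path) -> list[str]:
    """Copy each template skill dir into the run, skipping ones already present.

    Returns the names of the skills that were actually copied.
    """
    public_skills.mkdir(parents=True, exist_ok=True)
    copied: list[str] = []
    for src in sorted(template_root.iterdir()):
        if not src.is_dir():
            continue
        dst = public_skills / src.name
        if not dst.exists():  # one-shot: an existing copy is never overwritten
            shutil.copytree(src, dst)
            copied.append(src.name)
    return copied
```

Because of the guard, editing the template after the first seed has no effect on that run until the destination directory is removed, which is why the delete-and-restart step is needed.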

## A new hook

Right now there's only `coral/hooks/post_commit.py`. If you add another hook:
- Define a clear single entrypoint function (model on `submit_eval`).
- Make it a pure function of `coral_dir` and `agent_id` where possible.
- Write atomically, and only to `.coral/public/`; never write to a worktree from a hook.
- Add coverage to `tests/test_hooks.py`.
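
One common way to get atomic writes is to write to a temporary file in the same directory and then `os.replace` it over the target. The sketch below assumes `coral_dir` points at the run's `.coral/` directory; the `record_event` entrypoint, the `events/` subdirectory, and the JSON payload shape are hypothetical and not part of CORAL.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, payload: dict) -> None:
    """Write JSON so readers see either the old file or the new one, never a partial."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file must live on the same filesystem for os.replace to be atomic.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh)
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def record_event(coral_dir: Path, agent_id: str, event: dict) -> Path:
    """Hypothetical hook entrypoint: depends only on its arguments."""
    target = coral_dir / "public" / "events" / f"{agent_id}.json"
    atomic_write_json(target, {"agent_id": agent_id, **event})
    return target
```

Keeping the entrypoint a function of `coral_dir` and `agent_id` means it can be exercised in a test with nothing more than a temporary directory.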

## Configuration changes

`coral/config.py` is dataclass-based and merged via OmegaConf. When adding a new field:

1. Add it to the right dataclass (`AgentConfig`, `GraderConfig`, ...) with a sensible default.
2. If it deserves runtime validation, add it to the `__post_init__` of that dataclass.
3. Cover the new field in `tests/test_config.py`.
4. Update `examples/<task>/task.yaml` only if the field is task-author facing — internal knobs should stay defaulted.
5. Mention it in `CLAUDE.md` if it changes user-visible behavior; otherwise leave the docs alone (CLAUDE.md describes invariants, not every flag).
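
Steps 1 and 2 might look like the following in a plain dataclass. The class name `ExampleGraderConfig` and both fields are invented for illustration and are not taken from `coral/config.py`; the OmegaConf merge step is deliberately left out so the sketch stays stdlib-only.

```python
from dataclasses import dataclass


@dataclass
class ExampleGraderConfig:
    """Illustrative stand-in for a config dataclass; not CORAL's GraderConfig."""

    timeout_seconds: int = 300  # hypothetical existing field
    max_retries: int = 2        # hypothetical new field with a sensible default

    def __post_init__(self) -> None:
        # Runtime validation for the new field.
        if self.max_retries < 0:
            raise ValueError(f"max_retries must be >= 0, got {self.max_retries}")
```

A matching test would construct the dataclass with defaults, with an explicit valid value, and with an invalid value that is expected to raise.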

## Don't forget

- **Lint + test before pushing**: `uv run ruff check . && uv run ruff format . && uv run pytest tests/ -v`.
- **Backward compatibility for run dirs.** People resume old runs. Anything that reads from `.coral/public/` must tolerate missing files (return defaults), not crash.
- **No agent-side `git`.** All commits go through `coral eval` → `submit_eval`. Don't add helpers that shell out to git from agent context.
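
For the backward-compatibility point, a reader that tolerates missing files can be as small as the sketch below. The helper name `read_public_json`, the JSON format, and the assumption that `coral_dir` is the run's `.coral/` directory are illustrative, not CORAL's actual API.

```python
import json
from pathlib import Path
from typing import Any


def read_public_json(coral_dir: Path, relative: str, default: Any) -> Any:
    """Read a JSON file under <coral_dir>/public/, returning `default` if absent.

    Only a missing file is tolerated; malformed JSON still raises, so real
    corruption is not silently masked.
    """
    path = coral_dir / "public" / relative
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return default
```

An old run that predates a newly added file then resumes with the default value instead of crashing on the first read.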

coral-debug (Skill)

Verify and debug changes to CORAL itself — smallest reproduce loop per area (grader / daemon / CLI / hooks / manager / workspace / hub / template / config / web), where to look when something breaks (hung graders, agent restart loops, stalled agents, missing heartbeat actions, corrupted shared state, broken worktree symlinks, grader import errors, wrong-task resume), how to inspect a live or finished run under `.coral/public/`, and the canonical lint/test commands. Use when editing code under `coral/` or chasing a CORAL bug, NOT when adding a new task or extending the framework.

coral-new-task (Skill)

End-to-end recipe for adding a new task under `examples/` — the three pieces that have to line up (`task.yaml`, `seed/`, and `grader/`), what to put in each, the `TaskGrader` API surface, the `coral validate` → smoke-test loop, and the common mistakes (repo_path pointing at the wrong dir, score direction backwards, hidden answer keys leaking into seed/, grader writing to codebase_path which the daemon force-removes, private-vs-public confusion, missing `run()` signature). Use whenever the user wants to add a new CORAL task or port an existing benchmark into CORAL.

deep-research (Skill)

Research the problem domain before coding. Web search for techniques, save raw sources, write structured findings, update the index.

organize-files (Skill)

Organize the shared notes directory when it becomes hard to navigate. Restructure within research/ and experiments/, deduplicate, update index.md.

skill-creator (Skill)

Autonomously create, test, and optimize skills by detecting reusable patterns in your own work. Use when you notice repeated tool sequences, recurring code patterns across attempts, or insights that should be captured as a packaged skill. Also use to benchmark and iterate on existing skills.