coral-debug
The coral-debug skill provides reproduction loops and diagnostic guidance for debugging the CORAL package itself across its major components (grader, daemon, CLI, hooks, manager, workspace, hub, template, config, web). It maps code areas to their fastest test commands, lists common failure modes and where to inspect them in `.coral/public/`, and includes canonical lint and end-to-end smoke test procedures. Use this when modifying CORAL's internal codebase or investigating framework bugs, not when authoring tasks or extending with new capabilities.
git clone --depth 1 https://github.com/Human-Agent-Society/CORAL /tmp/coral-debug && cp -r /tmp/coral-debug/.claude/skills/coral-debug ~/.claude/skills/coral-debugSKILL.md
# CORAL debug & change-verification workflows
This skill is for people (and Claude Code) hacking on the CORAL package itself, not for users authoring a task. For authoring guides see the siblings:
- `coral-new-task` — creating a new `examples/<task>/` (seed + task.yaml + grader)
- `coral-extend` — extending CORAL itself (new runtime, new CLI command, new bundled skill, ...)
## Reproduce loops
Pick the smallest one that exercises your change.
| Editing... | Fastest reproduce |
|---|---|
| `coral/grader/{task_grader,loader,subprocess_grader}.py` or a builtin grader | `uv run coral validate examples/circle_packing` (or any example with a packaged grader) |
| `coral/grader/daemon.py` (parallel grading, queueing, worktree isolation) | `uv run pytest tests/test_grader_daemon.py -v` |
| `coral/grader/{base,protocol}.py` | `uv run pytest tests/test_grader.py tests/test_subprocess_grader.py -v` |
| `coral/cli/*.py` (argparse, dispatch, output formatting) | Run the command directly: `uv run coral log --help`, `uv run coral status`, etc. against a `latest` run under `examples/<task>/results/` |
| `coral/hooks/post_commit.py` (`submit_eval`) | `uv run pytest tests/test_hooks.py -v`, then end-to-end via `coral start -c task.yaml agents.count=1` and watch `.coral/public/attempts/` |
| `coral/agent/manager.py`, `state.py`, `exit_classifier.py` | `uv run pytest tests/test_manager_reliability.py tests/test_manager_seen_attempts.py -v` |
| `coral/agent/builtin/*.py` (a runtime) | `uv run pytest tests/test_<runtime>.py` if it exists, then `coral start -c task.yaml agents.runtime=<name> agents.count=1` |
| `coral/workspace/{project,worktree,repo,grader_env}.py` | `uv run pytest tests/test_workspace.py tests/test_grader_env.py -v` |
| `coral/hub/{attempts,notes,skills,checkpoint,heartbeat}.py` | `uv run pytest tests/test_hub.py tests/test_checkpoint.py tests/test_heartbeat.py -v` |
| `coral/template/coral_md.py` or templates | `uv run pytest tests/test_template.py -v` |
| `coral/config.py` | `uv run pytest tests/test_config.py -v` |
| `coral/web/*` (dashboard) | `uv run coral ui` against any existing run; reload the browser after edits |
End-to-end smoke (slow but exercises the whole pipeline):
```bash
uv run coral start -c examples/circle_packing/task.yaml agents.count=1 run.session=local
# Wait for one eval to land in .coral/public/attempts/, then:
uv run coral stop
```
## Where to look when X breaks
| Symptom | First place to look |
|---|---|
| Grader hangs / pending attempts pile up | `.coral/public/grader_daemon.pid` (is daemon alive?), `.coral/public/grader_daemon_heartbeat` (mtime), `.coral/public/eval_logs/<hash>/` for grader stdout/stderr |
| `coral eval` errors out | The pending JSON itself: `.coral/public/attempts/<hash>.json` — `status` and `feedback` fields. Source: `coral/hooks/post_commit.py::submit_eval` |
| Agent restart loop | `coral status` shows pause state. Source: `coral/agent/manager.py` (`restart_burst_threshold`, `restart_burst_window`). Per-agent runtime stdout/stderr is captured under the worktree. |
| Stalled agent | `agents.timeout` watchdog in `coral/agent/manager.py`. Grader-queue exemption (`grader_pending_max_age`) skips the watchdog while a recent submission is still pending. |
| Heartbeat actions not firing | `.coral/public/heartbeat/<agent_id>.json` (per-agent action list), `.coral/public/eval_count` (global counter). Logic: `coral/agent/heartbeat.py::HeartbeatRunner.check`. |
| Shared state corrupted / race | `.coral/public/.git/` is a real git repo — `git -C .coral/public log` shows checkpoint history; `coral notes --history` is the friendly view. |
| Worktree symlinks missing | `coral/workspace/worktree.py` — symlinks `.coral/public/` into each agent worktree under the runtime-specific name (`.claude` / `.codex` / `.opencode`). |
| Grader can't import its package | Check `.coral/private/grader_venv/` — `setup_grader_env` ran `uv venv` + `grader.setup` once at run start. Re-running `coral start` on the same `run_dir` does NOT re-bootstrap; delete the venv to force it. |
| Resume picks up wrong task | `coral resume` reads `.coral/config.yaml` and `.coral/config_dir`. The `latest` symlink at `results/<task-slug>/latest` decides which run is resumed. |
## Inspect a live or finished run
```bash
RUN=$(readlink -f results/<task-slug>/latest)
ls "$RUN/.coral/public/" # attempts/, notes/, skills/, agents/, logs/, eval_logs/, eval_count
cat "$RUN/.coral/public/attempts/<hash>.json" | jq .
ls "$RUN/agents/" # one worktree per agent
cat "$RUN/.coral/config.yaml" # exact config used (post-merge)
```
The web dashboard (`coral ui`) reads the same files; if you're debugging dashboard rendering, `coral/web/api.py` is the route handler and `coral/web/static/` is the built React bundle.
## Test + lint
```bash
uv sync --extra dev # one-time, gets pytest + ruff
uv run pytest tests/ -v # full suite (~seconds, no docker)
uv run pytest tests/test_<thing>.py -v -k <pattern>
uv run ruff check . # lint
uv run ruff format . # autoformat
```
There is no separate type-check step in CI yet; tests cover the contract.
## Quick task scaffold for ad-hoc testing
When you need a throwaway task to exercise a code change:
```bash
uv run coral init /tmp/coral-scratch
# edit /tmp/coral-scratch/grader/src/coral_scratch_grader/grader.py to return a simple float
uv run coral validate /tmp/coral-scratch
uv run coral start -c /tmp/coral-scratch/task.yaml agents.count=1 run.session=local
```
`coral validate` runs the grader against `seed/` in a tempdir without spawning agents — the fastest way to confirm a grader change compiles and produces a `Score`.
## Conventions
- **No `git` from agents.** All commits go through `coral eval`. If you're adding a feature that needs to commit, route it through `coral/hooks/post_commit.py` or `coral/hub/checkpoint.py`.
- **Atomic writes** for any JSON in `.coral/puAdd a new component to the CORAL framework itself — a new agent runtime under `coral/agent/builtin/` (claude_code/codex/cursor_agent style), a new CLI command in `coral/cli/`, a new bundled skill or subagent template under `coral/template/skills/` or `coral/template/agents/`, a new hook in `coral/hooks/`, a new field in `coral/config.py`, or a framework-level extension to the grader stack under `coral/grader/`. NOT for writing a per-task grader or adding an example task — use `coral-new-task` for that. NOT for debugging existing code — use `coral-debug`.
End-to-end recipe for adding a new task under `examples/` — the three pieces that have to line up (`task.yaml`, `seed/`, and `grader/`), what to put in each, the `TaskGrader` API surface, the `coral validate` → smoke-test loop, and the common mistakes (repo_path pointing at the wrong dir, score direction backwards, hidden answer keys leaking into seed/, grader writing to codebase_path which the daemon force-removes, private-vs-public confusion, missing `run()` signature). Use whenever the user wants to add a new CORAL task or port an existing benchmark into CORAL.
Research the problem domain before coding. Web search for techniques, save raw sources, write structured findings, update the index.
Organize the shared notes directory when it becomes hard to navigate. Restructure within research/ and experiments/, deduplicate, update index.md.
Autonomously create, test, and optimize skills by detecting reusable patterns in your own work. Use when you notice repeated tool sequences, recurring code patterns across attempts, or insights that should be captured as a packaged skill. Also use to benchmark and iterate on existing skills.