ClaudeWave
Skill · 321 repo stars · updated 12d ago

hydra

Hydra is a lead-driven multi-agent orchestration toolkit that manages workflow dispatch, retries, and result collection while the Lead agent makes strategic decisions based on typed semantic outcomes from workers. Use Hydra when coordinating parallel sub-agent tasks, inspecting workflow execution states through telemetry polling, or intervening in stalled runs with targeted feedback rather than full restarts.

Install in Claude Code
git clone --depth 1 https://github.com/blueberrycongee/termcanvas /tmp/hydra && cp -r /tmp/hydra/skills/skills/hydra ~/.claude/skills/hydra
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# Hydra Orchestration Toolkit

Hydra is a Lead-driven orchestration toolkit. You (the Lead agent) make strategic
decisions; Hydra handles operational management (dispatch, retry, health checks,
result collection).

Sub-agents report a semantic outcome (`completed`/`stuck`/`error`), not routing
information. Hydra manages the lifecycle; you decide what happens next.

## Why this design (vs. other coding-agent products)

- **SWF decider pattern, specialized for LLM deciders.** Hydra follows the decider pattern from AWS SWF / Cadence / Temporal: `hydra watch` plays the role of `PollForDecisionTask`, the Lead is the decider, and `lead_terminal_id` enforces single-decider semantics.
- **Parallel-first, not bolted on.** `dispatch` + `depends_on` + worktree + `merge` are first-class. Other products (Factory.ai's Droid, Amp, Claude Code subagents) treat parallelism as open research; Hydra makes it the default.
- **Typed result contract.** Workers publish a schema-validated `result.json` (`outcome: completed | stuck | error`, optional `stuck_reason: needs_clarification | needs_credentials | needs_context | blocked_technical`). Other products return free-text final messages and require downstream parsing.
- **Lead intervention points.** `hydra reset --feedback` lets the Lead intervene at a decision point with targeted feedback instead of blocking until every worker joins. A stale or wrong run is one `reset` away.
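
To make the typed result contract concrete, here is a minimal Python sketch of the kind of check a schema-validated `result.json` implies. The `outcome` and `stuck_reason` values come from the list above; the function name `parse_result`, and the rule that `stuck_reason` may only accompany `outcome: stuck`, are assumptions for illustration and not Hydra's actual validator.

```python
import json

# Enumerations taken from the contract described above.
OUTCOMES = {"completed", "stuck", "error"}
STUCK_REASONS = {
    "needs_clarification",
    "needs_credentials",
    "needs_context",
    "blocked_technical",
}


def parse_result(raw: str) -> dict:
    """Parse the text of a worker's result.json and reject off-contract values."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("result.json must be a JSON object")
    outcome = data.get("outcome")
    if outcome not in OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome!r}")
    reason = data.get("stuck_reason")
    if reason is not None:
        # Assumption: stuck_reason is only meaningful alongside outcome=stuck.
        if outcome != "stuck":
            raise ValueError("stuck_reason is only valid when outcome is 'stuck'")
        if reason not in STUCK_REASONS:
            raise ValueError(f"unknown stuck_reason: {reason!r}")
    return data
```

Rejecting unknown values up front is what lets the Lead branch on `outcome` directly rather than parsing a free-text final message.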

## Lead operational rules

Core rules:
- **Root cause first.** Fix the real implementation problem before changing tests, fixtures, or mocks.
- **Do not hack tests** to force a green result. If a test is wrong, fix it honestly.
- **No silent fallbacks** or swallowed errors. Surface failure with `outcome=stuck` or `outcome=error`.

Agent launch rule:
- When dispatching Claude/Codex through TermCanvas, start a fresh agent terminal with `termcanvas terminal create --prompt "..."`.
- Do not use `termcanvas terminal input` for task dispatch — it is not a supported automation path.

Telemetry polling:
- Treat `hydra watch` as the main polling loop. Do not infer progress from terminal prose.
- Before deciding wait / retry / takeover, query:
  - `termcanvas telemetry get --workflow <workflowId> --repo .`
  - `termcanvas telemetry get --terminal <terminalId>`
  - `termcanvas telemetry events --terminal <terminalId> --limit 20`
- Watch the derived telemetry states: `awaiting_contract` means the worker has not yet published `result.json`; `stall_candidate` means the worker may be hung. Trust `derived_status` and `task_status` over terminal prose.
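
The triage implied by these rules can be sketched in Python as below. Only `derived_status` and its two states are named above; the dict shape of the telemetry snapshot, the function name `triage`, and the returned action labels are hypothetical.

```python
def triage(telemetry: dict) -> str:
    """Map a telemetry snapshot to a coarse Lead action.

    The snapshot shape and the action labels are assumptions; only the
    `derived_status` values come from the rules above.
    """
    derived = telemetry.get("derived_status")
    if derived == "stall_candidate":
        # Worker may be hung: read recent events before retrying or taking over.
        return "inspect_events"
    if derived == "awaiting_contract":
        # No result.json yet: keep polling rather than guessing from prose.
        return "wait"
    # Any other state: defer to the next DecisionPoint from `hydra watch`.
    return "watch"
```

The point of the sketch is that the decision reads structured fields only; terminal prose never enters the function.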

## Core workflow

The Lead is the decider: the Lead reads the codebase, makes the strategic
calls, and dispatches workers for the steps that genuinely need a fresh
agent process. There is no "researcher" role; the Lead does the research
itself before deciding what to build.

```
hydra init --intent "Add OAuth login" --repo .
# → { workflow_id, worktree_path }

# (Lead reads the code, reviews related modules, decides the plan.)

hydra dispatch --workflow W --node dev --role dev \
  --intent "Implement OAuth middleware and its tests following the design in CLAUDE.md" --repo .
# → { node_id, assignment_id, status: "dispatched" }

hydra watch --workflow W --repo .
# → DecisionPoint: dev completed

hydra dispatch --workflow W --node review --role reviewer \
  --intent "Independent review of the OAuth change" \
  --depends-on dev --repo .

hydra watch --workflow W --repo .
# → DecisionPoint: review completed

hydra complete --workflow W --repo .
```

## Parallel dev

When the Lead identifies independent work streams, dispatch multiple
dev workers with isolated worktrees:

```
hydra dispatch --workflow W --node dev-frontend --role dev \
  --intent "Frontend OAuth components and their tests" \
  --worktree .worktrees/frontend --repo .

hydra dispatch --workflow W --node dev-backend --role dev \
  --intent "Backend OAuth middleware and its tests" \
  --worktree .worktrees/backend --repo .

hydra watch --workflow W --repo .
# → DecisionPoint: both completed

hydra merge --workflow W --nodes dev-frontend,dev-backend --repo .
```

## Handling agent results

When `hydra watch` (`watchUntilDecision`) returns a `node_completed` DecisionPoint:

1. Check `outcome`:
   - **`completed`** — agent finished. Read `report_file` to decide next step.
   - **`stuck`** — agent can't proceed. Read `report_file` for what's needed.
   - **`error`** — Hydra already retried; if still failing, it reports to you.

2. Read the `report.md` referenced by `report_file` to decide:
   - Dispatch next node → `hydra dispatch ...`
   - Reset for rework → `hydra reset --workflow W --node dev --feedback "..." --repo .`
   - Re-dispatch after reset → `hydra redispatch --workflow W --node dev --repo .`
   - Complete workflow → `hydra complete --workflow W --repo .`
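
If the Lead scripts the rework path, the reset and redispatch commands above can be assembled with proper shell quoting, since feedback text often contains spaces or quotes. This is a small Python sketch; the helper name `rework_commands` is hypothetical, and only the flags shown in the list above are used.

```python
import shlex


def rework_commands(workflow: str, node: str, feedback: str) -> list:
    """Build the reset + redispatch command pair as shell-safe strings."""
    reset = [
        "hydra", "reset", "--workflow", workflow, "--node", node,
        "--feedback", feedback, "--repo", ".",
    ]
    redispatch = [
        "hydra", "redispatch", "--workflow", workflow,
        "--node", node, "--repo", ".",
    ]
    # shlex.join quotes each argument, so multi-word feedback stays one argument.
    return [shlex.join(reset), shlex.join(redispatch)]
```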

Hydra promotes blocked nodes to eligible automatically, but **you decide
when to dispatch**. Check `newly_eligible` in the DecisionPoint to see
what's ready.
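
As an illustration of what "promoted to eligible" means, the following Python sketch computes which nodes become ready once their `depends_on` set is satisfied. It is not Hydra's implementation; the function name `eligible_nodes` and the plain dict/set inputs are assumptions.

```python
def eligible_nodes(depends_on: dict, completed: set, dispatched: set) -> list:
    """Return nodes whose dependencies have all completed and that are not yet dispatched.

    `depends_on` maps a node id to the list of node ids it waits for.
    """
    ready = []
    for node, deps in depends_on.items():
        if node in completed or node in dispatched:
            continue  # already handled
        if all(dep in completed for dep in deps):
            ready.append(node)
    return sorted(ready)


# Example graph: review waits for dev, docs waits for review.
GRAPH = {"dev": [], "review": ["dev"], "docs": ["review"]}
```

Note that the function only reports readiness; dispatching remains a separate, explicit step, which matches the split between Hydra and the Lead described above.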

## Agent role guidance

The role file (in `.hydra/roles/<name>.md` or shipped as a builtin) declares
an ordered `terminals[]` list of CLI / model / reasoning_effort triples.
For now Hydra always picks `terminals[0]`; future fallback logic is expected
to walk the list. The Lead chooses **which role** to dispatch; the role file
chooses **which CLI** to invoke and at what reasoning level.
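
The selection rule can be modelled in a few lines of Python. This is a sketch of the described behaviour, not the role-file format itself: the `Terminal` class, the `pick_terminal` helper, and the example values in `DEV_TERMINALS` (including the placeholder model strings) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Terminal:
    cli: str
    model: str
    reasoning_effort: str


def pick_terminal(terminals: list) -> Terminal:
    """Mirror the current rule described above: always take the first entry."""
    if not terminals:
        raise ValueError("role declares no terminals")
    return terminals[0]


# Illustrative values only; real role files may name CLIs and models differently.
DEV_TERMINALS = [
    Terminal(cli="claude", model="opus", reasoning_effort="max"),
    Terminal(cli="codex", model="<model>", reasoning_effort="<effort>"),
]
```

Keeping the list ordered means a later fallback policy could walk it without changing the role-file shape.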

**lead** — The Hydra decider itself. Not dispatched by `hydra dispatch`;
this role file codifies what the Lead terminal (the one holding the
`lead_terminal_id` lock and talking to the human) is supposed to be.
Default: Claude Opus at max reasoning.

**dev** — Writes code AND the tests that cover it. A dev-produced change
is not complete until its test surface covers the new behavior. Default:
Claude Opus at max reasoning, codex fallback.

**reviewer** — Independent cross-model second opinion at the highest
available reasoning level. Default: Codex at xhigh, Claude Opus max
fallback. Runs on a **different model family** from dev so blind spots
don't overlap. The last line of defense before Lead