ClaudeWave
Skill · 8 repo stars · updated today

dos-supervise-loop

The dos-supervise-loop skill maintains a target population of worker dispatch-loops across a DOS workspace by periodically reading the lane roster, consulting the kernel for a spawn/reap/flag plan via `dos loop --json`, and executing that plan: launching new workers on free admissible lanes, scavenging only stalled leases, and surfacing spinning workers to operators rather than killing them. Use it to keep a fleet of workers alive at a specified target count without manual intervention.

Install in Claude Code
git clone --depth 1 https://github.com/anthony-chaudhary/dos-kernel /tmp/dos-supervise-loop && cp -r /tmp/dos-supervise-loop/src/dos/skills/dos-supervise-loop ~/.claude/skills/dos-supervise-loop
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# dos-supervise-loop — the generic worker-population supervisor

> **The init/PID-1 of a dispatch fleet.** It keeps `--target` worker
> dispatch-loops alive across the lane roster: each tick it counts live lane
> leases, classifies each worker's liveness, and fills the roster up to target by
> launching one `/dos-dispatch-loop` per free admissible lane. The *what to do*
> is a typed kernel verdict — `dos loop` emits a spawn/reap/flag plan; this skill
> only carries it out. It reaps a worker ONLY when the kernel says STALLED, never
> a healthy one, and it FLAGs a SPINNING worker to the operator rather than
> killing it (acting on a spin is not the supervisor's job).

The supervisor's whole contract is one rule the kernel owns: **the population is
filled from the plan, not from a guess.** A tick is: gather evidence → ask
`dos loop` → carry out spawn/reap/flag. The four dispositions are the kernel's:

1. **SPAWN** — a free, admissible lane below target: launch one worker on it.
2. **REAP** — a STALLED lease: scavenge it so the lane is free to refill.
3. **HOLD** — an ADVANCING (or alive-counted SPINNING) worker: leave it alone.
4. **FLAG** — a SPINNING worker (advisory) or an excess over target: surface it,
   do not kill it.
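As an illustration only, the four dispositions can be modeled as a small classifier over lane leases. This is a hypothetical Python sketch, not the kernel's implementation: `Liveness`, `Disposition`, and `classify` are invented names, and it assumes the kernel has already computed each lease's liveness and the list of free admissible lanes. STALLED leases are reaped, ADVANCING and SPINNING leases count as alive, SPINNING or over-target workers are flagged rather than killed, and free lanes are spawned only until the target is met.

```python
from enum import Enum


class Liveness(Enum):
    # Hypothetical liveness labels, mirroring the classes named in the text.
    ADVANCING = "ADVANCING"
    SPINNING = "SPINNING"
    STALLED = "STALLED"


class Disposition(Enum):
    SPAWN = "SPAWN"
    REAP = "REAP"
    HOLD = "HOLD"
    FLAG = "FLAG"


def classify(leases, free_lanes, target):
    """Toy model of the spawn/reap/hold/flag rule (not the kernel's code).

    leases: dict mapping lane -> Liveness for every live lane lease.
    free_lanes: free admissible lanes, in the order they should be filled.
    """
    plan = {}
    alive = 0
    for lane in sorted(leases):  # assumption: excess is decided in lane-name order
        liveness = leases[lane]
        if liveness is Liveness.STALLED:
            plan[lane] = Disposition.REAP  # scavenge so the lane can refill next tick
            continue
        alive += 1  # ADVANCING and SPINNING workers both count as alive
        if alive > target or liveness is Liveness.SPINNING:
            plan[lane] = Disposition.FLAG  # surface it; never kill it
        else:
            plan[lane] = Disposition.HOLD  # healthy and within target: leave it alone
    for lane in free_lanes:
        if alive >= target:
            break
        plan[lane] = Disposition.SPAWN  # one worker per free admissible lane
        alive += 1
    return plan
```

Note that a reaped lane is not refilled in the same tick here; it only becomes a spawn candidate once a later tick sees it free.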

## Inputs

- `--target <N>` (default 1) — the desired live-worker population. The kernel
  caps the achievable population at the *admissible* count (how many roster lanes
  can simultaneously hold a worker given their disjointness); a `--target` above
  that yields a TARGET_UNREACHABLE verdict naming the fix.
- `--max-concurrency <N>` (optional, docs/283) — the **derived-claim concurrency
  budget**. On a DYNAMIC-CLAIM workspace (`concurrent = []`, where a lane is a
  HANDLE whose disjointness is enforced per-pick at acquire time, not by a fixed
  tree), the static admissible count is 1, so a `--target` above 1 is structurally
  unreachable. Declaring this budget lets the supervisor keep up to N workers alive
  on a fungible auto-pick handle WITHOUT pre-enumerating N disjoint trees — the
  arbiter still narrows each worker's per-pick claim at its Step 0. Declare ONE
  number, not N trees. Set it as a standing value under `[supervise] max_concurrency`
  in `dos.toml`, or pass `--max-concurrency` for a one-off run. Off by default:
  admissible stays the static disjoint-lane count, so behavior is byte-for-byte unchanged.
- `--interval <seconds>` (long default) — the wakeup cadence between ticks. A
  supervisor wakes rarely; it is a watchdog, not a busy-loop.
- `--max-ticks <N>` (optional) — stop after N ticks. Omit for an open-ended run
  that stops only on an operator interrupt.
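How `--target` and `--max-concurrency` interact can be sketched in a few lines. This is a hypothetical model, not the kernel's code: `admissible_count` and `target_reachable` are invented names, and the sketch assumes the concurrency budget only takes effect on a dynamic-claim workspace (the text does not say what it does elsewhere).

```python
def admissible_count(static_admissible, dynamic_claim, max_concurrency=None):
    """Hypothetical model of the admissible cap; not the kernel's code."""
    if dynamic_claim and max_concurrency is not None:
        # One declared budget stands in for N pre-enumerated disjoint trees;
        # the arbiter still narrows each worker's claim at acquire time.
        return max_concurrency
    # Off by default: the static disjoint-lane count (1 on a dynamic-claim handle).
    return static_admissible


def target_reachable(target, admissible):
    # A target above the admissible count corresponds to TARGET_UNREACHABLE.
    return target <= admissible
```

For example, a dynamic-claim workspace with `--target 4` and no budget has an admissible count of 1 and is unreachable; declaring a budget of 4 makes the same target reachable.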

## Step 0 — Read the taxonomy, compute the plan

```bash
dos doctor --workspace . --json
dos loop --workspace . --target N --json
```

The doctor report carries the active lane roster (the concurrent + exclusive
lanes and their trees) — **read it, never assume a lane name.** `dos loop` then
gathers the evidence (the live lane leases from the journal + each lease's
liveness) and returns the typed plan: a `verdict` (AT_TARGET / FILLING /
OVER_TARGET / TARGET_UNREACHABLE), the `alive`/`admissible`/`target` tally, and
the `spawn`/`reap`/`flag` lane lists. **This plan is the kernel's decision; the
remaining steps only enact it.** If the verdict is TARGET_UNREACHABLE, read its
reason (the roster cannot reach the number) and stop — see the anti-patterns.
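A consumer of the plan only needs to parse it and branch on the verdict. The sketch below is hypothetical: the field names (`verdict`, `alive`, `admissible`, `target`, `spawn`, `reap`, `flag`) come from the text above, but the exact JSON shape, the lane names, and the helpers `read_plan` and `should_stop` are assumptions for illustration.

```python
import json

# Hypothetical `dos loop --json` payload: the field names come from the text,
# but the exact schema (types, extra fields) is an assumption.
SAMPLE_PLAN = """
{
  "verdict": "FILLING",
  "alive": 1, "admissible": 3, "target": 3,
  "spawn": ["lane-a", "lane-b"], "reap": [], "flag": []
}
"""

VERDICTS = {"AT_TARGET", "FILLING", "OVER_TARGET", "TARGET_UNREACHABLE"}


def read_plan(raw):
    """Parse a plan and reject verdicts this sketch does not know."""
    plan = json.loads(raw)
    if plan["verdict"] not in VERDICTS:
        raise ValueError(f"unknown verdict: {plan['verdict']!r}")
    return plan


def should_stop(plan):
    # TARGET_UNREACHABLE means the roster cannot reach the number: stop and
    # report the kernel's reason instead of improvising a population.
    return plan["verdict"] == "TARGET_UNREACHABLE"
```

In a real run the raw text would come from the CLI, for example `subprocess.run(["dos", "loop", "--workspace", ".", "--target", "3", "--json"], capture_output=True, text=True, check=True).stdout`.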

## Step 1 — Launch one worker per SPAWN

For each lane in the plan's `spawn` list, launch a worker dispatch-loop focused
on that lane:

```
/dos-dispatch-loop --lane <LANE>
```

The worker takes its own lane lease via `dos arbitrate` at its Step 0 and
**journals its ACQUIRE early** — that early write is what shrinks the
double-spawn window: by the next tick the lease is visible in the journal, so
the supervisor counts it alive and does not launch a second worker on the same
lane. The supervisor itself never takes a lease; it only counts them and fills
the gap. Launch exactly the lanes the plan named — no more, no fewer.

## Step 2 — Per REAP, consult resume then scavenge; per FLAG, surface

For each lane in the plan's `reap` list (a STALLED worker): **before scavenging,
ask `dos resume` whether the run can be continued** (docs/107, issue #19):

```bash
dos resume --workspace . --run-id <run_id> --json
```

The verdict shapes the action:

- **RESUMABLE** — the kernel minted a re-entry point and recorded a
  `RESUME_PROPOSED` on the ledger. **Do NOT scavenge**; the proposal surfaces
  automatically in `dos decisions` for the operator to re-dispatch. The lane
  remains held until the operator acts.
- **DIVERGED** — ground truth advanced past the resume point; re-dispatch would
  overwrite fresh work. **Do NOT scavenge**; surface the decision to the operator
  the same way (the RESUME_PROPOSAL row in `dos decisions` carries the DIVERGED
  context). A human must decide.
- **UNRESUMABLE** or **COMPLETE** — no viable continuation (no intent, corrupt
  ledger, or all steps already verified). **Scavenge** as before: release the
  lease so the lane is free to refill on the next tick.

`dos resume` is **inspect-only for DIVERGED** (it never records a proposal for a
diverged run); the decision surfaces because the supervisor logs the verdict as
context when it leaves the lease in place. For COMPLETE/UNRESUMABLE the
scavenge path is unchanged — only the RESUMABLE/DIVERGED branch is new.
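The resume branch reduces to a four-way mapping from verdict to action. This is a hypothetical sketch: it assumes the `dos resume --json` payload carries a top-level `"verdict"` field, and the action labels (`keep-lease`, `keep-lease-and-log`, `scavenge`) and the name `reap_action` are invented for illustration.

```python
import json
from enum import Enum


class ResumeVerdict(Enum):
    RESUMABLE = "RESUMABLE"
    DIVERGED = "DIVERGED"
    UNRESUMABLE = "UNRESUMABLE"
    COMPLETE = "COMPLETE"


def reap_action(raw_resume_json):
    """Map a `dos resume --json` payload to a supervisor action (hypothetical).

    Assumes the payload carries a top-level "verdict" field; an unknown
    verdict raises ValueError rather than defaulting to a scavenge.
    """
    verdict = ResumeVerdict(json.loads(raw_resume_json)["verdict"])
    if verdict is ResumeVerdict.RESUMABLE:
        # Kernel recorded RESUME_PROPOSED; the operator re-dispatches. Keep the lease.
        return "keep-lease"
    if verdict is ResumeVerdict.DIVERGED:
        # Inspect-only: nothing was recorded, so log the verdict as context.
        return "keep-lease-and-log"
    # UNRESUMABLE or COMPLETE: no viable continuation, free the lane.
    return "scavenge"
```

Failing loudly on an unrecognized verdict keeps the sketch from scavenging a lease it does not understand.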

For each lane in the plan's `flag` list (a SPINNING worker, or an excess over
target): **surface it to the operator and move on.** Do NOT kill a SPINNING
worker and do NOT reap an excess healthy one — a flag is advisory. Acting on a
spin (deciding a busy-but-not-advancing worker should be stopped) is an open
question the supervisor deliberately leaves to a human; its job is to make the
spin visible, not to adjudicate it.

## Step 3 — Sleep, re-tick, stop on interrupt or --max-ticks

Sleep `--interval` seconds, then re-run from Step 0: re-read the taxonomy,
recompute the plan, enact it. Each tick is independent an
issue-verify (Skill)

Adjudicate a GitHub issue's "this is resolved" claim from witnesses the claimant didn't author — then close it carrying the evidence, or refuse with the typed gap. Use when an issue looks already-solved, after landing a fix that should have closed one, or to sweep open issues for silently-resolved ones.

issue-work (Skill)

Pick the next most important open GitHub issue this agent can actually complete, make its done-condition true, land it with witnesses (suite + parity + commit-audit), and priority-tag every issue touched along the way. Use when asked to "work the backlog", "complete the next most important issue", or to fix a specific issue number end-to-end.

release (Skill)

Cut a versioned release of the DOS kernel — bump the version, draft release notes, commit, tag, push to master, and create a GitHub release. The tag push triggers the gated PyPI publish pipeline (publish.yml); the skill surfaces the run and its approval gate.

stable-release (Skill)

Promote an already-shipped rolling release (vX.Y.Z) of the DOS kernel to a named stable channel — gated on a green kernel suite + a green third-party CI run on the candidate + a clean truth syscall + a soak window. Writes an evidence file and adds a stable/<codename> git tag on the same commit. Does NOT bump versions or build new artifacts.

dos-class-cycle (Skill)

One automatic plan-class lifecycle tick. Reads the DECLARED class set + transition list from the workspace `[lifecycle]` table (not a hardcoded taxonomy), evaluates each trigger, spawns a read-only JUDGE-rung adjudicator (the `dos.judges` seam — advisory, fail-to-abstain) to approve/defer each candidate transition, applies the gated transitions as plan-meta edits + one commit per cycle, and logs to the run archive. Failsafes (per-cycle cap, per-plan cooldown, a veto class) are `[lifecycle]` data; the judge content is a host `dos.judges` driver. Every path/class comes from `dos doctor --json`. Use to garden a plan portfolio's lifecycle automatically, judge-gated. The DOS lifecycle gardener (SKP Axis 5, docs/207 Phase 5c).

dos-dispatch-loop (Skill)

Run /dos-dispatch on a recurring cadence, alternating with /dos-replan when the backlog drains — the dispatch→replan→dispatch cycle. The continue/stop/next-mode decision is the kernel's typed loop decision, not inline prose: each iteration is classified (`dos gate`) into a verdict and the loop's counters (drained-twice, the unclear/dirty-zero breakers, the iteration cap) drive the next step. Several loops on disjoint lanes run concurrently, each taking its own lane lease via `dos arbitrate`. Driven entirely by `dos` verbs + the workspace's `dos.toml`. The DOS reference loop workflow (SKP Axis 5).

dos-dispatch (Skill)

End-to-end plan-and-ship for one lane — snapshot the portfolio with /dos-next-up, take a lane lease via `dos arbitrate` so parallel dispatches don't collide, gate the empty case via `dos gate`, ship the packet, and archive the run under the configured run dir. Driven entirely by `dos` verbs + the workspace's `dos.toml`; names no host path, lane, or commit convention. Use when you want to plan and ship the next batch on one lane in a single command, with concurrency safety. The DOS reference dispatch workflow (SKP Axis 5).

dos-goal-gate (Skill)

Ground a "keep working until the goal is met" stop condition in a witness the agent did not author, instead of letting the agent self-certify "done". A harness goal/Stop-hook condition is normally checked by the model re-reading its OWN work — consistency, not grounding. This skill turns the operator's goal into checkable EFFECT claims and wires `dos hook stop` so the Stop is refused until git ancestry (a shipped phase) or an effect read-back corroborates the claimed effect. Driven by `dos` verbs and the workspace's own `dos.toml` — no host-specific paths, lanes, or commit conventions. Use when you want a self-stopping agent (or a `/loop` worker) to be unable to declare a goal complete on its own say-so. The single-agent self-stop analogue of `dos-witness-claim`.