dos-supervise-loop
The dos-supervise-loop skill maintains a target population of worker dispatch-loops across a DOS workspace by periodically reading the lane roster, consulting the kernel for a spawn/reap/flag plan via `dos loop --json`, and executing that plan: launching new workers on free admissible lanes, scavenging only stalled leases, and surfacing spinning workers to operators rather than killing them. Use it to keep a fleet of workers alive at a specified target count without manual intervention.
git clone --depth 1 https://github.com/anthony-chaudhary/dos-kernel /tmp/dos-supervise-loop && cp -r /tmp/dos-supervise-loop/src/dos/skills/dos-supervise-loop ~/.claude/skills/dos-supervise-loop
SKILL.md
# dos-supervise-loop — the generic worker-population supervisor

> **The init/PID-1 of a dispatch fleet.** It keeps `--target` worker
> dispatch-loops alive across the lane roster: each tick it counts live lane
> leases, classifies each worker's liveness, and fills the roster up to target by
> launching one `/dos-dispatch-loop` per free admissible lane. The *what to do*
> is a typed kernel verdict — `dos loop` emits a spawn/reap/flag plan; this skill
> only carries it out. It reaps a worker ONLY when the kernel says STALLED, never
> a healthy one, and it FLAGs a SPINNING worker to the operator rather than
> killing it (acting on a spin is not the supervisor's job).

The supervisor's whole contract is one rule the kernel owns: **the population is filled from the plan, not from a guess.** A tick is ``gather evidence → ask `dos loop` → carry out spawn/reap/flag``. The four dispositions are the kernel's:

1. **SPAWN** — a free, admissible lane below target: launch one worker on it.
2. **REAP** — a STALLED lease: scavenge it so the lane is free to refill.
3. **HOLD** — an ADVANCING (or alive-counted SPINNING) worker: leave it alone.
4. **FLAG** — a SPINNING worker (advisory) or an excess over target: surface it, do not kill it.

## Inputs

- `--target <N>` (default 1) — the desired live-worker population. The kernel caps the achievable population at the *admissible* count (how many roster lanes can simultaneously hold a worker given their disjointness); a `--target` above that yields a TARGET_UNREACHABLE verdict naming the fix.
- `--max-concurrency <N>` (optional, docs/283) — the **derived-claim concurrency budget**. On a DYNAMIC-CLAIM workspace (`concurrent = []`, where a lane is a HANDLE whose disjointness is enforced per-pick at acquire time, not by a fixed tree), the static admissible count is 1, so a `--target` above 1 is structurally unreachable. Declaring this budget lets the supervisor keep up to N workers alive on a fungible auto-pick handle WITHOUT pre-enumerating N disjoint trees — the arbiter still narrows each worker's per-pick claim at its Step 0. Declare ONE number, not N trees. Set it standing in `dos.toml [supervise] max_concurrency`, or pass `--max-concurrency` for a one-off run. Off by default (admissible stays the static disjoint-lane count — byte-for-byte today's).
- `--interval <seconds>` (long default) — the wakeup cadence between ticks. A supervisor wakes rarely; it is a watchdog, not a busy-loop.
- `--max-ticks <N>` (optional) — stop after N ticks. Omit for an open-ended run that stops only on an operator interrupt.

## Step 0 — Read the taxonomy, compute the plan

```bash
dos doctor --workspace . --json
dos loop --workspace . --target N --json
```

The doctor report carries the active lane roster (the concurrent + exclusive lanes and their trees) — **read it, never assume a lane name.** `dos loop` then gathers the evidence (the live lane leases from the journal + each lease's liveness) and returns the typed plan: a `verdict` (AT_TARGET / FILLING / OVER_TARGET / TARGET_UNREACHABLE), the `alive`/`admissible`/`target` tally, and the `spawn`/`reap`/`flag` lane lists. **This plan is the kernel's decision; the remaining steps only enact it.** If the verdict is TARGET_UNREACHABLE, read its reason (the roster cannot reach the number) and stop — see the anti-patterns.

## Step 1 — Launch one worker per SPAWN

For each lane in the plan's `spawn` list, launch a worker dispatch-loop focused on that lane:

```
/dos-dispatch-loop --lane <LANE>
```

The worker takes its own lane lease via `dos arbitrate` at its Step 0 and **journals its ACQUIRE early** — that early write is what shrinks the double-spawn window: by the next tick the lease is visible in the journal, so the supervisor counts it alive and does not launch a second worker on the same lane. The supervisor itself never takes a lease; it only counts them and fills the gap. Launch exactly the lanes the plan named — no more, no fewer.

## Step 2 — Per REAP consult resume then scavenge, per FLAG surface

For each lane in the plan's `reap` list (a STALLED worker): **before scavenging, ask `dos resume` whether the run can be continued** (docs/107, issue #19):

```bash
dos resume --workspace . --run-id <run_id> --json
```

The verdict shapes the action:

- **RESUMABLE** — the kernel minted a re-entry point and recorded a `RESUME_PROPOSED` on the ledger. **Do NOT scavenge**; the proposal surfaces automatically in `dos decisions` for the operator to re-dispatch. The lane remains held until the operator acts.
- **DIVERGED** — ground truth advanced past the resume point; re-dispatch would overwrite fresh work. **Do NOT scavenge**; surface the decision to the operator the same way (the RESUME_PROPOSAL row in `dos decisions` carries the DIVERGED context). A human must decide.
- **UNRESUMABLE** or **COMPLETE** — no viable continuation (no intent, corrupt ledger, or all steps already verified). **Scavenge** as before: release the lease so the lane is free to refill on the next tick.

`dos resume` is **inspect-only for DIVERGED** (it never records a proposal for a diverged run); the decision surfaces because the supervisor logs the verdict as context when it leaves the lease in place. For COMPLETE/UNRESUMABLE the scavenge path is unchanged — only the RESUMABLE/DIVERGED branch is new.

For each lane in the plan's `flag` list (a SPINNING worker, or an excess over target): **surface it to the operator and move on.** Do NOT kill a SPINNING worker and do NOT reap an excess healthy one — a flag is advisory. Acting on a spin (deciding a busy-but-not-advancing worker should be stopped) is an open question the supervisor deliberately leaves to a human; its job is to make the spin visible, not to adjudicate it.
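The enactment rules in Steps 0 through 2 can be sketched as a small pure function. This is an illustrative sketch only, not the skill's implementation: the function name `enact_plan`, the action labels, the lane names, and the assumption that the `spawn`/`reap`/`flag` lists hold plain lane-name strings are all hypothetical, and the real `dos loop --json` payload may be shaped differently. The resume verdicts are assumed to have been collected beforehand from `dos resume`. An unrecognised or missing resume verdict is treated conservatively here (surface, never scavenge), which is a choice of this sketch rather than something the text specifies.

```python
from typing import Dict, List, Tuple

# Resume verdicts named in Step 2 that permit releasing the lease.
# Everything else (RESUMABLE, DIVERGED, or unknown) leaves the lease in place.
SCAVENGE_VERDICTS = {"UNRESUMABLE", "COMPLETE"}


def enact_plan(plan: dict, resume_verdicts: Dict[str, str]) -> List[Tuple[str, str]]:
    """Map a plan (hypothetical dict shape) to ordered (action, lane) pairs.

    Action labels "launch", "scavenge" and "surface" are illustrative names.
    """
    # Step 0: an unreachable target means read the reason and stop.
    if plan.get("verdict") == "TARGET_UNREACHABLE":
        return []

    actions: List[Tuple[str, str]] = []

    # Step 1: exactly one worker per lane the plan named, no more, no fewer.
    for lane in plan.get("spawn", []):
        actions.append(("launch", lane))

    # Step 2: consult the resume verdict before scavenging a STALLED lease.
    for lane in plan.get("reap", []):
        if resume_verdicts.get(lane) in SCAVENGE_VERDICTS:
            actions.append(("scavenge", lane))
        else:
            actions.append(("surface", lane))

    # Step 2: a flag is advisory, so it is only ever surfaced.
    for lane in plan.get("flag", []):
        actions.append(("surface", lane))

    return actions


# Usage with made-up lane names.
example_plan = {
    "verdict": "FILLING",
    "spawn": ["lane-a"],
    "reap": ["lane-b", "lane-c"],
    "flag": ["lane-d"],
}
example_verdicts = {"lane-b": "RESUMABLE", "lane-c": "COMPLETE"}
for action, lane in enact_plan(example_plan, example_verdicts):
    print(action, lane)
```

Keeping the mapping free of side effects means the decision table can be checked on its own, separately from whatever actually launches workers or releases leases.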
## Step 3 — Sleep, re-tick, stop on interrupt or --max-ticks

Sleep `--interval` seconds, then re-run from Step 0: re-read the taxonomy, recompute the plan, enact it. Each tick is independent an
Adjudicate a GitHub issue's "this is resolved" claim from witnesses the claimant didn't author — then close it carrying the evidence, or refuse with the typed gap. Use when an issue looks already-solved, after landing a fix that should have closed one, or to sweep open issues for silently-resolved ones.
Pick the next most important open GitHub issue this agent can actually complete, make its done-condition true, land it with witnesses (suite + parity + commit-audit), and priority-tag every issue touched along the way. Use when asked to "work the backlog", "complete the next most important issue", or to fix a specific issue number end-to-end.
Cut a versioned release of the DOS kernel — bump the version, draft release notes, commit, tag, push to master, and create a GitHub release. The tag push triggers the gated PyPI publish pipeline (publish.yml); the skill surfaces the run and its approval gate.
Promote an already-shipped rolling release (vX.Y.Z) of the DOS kernel to a named stable channel — gated on a green kernel suite + a green third-party CI run on the candidate + a clean truth syscall + a soak window. Writes an evidence file and adds a stable/<codename> git tag on the same commit. Does NOT bump versions or build new artifacts.
One automatic plan-class lifecycle tick. Reads the DECLARED class set + transition list from the workspace `[lifecycle]` table (not a hardcoded taxonomy), evaluates each trigger, spawns a read-only JUDGE-rung adjudicator (the `dos.judges` seam — advisory, fail-to-abstain) to approve/defer each candidate transition, applies the gated transitions as plan-meta edits + one commit per cycle, and logs to the run archive. Failsafes (per-cycle cap, per-plan cooldown, a veto class) are `[lifecycle]` data; the judge content is a host `dos.judges` driver. Every path/class comes from `dos doctor --json`. Use to garden a plan portfolio's lifecycle automatically, judge-gated. The DOS lifecycle gardener (SKP Axis 5, docs/207 Phase 5c).
Run /dos-dispatch on a recurring cadence, alternating with /dos-replan when the backlog drains — the dispatch→replan→dispatch cycle. The continue/stop/next-mode decision is the kernel's typed loop decision, not inline prose: each iteration is classified (`dos gate`) into a verdict and the loop's counters (drained-twice, the unclear/dirty-zero breakers, the iteration cap) drive the next step. Several loops on disjoint lanes run concurrently, each taking its own lane lease via `dos arbitrate`. Driven entirely by `dos` verbs + the workspace's `dos.toml`. The DOS reference loop workflow (SKP Axis 5).
End-to-end plan-and-ship for one lane — snapshot the portfolio with /dos-next-up, take a lane lease via `dos arbitrate` so parallel dispatches don't collide, gate the empty case via `dos gate`, ship the packet, and archive the run under the configured run dir. Driven entirely by `dos` verbs + the workspace's `dos.toml`; names no host path, lane, or commit convention. Use when you want to plan and ship the next batch on one lane in a single command, with concurrency safety. The DOS reference dispatch workflow (SKP Axis 5).
Ground a "keep working until the goal is met" stop condition in a witness the agent did not author, instead of letting the agent self-certify "done". A harness goal/Stop-hook condition is normally checked by the model re-reading its OWN work — consistency, not grounding. This skill turns the operator's goal into checkable EFFECT claims and wires `dos hook stop` so the Stop is refused until git ancestry (a shipped phase) or an effect read-back corroborates the claimed effect. Driven by `dos` verbs and the workspace's own `dos.toml` — no host-specific paths, lanes, or commit conventions. Use when you want a self-stopping agent (or a `/loop` worker) to be unable to declare a goal complete on its own say-so. The single-agent self-stop analogue of `dos-witness-claim`.