ClaudeWave
Skill · 8 repo stars · updated today

dos-self-improve

dos-self-improve implements a self-improving work loop where the DOS kernel adjudicates code changes using external verification rather than self-grading. It proposes a change in isolation, verifies it against test suites and measured metrics on a clean worktree, and keeps the change only if an independent witness confirms improvement. The kernel's typed `improve` verdict gates acceptance; failed candidates are reverted, and consecutive rejections escalate to human review, preventing the fatal flaw of agents grading their own work.

Install in Claude Code
git clone --depth 1 https://github.com/anthony-chaudhary/dos-kernel /tmp/dos-self-improve && cp -r /tmp/dos-self-improve/src/dos/skills/dos-self-improve ~/.claude/skills/dos-self-improve
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# dos-self-improve — the loop where DOS adjudicates its own improvement

> **The first self-improving work loop for DOS.** It proposes a change, checks it,
> measures it, and keeps it **only if a witness the change's author did not write
> confirms it improved** — the green suite on a clean worktree, the truth syscall,
> and a strictly-measured metric gain. Everything else is reverted. The keep
> decision is the kernel's `improve` verdict, not the loop's opinion of its own
> work. A run of consecutive candidates that nothing accepts trips a breaker that
> hands the judgment back to a human.

This is the dispatch loop (`/dos-dispatch-loop`) **turned inward on a codebase's
own source**, with the one rule no prior auto-improver enforced:

> A self-improving loop's fatal failure mode is **grading its own homework** — the
> agent that wrote the change is the same one that says "yes, this is better," so it
> learns to *narrate* improvement instead of making it. `dos improve` closes that
> hole: the keep-bit is a pure function of facts the loop did not author, so the
> loop **cannot keep a change by claiming it is better.** The only path to KEEP is
> to actually move a metric the environment measures.

The recursive-self-improvement literature names verification as the gating
constraint ("requiring verification regimes enabling labs to confirm"; "human
judgment on which problems matter remains the bottleneck"). This loop *is* that
verification regime, with the human kept in the loop by construction — the breaker
escalates to a person exactly when the loop runs dry of witnessed improvements.

## What the kernel decides vs. what you do

| Step | Who | What |
|---|---|---|
| **Propose** one scoped change | YOU (a subagent) | the capable, *untrusted* step — the only place intelligence enters the loop |
| **Verify + measure** | `dos` verbs | run the suite, run the truth syscall, measure the metric — all on an isolated worktree |
| **Keep / revert / escalate** | `dos improve` | the kernel's typed verdict over the env-authored facts — NOT your opinion |
| **Merge / discard / escalate** | YOU | carry out the kernel's verdict |

The kernel contributes ZERO intelligence to the proposal — only the refusal to
keep an unwitnessed one. That asymmetry is the whole design.

## Inputs

- `--metric <name>` (required) — what "improvement" means for THIS workspace, as a
  non-negative integer the environment measures, higher = better. The reference
  metric is **`lint-clean` = a large constant minus the `dos lint` finding count**
  (driving dead policy to zero); other honest metrics: a passing-property-test
  count, a coverage percent, a wall-clock budget minus the measured run time. The kernel does not know
  the unit — it only compares magnitudes (the `productivity`/`efficiency` work-unit
  split).
- `--max-cycles <N>` (default 5) — the backstop cap (the `ITERATION_CAP` analogue).
- `--max-reverts <N>` (default 3) — ESCALATE to a human after this many candidates
  in a row that nothing accepted (the breaker).
- `--lane <name>` (optional) — the lane to take; a bare loop auto-picks a free one.
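The reference `lint-clean` metric turns a lower-is-better finding count into the higher-is-better, non-negative integer the kernel compares. Here is a minimal sketch of that transformation. It assumes you already have the `dos lint` finding count in hand. The constant 1000 is the example value from the baseline step, and `lint_clean` is a hypothetical helper, not part of the `dos` API.

```python
# Hypothetical helper, not part of the dos CLI or package.
LINT_CLEAN_CONSTANT = 1000  # example value; must exceed any finding count you expect


def lint_clean(finding_count: int, constant: int = LINT_CLEAN_CONSTANT) -> int:
    """Map a lower-is-better finding count onto a higher-is-better, non-negative metric."""
    if finding_count < 0:
        raise ValueError("a finding count cannot be negative")
    if finding_count > constant:
        # Clamping at zero would give different counts the same score,
        # so refuse and let the caller choose a larger constant.
        raise ValueError(f"constant {constant} is below the finding count {finding_count}")
    return constant - finding_count


# Removing one finding raises the metric by exactly one.
assert lint_clean(12) == lint_clean(13) + 1
```

The kernel only compares magnitudes, so any fixed, monotone rewrite like this works. The one requirement is that the same constant is used for the baseline `B` and for every cycle's `W`.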

## Step 0 — Pre-flight: take a lane, record the GREEN baseline

```bash
dos doctor --workspace . --json
dos arbitrate --workspace . --lane <LANE>
```

Then establish the baseline — and this gate is non-negotiable:

```bash
python -m pytest -q          # MUST be green to start
<measure the metric>         # e.g. dos lint --workspace . --json | (1000 - finding count)
```

**You cannot measure improvement from a red baseline.** If the suite is red, STOP
and surface it — fix the suite first (that itself is a separate, ordinary task,
not a self-improvement cycle). Record the baseline metric `B`.
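The pre-flight gate reduces to one rule: a red suite means there is no baseline at all. Here is a minimal sketch, assuming the suite's exit code and the measured metric are passed in. `RedBaselineError` and `record_baseline` are hypothetical names chosen for illustration.

```python
from typing import Optional


class RedBaselineError(RuntimeError):
    """The suite was not green before the first cycle (hypothetical name)."""


def record_baseline(suite_exit_code: Optional[int], metric: int) -> int:
    """Return the baseline metric B, refusing to start from a red or unknown suite."""
    # pytest exits 0 only when every collected test passed; any other code,
    # including 5 ("no tests collected"), or None (the suite never ran) is red.
    if suite_exit_code != 0:
        raise RedBaselineError(
            f"suite exit code {suite_exit_code}: fix the suite as an ordinary task first"
        )
    if metric < 0:
        raise ValueError("the metric must be a non-negative integer")
    return metric


B = record_baseline(0, 963)  # green suite, lint-clean metric of 963
```

Treating "no tests collected" as red is deliberate. A suite that proves nothing cannot anchor a comparison.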

## Step 1 — Per cycle: propose ONE candidate in an ISOLATED worktree

The candidate edit is the `SELF_MODIFY` / `global`-lane hazard
([[self-modification-hazard]]): editing the kernel's own running path is exactly
what the arbiter refuses, and what would let a candidate rewrite the kernel that is
adjudicating it. So work in an isolated git worktree, never the live tree:

```bash
git worktree add ../_si-candidate HEAD
```

Spawn ONE subagent with a tight brief, working in that worktree:

> Improve exactly ONE thing in this codebase that will move `<metric>`. Make the
> SMALLEST diff that moves it. Do not touch tests to make them pass; do not weaken
> an assertion. Commit your change with a clear subject. Return what you changed
> and why. **If you cannot find a real improvement, say so — do not invent one.**

One candidate per cycle keeps the witness attributable (the `commit-audit` "one
commit, one claim, one diff" discipline). If the subagent returns "nothing to
improve," the cycle is a **skip** — not a revert; move on.
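The distinction between a skip and a revert matters because only reverts feed the breaker. Here is a minimal sketch of the counter. It assumes that a KEEP resets the run of unaccepted candidates, a REVERT extends it, and a skip leaves it untouched. The text says only that a skip is not a revert; whether a skip also breaks the run is an assumption here. `CycleOutcome` and `next_revert_streak` are hypothetical names.

```python
from enum import Enum


class CycleOutcome(Enum):
    SKIP = "skip"          # the subagent found nothing real to improve
    KEPT = "kept"          # the kernel said KEEP and the change was merged
    REVERTED = "reverted"  # the kernel said REVERT and the worktree was discarded


def next_revert_streak(streak: int, outcome: CycleOutcome) -> int:
    """Update the count of consecutive candidates that nothing accepted."""
    if outcome is CycleOutcome.KEPT:
        return 0              # an accepted candidate ends the run
    if outcome is CycleOutcome.REVERTED:
        return streak + 1     # one more rejected candidate in a row
    return streak             # a skip produced no candidate, so it is not a revert
```

For example, folding `REVERTED, SKIP, REVERTED` through this function from zero yields a streak of 2.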

## Step 2 — Gather the env-authored witnesses (on the worktree)

Every fact the kernel reads is measured by the ENVIRONMENT, never taken from the
subagent's word:

```bash
# (1) the suite, on the candidate-only worktree — the runner authors the bit
# subshell: keeps the caller in the main tree, so `--workspace .` below still means it
(cd ../_si-candidate && python -m pytest -q) ; SUITE=$?

# (2) the truth syscall — git ancestry, the oracle authors the bit
dos commit-audit --workspace ../_si-candidate HEAD      # claim vs its own diff
# (and, if the candidate claims a plan phase, dos verify it)

# (3) the metric, re-measured on the worktree AFTER the candidate
<measure the metric on ../_si-candidate>   # → W

# (4) the tokens the subagent spent (the provider usage record) → T
```

A missing witness is a FAILING witness — if you cannot run the suite or the truth
syscall, treat it as red/dirty (fail-safe). Never substitute the subagent's "tests
pass" claim for the runner's exit code.
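The fail-safe rule can be made mechanical by normalizing every witness before it reaches the kernel. Here is a minimal sketch, assuming a witness that could not be gathered is represented as `None`. `normalize_witnesses` is a hypothetical helper. It also assumes that an unmeasured metric falls back to the baseline, so it can never register as a gain. Note that there is deliberately no parameter for the subagent's own report.

```python
from typing import Optional, Tuple


def normalize_witnesses(
    suite_exit_code: Optional[int],
    truth_clean: Optional[bool],
    measured_work: Optional[int],
    baseline_work: int,
) -> Tuple[bool, bool, int]:
    """Turn raw, possibly missing witnesses into fail-safe facts.

    Only the runner's exit code, the truth syscall's result, and the
    environment's measurement are inputs; what the subagent *said* is not.
    """
    suite_passed = suite_exit_code == 0   # None (could not run) counts as red
    truth_ok = truth_clean is True        # None (could not check) counts as dirty
    # A missing measurement must not look like an improvement: fall back to the
    # baseline so a strict "work > baseline" comparison cannot pass.
    work = measured_work if measured_work is not None else baseline_work
    return suite_passed, truth_ok, work
```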

## Step 3 — Ask the kernel: KEEP / REVERT / ESCALATE (the kernel decides)

This is the load-bearing step: **the decision is a kernel mechanism, not prose.**
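As described here, the decision is KEEP only on a green suite, a clean truth syscall, and a strict metric gain. Otherwise it is REVERT, or ESCALATE once the run of rejected candidates reaches `--max-reverts`. The following is a hypothetical model of that rule, not the kernel's `dos improve` implementation. It makes two assumptions: `consecutive_reverts` counts the rejections *before* this candidate, and the token count `T` is left out because its role in the verdict is not described in this section.

```python
from enum import Enum


class Verdict(Enum):
    KEEP = "keep"
    REVERT = "revert"
    ESCALATE = "escalate"


def improve_verdict(
    suite_passed: bool,
    truth_clean: bool,
    work: int,
    baseline_work: int,
    consecutive_reverts: int,
    max_reverts: int = 3,
) -> Verdict:
    """Hypothetical model of the keep/revert/escalate decision.

    Every input is an environment-authored fact; there is no parameter for
    the candidate's own claim that it is better.
    """
    if suite_passed and truth_clean and work > baseline_work:  # strict gain only
        return Verdict.KEEP
    # This candidate is rejected, so the run of unaccepted candidates grows by one.
    if consecutive_reverts + 1 >= max_reverts:
        return Verdict.ESCALATE  # hand the judgment back to a human
    return Verdict.REVERT
```

A metric that merely holds steady is rejected, because equality is not improvement.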

```bash
dos improve --workspace . \
  $( [ "$SUITE" -eq 0 ] && echo --suite-passed ) \
  $( <truth clean> && echo --truth-clean ) \
  --work "$W" --baseline-work "$B" --tokens "$T" \
  --consecutive-reverts "
```
issue-verify · Skill

Adjudicate a GitHub issue's "this is resolved" claim from witnesses the claimant didn't author — then close it carrying the evidence, or refuse with the typed gap. Use when an issue looks already-solved, after landing a fix that should have closed one, or to sweep open issues for silently-resolved ones.

issue-work · Skill

Pick the next most important open GitHub issue this agent can actually complete, make its done-condition true, land it with witnesses (suite + parity + commit-audit), and priority-tag every issue touched along the way. Use when asked to "work the backlog", "complete the next most important issue", or to fix a specific issue number end-to-end.

release · Skill

Cut a versioned release of the DOS kernel — bump the version, draft release notes, commit, tag, push to master, and create a GitHub release. The tag push triggers the gated PyPI publish pipeline (publish.yml); the skill surfaces the run and its approval gate.

stable-release · Skill

Promote an already-shipped rolling release (vX.Y.Z) of the DOS kernel to a named stable channel — gated on a green kernel suite + a green third-party CI run on the candidate + a clean truth syscall + a soak window. Writes an evidence file and adds a stable/<codename> git tag on the same commit. Does NOT bump versions or build new artifacts.

dos-class-cycle · Skill

One automatic plan-class lifecycle tick. Reads the DECLARED class set + transition list from the workspace `[lifecycle]` table (not a hardcoded taxonomy), evaluates each trigger, spawns a read-only JUDGE-rung adjudicator (the `dos.judges` seam — advisory, fail-to-abstain) to approve/defer each candidate transition, applies the gated transitions as plan-meta edits + one commit per cycle, and logs to the run archive. Failsafes (per-cycle cap, per-plan cooldown, a veto class) are `[lifecycle]` data; the judge content is a host `dos.judges` driver. Every path/class comes from `dos doctor --json`. Use to garden a plan portfolio's lifecycle automatically, judge-gated. The DOS lifecycle gardener (SKP Axis 5, docs/207 Phase 5c).

dos-dispatch-loop · Skill

Run /dos-dispatch on a recurring cadence, alternating with /dos-replan when the backlog drains — the dispatch→replan→dispatch cycle. The continue/stop/next-mode decision is the kernel's typed loop decision, not inline prose: each iteration is classified (`dos gate`) into a verdict and the loop's counters (drained-twice, the unclear/dirty-zero breakers, the iteration cap) drive the next step. Several loops on disjoint lanes run concurrently, each taking its own lane lease via `dos arbitrate`. Driven entirely by `dos` verbs + the workspace's `dos.toml`. The DOS reference loop workflow (SKP Axis 5).

dos-dispatch · Skill

End-to-end plan-and-ship for one lane — snapshot the portfolio with /dos-next-up, take a lane lease via `dos arbitrate` so parallel dispatches don't collide, gate the empty case via `dos gate`, ship the packet, and archive the run under the configured run dir. Driven entirely by `dos` verbs + the workspace's `dos.toml`; names no host path, lane, or commit convention. Use when you want to plan and ship the next batch on one lane in a single command, with concurrency safety. The DOS reference dispatch workflow (SKP Axis 5).

dos-goal-gate · Skill

Ground a "keep working until the goal is met" stop condition in a witness the agent did not author, instead of letting the agent self-certify "done". A harness goal/Stop-hook condition is normally checked by the model re-reading its OWN work — consistency, not grounding. This skill turns the operator's goal into checkable EFFECT claims and wires `dos hook stop` so the Stop is refused until git ancestry (a shipped phase) or an effect read-back corroborates the claimed effect. Driven by `dos` verbs and the workspace's own `dos.toml` — no host-specific paths, lanes, or commit conventions. Use when you want a self-stopping agent (or a `/loop` worker) to be unable to declare a goal complete on its own say-so. The single-agent self-stop analogue of `dos-witness-claim`.