
dos-enforce-tune

dos-enforce-tune implements a self-tuning feedback loop that improves DOS's enforcement policy from outcomes. It reads the enforcement journal to separate false denials (blocks the operator later overrode) from held catches (blocks that stood), proposes one edit at a time to the policy's confidence thresholds and intervention ladder ranks, and keeps an edit only when the kernel-measured net_task_delta improves over a corpus the loop did not author. Use it to close the gap between DOS's sound policy decisions and their actual operational outcomes without letting the loop corrupt enforcement logic: an edit that touches adjudication logic is reverted regardless of its score.

Install in Claude Code
git clone --depth 1 https://github.com/anthony-chaudhary/dos-kernel /tmp/dos-enforce-tune && cp -r /tmp/dos-enforce-tune/src/dos/skills/dos-enforce-tune ~/.claude/skills/dos-enforce-tune
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# dos-enforce-tune — the loop where DOS tunes its OWN enforcement, from outcomes

> **DOS is a sound policy decision point (PDP) with no feedback from the policy enforcement point (PEP).** The kernel decides an
> intervention verdict, a host acts on it, the act is journaled — but nothing fed
> *whether the act was right* back into the policy that drove it. This loop closes
> that. It learns the enforcement thresholds from the journal's own ground truth:
> a deny the operator **later overrode** is a false-DENY (too aggressive); a deny
> that stood is a held catch. The loop tunes the policy to drive false-DENIES down
> while holding the catches — and keeps an edit **only if the kernel, not the
> agent, measures that it helped.**
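
The labelling rule itself is small enough to state as code. The sketch below is a minimal illustration under an assumed input format of one line per DENY, `<target> <overridden>`, where `overridden` is `yes` if the operator later overrode the deny and `no` if it stood. That record format and the `tally_denies` name are hypothetical; the real ledger is folded from the `OP_ENFORCE` journal by `dos enforce-outcomes`.

```bash
# Hypothetical sketch of the ground-truth rule. Input (assumed format, one line
# per DENY): "<target> <overridden>", with overridden = yes | no.
tally_denies() {
  awk '
    $2 == "yes" { false_denies++ }   # operator overrode it: too aggressive
    $2 == "no"  { held_catches++ }   # the deny stood: a held catch
    END { printf "false_denies=%d held_catches=%d\n", false_denies, held_catches }
  '
}
```

For example, `printf 'a yes\nb no\nc yes\n' | tally_denies` prints `false_denies=2 held_catches=1`. The tuning goal is to push the first number down without shrinking the second.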

This is [[dos-self-improve]] pointed at the enforcement policy, with one twist: the
metric is not a generic count but the docs/143 `net_task_delta` of the policy over
labelled cases the loop did not author.

> A self-tuner's fatal failure mode is grading its own homework — relabelling its
> outcomes so its policy edit looks good. `dos enforce-tune` closes that hole the same
> way `dos improve` does: the metric is computed BY THE KERNEL from cases the loop did
> not author (a frozen corpus ∪ the live enforcement journal). The loop **cannot keep
> a policy edit by claiming it is better.** The only path to KEEP is to actually move
> `net_task_delta`.
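
To make the rule concrete, here is an illustrative restatement of the four KEEP conditions this skill lists for exit code `0`: suite green, truth clean, no runtime-logic edit, and a candidate `net_task_delta` that strictly beats the baseline score `B` recorded in Step 1. `keep_verdict` is a hypothetical shell function, not the kernel's implementation. In the real loop every input is an env-authored witness the kernel checks itself, which is why the agent cannot argue its way to KEEP.

```bash
# Illustrative restatement of the documented KEEP rule, not the kernel's code.
# Args: suite_green truth_clean logic_edited (each 1 or 0), candidate score, baseline B.
keep_verdict() {
  local suite=$1 truth=$2 logic_edit=$3 candidate=$4 baseline=$5
  if [ "$suite" -eq 1 ] && [ "$truth" -eq 1 ] && [ "$logic_edit" -eq 0 ] &&
     awk -v c="$candidate" -v b="$baseline" 'BEGIN { exit !((c + 0) > (b + 0)) }'; then
    echo KEEP        # all four conditions hold; the score STRICTLY beats B
  else
    echo NOT-KEEP    # any single failed condition blocks the keep
  fi
}
```

Note that a tie is not a keep: `keep_verdict 1 1 0 0.5 0.5` prints `NOT-KEEP`, so an edit that leaves `net_task_delta` unchanged is never retained.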

## What the kernel decides vs. what you do

| Step | Who | What |
|---|---|---|
| **Read the outcomes** | `dos enforce-outcomes` | the live false-DENY / held-catch ledger — what to tune toward |
| **Propose** ONE policy-knob edit | YOU (a subagent) | the untrusted step — edit `[intervention_policy]` / `[intervention]` ranks / `[improve]` in a worktree |
| **Measure** | `dos enforce-tune` | score the candidate policy's `net_task_delta` over the corpus, on the worktree |
| **Keep / revert / escalate** | `dos enforce-tune` (rides `dos improve`) | the kernel's typed verdict over env-authored facts — NOT your opinion |
| **Merge / discard / escalate** | YOU (or the autonomous cadence) | carry out the kernel's verdict |

## Inputs

- `--cases <corpus.jsonl>` (required) — the labelled `InterventionCase` corpus the
  candidate policy is scored over (the docs/143 §13.2 ground truth). The reference
  corpus is the benchmark's `intervention_cases.jsonl`.
- The autonomous cadence (below) reads the `dos enforce-tune` EXIT CODE and merges on
  a KEEP. Run interactively WITHOUT auto-merge to inspect the verdict first (the safe
  default); let the cadence auto-merge once you trust it. Either way the verdict is the
  kernel's, and the runtime-logic rail still reverts a candidate that edited
  adjudication logic.
- `--max-cycles <N>` (default 5) — the backstop cap.
- `--max-reverts <N>` (default 3) — ESCALATE to a human after this many non-keeps in a
  row (the breaker).
- `--lane <name>` (optional) — the lane to take; a bare loop auto-picks a free one.
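
A minimal local sketch of how the two caps interact, assuming `--max-cycles` bounds the total number of cycles and `--max-reverts` counts only consecutive non-keeps, so a KEEP resets the streak (as "in a row" suggests). `run_loop` is a hypothetical name, and each trailing argument stands in for one cycle's verdict exit code. In the skill itself the kernel, not a script like this, applies the breaker.

```bash
# Hypothetical sketch, not the kernel's breaker: each argument after the two
# limits stands in for one cycle's verdict exit code (0 = KEEP, else non-keep).
run_loop() {
  local max_cycles=$1 max_reverts=$2
  shift 2
  local cycle=0 reverts=0 code
  for code in "$@"; do
    cycle=$((cycle + 1))
    if (( cycle > max_cycles )); then
      echo "STOP: hit --max-cycles=$max_cycles"   # the backstop cap
      return 0
    fi
    if (( code == 0 )); then
      reverts=0                                   # a KEEP resets the streak
    else
      reverts=$((reverts + 1))
      if (( reverts >= max_reverts )); then
        echo "ESCALATE after $reverts non-keeps in a row"
        return 0
      fi
    fi
  done
  echo "DONE after $cycle cycles"
}
```

With the defaults, `run_loop 5 3 0 3 3 3` escalates on the fourth cycle, while `run_loop 5 3 3 3 0 3 3` runs all five cycles because the KEEP in the middle resets the count.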

## Step 0 — Read what enforcement is getting wrong

```bash
dos enforce-outcomes --workspace . --json
```

This folds the `OP_ENFORCE` journal into the false-DENY / held-catch ledger. A
standing `false_denies` count is the signal: the policy is refusing edits the
workspace genuinely makes. (The `dos pulse` heartbeat surfaces the same count when it
crosses the threshold, so an always-on operator sees it without asking.) Note the
targets being over-blocked — they tell you which way to tune (usually: loosen a rung
or raise a confidence bar so a marginal mint earns WARN, not BLOCK).
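
To see which targets are over-blocked, rank them by how many of their denies the operator overrode. This is a sketch under a hypothetical input of one line per DENY, `<target> <overridden>` (`yes` or `no`). `dos enforce-outcomes --json` is the real source, and its field layout is not assumed here; `overblocked_targets` is an illustrative name.

```bash
# Hypothetical sketch: count overridden denies per target, most over-blocked first.
# Input (assumed format): "<target> <overridden>", with overridden = yes | no.
overblocked_targets() {
  awk '$2 == "yes" { n[$1]++ } END { for (t in n) print n[t], t }' |
    sort -k1,1nr -k2,2   # by count descending, then by target name
}
```

For instance, `printf 'a yes\nb yes\na yes\nc no\n' | overblocked_targets` prints `2 a` then `1 b`; target `c` does not appear because its only deny stood.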

## Step 1 — Pre-flight: take a lane, record the GREEN baseline net_task_delta

```bash
dos doctor --workspace . --json
dos arbitrate --workspace . --lane <LANE>
python -m pytest -q                      # MUST be green to start
dos enforce-tune --cases <corpus> --suite-passed --truth-clean \
  --baseline-work 0 --json --workspace . # read measured_work = the BASELINE policy's score
```

Record the baseline policy's `measured_work` as `B`. **You cannot measure improvement from a red
baseline**: if the suite is red, fix it first (an ordinary task, not a tuning cycle).
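
A minimal sketch for recording `B` in a script. It assumes the `--json` output is a single JSON object with `measured_work` as a top-level key; the text above only says to read `measured_work`, so its exact position in the payload is an assumption. `measured_work` is a hypothetical helper name, and it uses `python3`, which the suite already requires.

```bash
# Hypothetical helper. ASSUMPTION: the --json output is one JSON object with a
# top-level "measured_work" key.
measured_work() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["measured_work"])'
}

# Usage from the main workspace (refuse to start from a red baseline):
#   python -m pytest -q || { echo "red baseline: fix the suite first" >&2; exit 1; }
#   B=$(dos enforce-tune --cases <corpus> --suite-passed --truth-clean \
#         --baseline-work 0 --json --workspace . | measured_work)
```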

## Step 2 — Per cycle: propose ONE policy-knob edit in an ISOLATED worktree

The enforcement policy is the kernel's own config — editing it on the live tree is the
SELF_MODIFY hazard ([[self-modification-hazard]]). Work in an isolated worktree:

```bash
git worktree add ../_et-candidate HEAD
```

Spawn ONE subagent with a tight brief, working in that worktree:

> Improve DOS's enforcement policy to reduce false-DENIES without losing catches.
> Edit ONLY the policy KNOBS: the `[intervention_policy]` confidence-gating values
> (`on_high_confidence` / `on_low_confidence` / `floor` / `ceiling`), the
> `[intervention]` ladder ranks, or the `[improve]` thresholds — in `dos.toml`. Make
> the SMALLEST edit that should move `net_task_delta`. Do NOT touch enforcement LOGIC
> (`self_modify.py`, `admission.py`, `arbiter.py`, the ladder code) — that edit will
> be reverted regardless of its metric. Commit your change. Return what you changed
> and why. If you cannot find a real improvement, say so — do not invent one.
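
Before paying for a measurement, a host can cheaply check that the candidate stayed inside the brief. This is a hypothetical pre-check, not the kernel's runtime-logic rail (which `dos enforce-tune` applies regardless): `offending_files` labels the three logic files the brief names as `LOGIC` and any other non-`dos.toml` path as `OTHER`. The brief gives no filename for "the ladder code", so the sketch does not guess one; any such file simply lands in `OTHER`.

```bash
# Hypothetical local pre-check, NOT the kernel's rail: print every changed path
# that falls outside the brief. Silence means only dos.toml was touched.
offending_files() {
  local f
  for f in "$@"; do
    case "${f##*/}" in
      dos.toml) ;;                                               # the only file the brief allows
      self_modify.py|admission.py|arbiter.py) echo "LOGIC $f" ;; # logic files named in the brief
      *) echo "OTHER $f" ;;                                      # outside the brief either way
    esac
  done
}
```

Run it as `offending_files $CHANGED` on the candidate's changed-file list from Step 3; any output is a reason to discard the candidate early rather than spend a cycle on a revert.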

## Step 3 — Measure + decide (the kernel decides)

Gather the env-authored witnesses on the worktree, then ask the kernel:

```bash
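( cd ../_et-candidate && python -m pytest -q ); SUITE=$?   # subshell: stay in the main workspace
BASE=$(git merge-base HEAD "$(git -C ../_et-candidate rev-parse HEAD)")   # where the candidate forked
CHANGED=$(git -C ../_et-candidate diff --name-only "$BASE" HEAD)   # every candidate commit, for the rail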

dos enforce-tune --cases <corpus> \
  --policy-toml ../_et-candidate/dos.toml \
  --baseline-work "$B" \
  $( [ "$SUITE" -eq 0 ] && echo --suite-passed ) \
  --truth-clean \
  --changed-files $CHANGED \
  --max-reverts <N> --json --workspace .
```
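
If the cycle is scripted, act on the exit code rather than on any prose the loop produces. A minimal sketch, assuming only the two codes this section spells out (`0` KEEP, `3` REVERT) and the table's mapping of KEEP to merge and REVERT to discard. `act_on_verdict` is a hypothetical helper, and every other code is deliberately left unmapped rather than guessed.

```bash
# Hypothetical wrapper: map the exit code of `dos enforce-tune` to the action
# this skill documents. Unknown codes are surfaced, never guessed at.
act_on_verdict() {
  case "$1" in
    0) echo "KEEP: merge the candidate worktree" ;;
    3) echo "REVERT: discard the candidate worktree" ;;
    *) echo "UNMAPPED exit code $1: inspect the --json verdict" ;;
  esac
}

# Usage, immediately after the `dos enforce-tune ... --json` call:
#   VERDICT=$?; act_on_verdict "$VERDICT"
```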

`dos enforce-tune` re-scores the CANDIDATE policy (loaded from its worktree dos.toml),
applies the runtime-logic rail to `--changed-files`, and rides `dos improve`. The
verdict IS the exit code:

- **`0` KEEP** — suite green, truth clean, candidate `net_task_delta` strictly beat
  `B`, no runtime-logic edit. Witnessed. Go to Step 4-KEEP.
- **`3` REVERT** — a regression (suite red / a runtime-log
issue-verify

Adjudicate a GitHub issue's "this is resolved" claim from witnesses the claimant didn't author — then close it carrying the evidence, or refuse with the typed gap. Use when an issue looks already-solved, after landing a fix that should have closed one, or to sweep open issues for silently-resolved ones.

issue-work

Pick the next most important open GitHub issue this agent can actually complete, make its done-condition true, land it with witnesses (suite + parity + commit-audit), and priority-tag every issue touched along the way. Use when asked to "work the backlog", "complete the next most important issue", or to fix a specific issue number end-to-end.

release

Cut a versioned release of the DOS kernel — bump the version, draft release notes, commit, tag, push to master, and create a GitHub release. The tag push triggers the gated PyPI publish pipeline (publish.yml); the skill surfaces the run and its approval gate.

stable-release

Promote an already-shipped rolling release (vX.Y.Z) of the DOS kernel to a named stable channel — gated on a green kernel suite + a green third-party CI run on the candidate + a clean truth syscall + a soak window. Writes an evidence file and adds a stable/<codename> git tag on the same commit. Does NOT bump versions or build new artifacts.

dos-class-cycle

One automatic plan-class lifecycle tick. Reads the DECLARED class set + transition list from the workspace `[lifecycle]` table (not a hardcoded taxonomy), evaluates each trigger, spawns a read-only JUDGE-rung adjudicator (the `dos.judges` seam — advisory, fail-to-abstain) to approve/defer each candidate transition, applies the gated transitions as plan-meta edits + one commit per cycle, and logs to the run archive. Failsafes (per-cycle cap, per-plan cooldown, a veto class) are `[lifecycle]` data; the judge content is a host `dos.judges` driver. Every path/class comes from `dos doctor --json`. Use to garden a plan portfolio's lifecycle automatically, judge-gated. The DOS lifecycle gardener (SKP Axis 5, docs/207 Phase 5c).

dos-dispatch-loop

Run /dos-dispatch on a recurring cadence, alternating with /dos-replan when the backlog drains — the dispatch→replan→dispatch cycle. The continue/stop/next-mode decision is the kernel's typed loop decision, not inline prose: each iteration is classified (`dos gate`) into a verdict and the loop's counters (drained-twice, the unclear/dirty-zero breakers, the iteration cap) drive the next step. Several loops on disjoint lanes run concurrently, each taking its own lane lease via `dos arbitrate`. Driven entirely by `dos` verbs + the workspace's `dos.toml`. The DOS reference loop workflow (SKP Axis 5).

dos-dispatch

End-to-end plan-and-ship for one lane — snapshot the portfolio with /dos-next-up, take a lane lease via `dos arbitrate` so parallel dispatches don't collide, gate the empty case via `dos gate`, ship the packet, and archive the run under the configured run dir. Driven entirely by `dos` verbs + the workspace's `dos.toml`; names no host path, lane, or commit convention. Use when you want to plan and ship the next batch on one lane in a single command, with concurrency safety. The DOS reference dispatch workflow (SKP Axis 5).

dos-goal-gate

Ground a "keep working until the goal is met" stop condition in a witness the agent did not author, instead of letting the agent self-certify "done". A harness goal/Stop-hook condition is normally checked by the model re-reading its OWN work — consistency, not grounding. This skill turns the operator's goal into checkable EFFECT claims and wires `dos hook stop` so the Stop is refused until git ancestry (a shipped phase) or an effect read-back corroborates the claimed effect. Driven by `dos` verbs and the workspace's own `dos.toml` — no host-specific paths, lanes, or commit conventions. Use when you want a self-stopping agent (or a `/loop` worker) to be unable to declare a goal complete on its own say-so. The single-agent self-stop analogue of `dos-witness-claim`.