Skill · 196 repo stars · updated 3d ago

woz-kb

**woz-kb** tunes the /woz-review code reviewer's knowledge base by distilling human PR comments, backtesting against historical merged pull requests, and learning missed patterns to improve recall and precision. Use `woz-kb tune` to onboard or optimize a repository or organization, and `woz-kb backtest` to run individual tuning components or measure reviewer performance against past PRs.

Repository: wozcode-plugin

Install in Claude Code

git clone --depth 1 https://github.com/WithWoz/wozcode-plugin /tmp/woz-kb && cp -r /tmp/woz-kb/skills/woz-kb ~/.claude/skills/woz-kb

Then open a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# /woz-kb — reviewer knowledge-base tuning

One skill, two subcommands:

- **`woz-kb tune`** — the start-to-finish orchestrator: distill → backtest → learn (autotuner new-personas + per-PR missed-fixes) → re-measure the lift. Dry-run by default; `--apply` writes to the KB. Use this to onboard/tune a new repo or a whole org.
- **`woz-kb backtest`** — the building blocks for power users: the raw backtest run plus `--tune`, `--missed-report`, `--org-tune`, `--ab-compare` (unchanged).

The first positional arg selects the subcommand; if omitted it defaults to `backtest` (back-compat).
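
The defaulting rule can be sketched in shell as follows. This is an illustration only: `resolve_subcommand` is a hypothetical name, and treating a leading flag as "omitted" (and any other word as an error) is an assumption about how the real script parses its arguments.

```bash
# Hypothetical sketch of the subcommand-defaulting rule described above.
# Prints the subcommand that would be selected for the given arguments.
resolve_subcommand() {
  case "${1:-}" in
    tune|backtest) echo "$1" ;;        # explicit subcommand
    ""|-*)         echo "backtest" ;;  # omitted (assumed: nothing, or a flag comes first)
    *)             echo "unknown subcommand: $1" >&2; return 2 ;;  # assumed behavior
  esac
}
```

Under these assumptions, `resolve_subcommand --repo with-woz/wozcode` prints `backtest`, which is the back-compat path for invocations written before `tune` existed.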

## When to use

TRIGGER on: "tune the reviewer", "tune this repo", "onboard a repo", "tune the org", "backtest the reviewer", "how well does the reviewer do", or `/woz-kb`.

DO NOT use for: reviewing the current branch (`/woz-review`), past-session recall (`/woz-recall`), or browsing the KB (`/woz-knowledge`).

---

## `woz-kb tune` — start-to-finish

```bash
node --no-warnings=ExperimentalWarning ${CLAUDE_PLUGIN_ROOT}/scripts/woz-kb.js tune \
  --repo with-woz/wozcode \
  --anthropic-api-key-file ~/.woz/.anthropic-backtest-key
```

Pipeline per repo: **(1)** trigger a KB refresh → **(2)** distill human PR comments into baseline persona-hints → **(3)** training backtest (20 PRs × 3 rounds, or reuse a cached run) → **(4)** learn: the autotuner contributes **new personas** (its unique cross-PR synthesis), then missed-fixes contributes the **per-PR durable persona-hints** → **(5)** re-measure on the same PRs and print the recall/precision lift.

Flags:
- `--repo <owner/name>` — tune one repo. **Mutually exclusive with `--org`.**
- `--org <orgId>` — tune every repo the org has indexed, then run a final company-scope org-tune.
- `--apply` — write to the KB. **Without it, `tune` is a dry-run**: it computes + reports proposed hints/personas and counts but writes nothing (and skips the re-measure, since nothing changed).
- `--reuse-run <runId>` — reuse an existing backtest run's PRs + baseline instead of running a fresh (expensive) reviewer pass. `tune` also auto-reuses the newest cached run for the repo when present.
- `--count <n>` / `--rounds <n|all>` — training-backtest size (defaults 20 / 3) when not reusing a run.
- `--skip-distill` / `--skip-refresh` / `--skip-remeasure` — drop individual phases.
- `--anthropic-api-key-file <path>` — use API pricing for the reviewer/judge subprocesses.
- `--judge-model` / `--reviewer-model` / `--reviewer-via` / `--source` — as in `backtest`.

Note: after an `--apply`, the reviewer cache is invalidated (KB changed), so the re-measure re-runs the reviewer on the (small) PR set — a real cost, not a cache hit.
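
Since `tune` is a dry-run unless `--apply` is passed, a cautious workflow is to inspect the proposal first and write to the KB only afterwards. The sketch below wraps that two-step flow; `woz_kb` and `tune_repo` are hypothetical helper names, and only flags documented above are used.

```bash
# Thin wrapper around the documented entry point.
woz_kb() {
  node --no-warnings=ExperimentalWarning \
    "${CLAUDE_PLUGIN_ROOT}/scripts/woz-kb.js" "$@"
}

# Hypothetical helper: dry-run first, then apply only when asked to.
# $1 = owner/name, $2 = "apply" to write to the KB (anything else stays dry-run).
tune_repo() {
  local repo="$1" mode="${2:-dry-run}"
  if [ -z "$repo" ]; then
    echo "usage: tune_repo <owner/name> [apply]" >&2
    return 2
  fi
  # Without --apply, tune only reports the proposed hints/personas.
  woz_kb tune --repo "$repo" || return
  if [ "$mode" = "apply" ]; then
    # Writes to the KB; the re-measure then re-runs the reviewer (a real cost).
    woz_kb tune --repo "$repo" --apply
  fi
}
```

Because `tune` auto-reuses the newest cached run for the repo, the second call should not need a fresh training backtest, though the re-measure after `--apply` still re-runs the reviewer as noted above.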

---

## `woz-kb backtest` — building blocks

```bash
node --no-warnings=ExperimentalWarning ${CLAUDE_PLUGIN_ROOT}/scripts/woz-kb.js backtest \
  --repo with-woz/wozcode --count 3 --rounds 1
```

Runs /woz-review against a sample of historical merged PRs and scores how closely its findings match what the human review actually produced. The reviewer never sees the human comments or the merged diff, so the score is a genuine recall measure. Key flags:
- `--repo <owner/name>` (required).
- `--source <path>` (default: cwd).
- `--count <n>` (default: 3).
- `--rounds <n|all>` (per-review-round scoring).
- `--prs <n,n,n>` (explicit PRs; reuse a prior run's set for a comparable measurement).
- `--tune <runId> [--tune-apply] [--min-apply-ratio <n>]` — autotuner over a finished run.
- `--missed-report <runId> [--tune-apply]` — per-PR missed→suggested-fixes; writes `missed-fixes.json` + `.md`.
- `--org-tune <orgId> [--org-tune-all | --org-tune-repos <list>] [--org-tune-apply]` — cross-repo → company scope.
- `--ab-compare <baselineRunId> <newRunId> [--auto-rollback]`, `--personas <ids>`, `--no-apply`, `--timeout-min <n>`.
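
For a comparable before/after measurement, the same PR set has to be passed to both runs via `--prs`. A minimal sketch of that, assuming PR numbers are supplied by hand; `woz_kb`, `join_prs`, and `backtest_fixed_set` are hypothetical helper names, not part of the plugin.

```bash
# Thin wrapper around the documented entry point.
woz_kb() {
  node --no-warnings=ExperimentalWarning \
    "${CLAUDE_PLUGIN_ROOT}/scripts/woz-kb.js" "$@"
}

# Join PR numbers into the comma-separated form that --prs takes (n,n,n).
join_prs() {
  local IFS=,
  echo "$*"
}

# Hypothetical helper: backtest one repo against an explicit, fixed PR set.
# $1 = owner/name, remaining args = PR numbers.
backtest_fixed_set() {
  local repo="$1"
  shift
  if [ -z "$repo" ] || [ "$#" -eq 0 ]; then
    echo "usage: backtest_fixed_set <owner/name> <pr> [<pr>...]" >&2
    return 2
  fi
  woz_kb backtest --repo "$repo" --prs "$(join_prs "$@")"
}
```

Running `backtest_fixed_set` once before a KB change and once after yields two runs over identical PRs, whose run IDs can then be handed to `--ab-compare <baselineRunId> <newRunId>`.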

## Safety contract (backtest runs)

- A fresh clone per PR at `<repo>/.wozcode/backtests/<runId>/pr-<n>/clone/`; `.wozcode/` is gitignored.
- The clone's `origin` is removed and push URLs are rewritten to `unreachable://` for both `https://` and `git@`; `GH_TOKEN`/`GITHUB_TOKEN`/`GH_ENTERPRISE_TOKEN` are stripped and `HOME` is sandboxed. The reviewer cannot push or authenticate.
- Read-only against upstream. Artifacts persist for inspection.
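
The lockdown described above can be approximated with stock git configuration and `env`. This is a sketch of one possible approach, not the plugin's actual code: `lockdown_clone` and `run_sandboxed` are hypothetical names, and the use of `pushInsteadOf` is an assumption about how the push-URL rewrite is achieved.

```bash
# Hypothetical sketch: neutralise a clone so nothing run inside it can push.
lockdown_clone() {
  local clone_dir="$1"
  # Drop the upstream remote entirely (ignore the error if it is already gone).
  git -C "$clone_dir" remote remove origin 2>/dev/null || true
  # Rewrite any push URL starting with https:// or git@ to an unreachable scheme.
  git -C "$clone_dir" config --local --add url.unreachable://.pushInsteadOf "https://"
  git -C "$clone_dir" config --local --add url.unreachable://.pushInsteadOf "git@"
}

# Run a command with the GitHub tokens removed and HOME pointed at a sandbox.
# $1 = sandbox HOME directory, remaining args = command to run.
run_sandboxed() {
  local sandbox_home="$1"
  shift
  env -u GH_TOKEN -u GITHUB_TOKEN -u GH_ENTERPRISE_TOKEN \
    HOME="$sandbox_home" "$@"
}
```

With both in place, a process started through `run_sandboxed` inside a locked-down clone has no remote to push to, no token in its environment, and no access to credentials stored under the real `HOME`.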

## Output

Each run writes `<source>/.wozcode/backtests/<runId>/`: `report.md`, `summary.json`, and per-PR/round `reviewer.md`, `findings.json`, `score.json`, `usage.json`. `--missed-report` adds `missed-fixes.{json,md}`; `tune` prints the recall/precision lift.
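
To browse these artifacts from the shell, something like the following may help. Only the directory and file names come from the description above; `latest_run_id` and `list_scores` are hypothetical helpers, and picking the newest run by directory modification time is an assumption.

```bash
# Print the name of the most recently modified run directory under
# <source>/.wozcode/backtests/ (assumed to be the latest run).
latest_run_id() {
  local root="${1:-.}/.wozcode/backtests" newest="" d
  for d in "$root"/*/; do
    [ -d "$d" ] || continue
    d="${d%/}"
    if [ -z "$newest" ] || [ "$d" -nt "$newest" ]; then
      newest="$d"
    fi
  done
  [ -n "$newest" ] || return 1
  basename "$newest"
}

# List every score.json of a run, whatever the per-PR/round nesting is.
# $1 = source directory, $2 = run ID.
list_scores() {
  local source_dir="$1" run_id="$2"
  find "$source_dir/.wozcode/backtests/$run_id" -type f -name score.json | sort
}
```

For example, `list_scores . "$(latest_run_id .)"` prints the score files of the newest run in the current source tree, which is a quick way to find the per-PR results behind `report.md`.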

From the same repository

code-free (Subagent)

WOZCODE free-plan fallback agent — active when the monthly free-plan cap is exhausted. Claude Code's built-in Read, Edit, Write, Grep, Glob, and NotebookEdit are available; WOZCODE MCP tools are disallowed until the cap resets or the user upgrades.

code (Subagent)

WozCode enhanced coding agent with smart search, batch editing, and SQL introspection. Use as the default main thread agent.

explore (Subagent)

Fast read-only agent for file searches, symbol lookups, and codebase questions like "where is X defined?", "where is X called?", or "how does X flow through the system?". Prefer over shell-based exploration when answering would take 3+ Search/Sql calls. Cheaper model (haiku) so delegation pays for itself on any real scan.

woz-benchmark (Skill)

Compare WOZCODE vs vanilla Claude Code on the user's codebase — real cost, turn, and time savings. TRIGGER on "compare woz", "how much does woz save", "benchmark woz", "woz vs claude", "show me savings", or /woz-benchmark.

woz-login (Skill)

Authenticate with the Woz service. Use when the user needs to log in or when authentication is required.

woz-logout (Skill)

Clear stored Woz credentials and log out.

woz-recall (Skill)

Semantically search past Claude Code sessions to recall commands, solutions, and context from prior conversations. TRIGGER on 'remember when', 'last time', 'we did this before', 'how did we', or /woz-recall.

woz-savings (Skill)

Show the WOZCODE savings report: calls saved, time saved, tokens saved, and lifetime totals.