Skill2.9k repo starsupdated yesterday
docs-corpus-audit
docs-corpus-audit performs a whole-corpus grounding pass that identifies and fixes accumulated documentation drift between releases, such as stale flag names, broken navigation links, expired deprecation banners, and outdated factual claims. Use this skill when a maintainer requests a comprehensive documentation audit or pre-release sweep across the entire corpus, as opposed to docs-sync which validates individual PRs at submission time.
Install in Claude Code
Copygit clone --depth 1 https://github.com/microsoft/apm /tmp/docs-corpus-audit && cp -r /tmp/docs-corpus-audit/.apm/skills/docs-corpus-audit ~/.claude/skills/docs-corpus-auditThen start a new Claude Code session; the skill loads automatically.
Definition
SKILL.md
# docs-corpus-audit -- whole-corpus regrounding pass
The docs corpus drifts silently between releases. `docs-sync` catches
drift introduced by individual PRs at PR-open time. This skill catches
the **accumulated** drift that slips past per-PR review -- stale flag
names, dead nav links from past IA reshuffles, deprecation banners
that outlived their version targets, factual claims whose source-side
truth has moved.
The pattern is **A1 PANEL + WAVE EXECUTION + S7 DETERMINISTIC TOOL
BRIDGE + A8 ALIGNMENT LOOP + A9 SUPERVISED EXECUTION**. The corpus is
split into disjoint page scopes; one verifier subagent owns each
scope; agents extract factual claims, S7-verify against source, apply
surgical fixes inline. The orchestrator then runs an alignment-loop
pass to re-verify that applied edits actually ground out true.
This skill is ADVISORY but ACTIONABLE: agents apply edits inline on a
working branch. The orchestrator is the sole writer to git -- stages,
commits, pushes. Maintainer reviews the resulting PR.
## Sibling contract with docs-sync
These two skills share substrate. Be explicit:
| Shared resource | Owner | Both use |
|---|---|---|
| `.apm/docs-index.yml` (corpus map) | docs-sync | yes |
| [doc-writer](../../agents/doc-writer.agent.md) persona | shared | yes (per-page edits) |
| [python-architect](../../agents/python-architect.agent.md) persona | shared | yes (S7 verification) |
| [editorial-owner](../../agents/editorial-owner.agent.md) persona | shared | optional (voice pass at scale) |
| [cdo](../../agents/cdo.agent.md) persona | shared | yes (final synthesis) |
| `assets/panelist-return-schema.json` | docs-sync (mirrored) | yes |
**Trigger boundary (avoid DISPATCH COLLISION):**
- `docs-sync` triggers on a PR event ("PR opened/synchronized",
source-diff-driven).
- `docs-corpus-audit` triggers on a maintainer ask for a
WHOLE-CORPUS pass ("audit the corpus", "reground", "pre-release
sweep") -- no PR required, no diff required, the whole corpus
is the input.
If a maintainer asks "review this PR's doc impact", route to
`docs-sync`. If they ask "audit all our docs" or "the docs feel
stale everywhere", route here.
## Architecture invariants
- **Wave-batched, not flat.** Pages are partitioned into 6-8 disjoint
scopes; each scope is one verifier subagent. Cost scales with
wave size, not corpus size. A wave of 6 agents on ~10 pages each
is the canonical shape.
- **Disjoint page ownership.** Each subagent has EDIT AUTHORITY on
its scope only. No two agents touch the same file -- guarantees
no merge conflicts during fan-in.
- **S7 verification is mandatory.** Every factual claim is verified
against deterministic source: `uv run apm <verb> --help` for CLI,
`grep -n src/apm_cli/` for symbols, `python -c "import ..."` for
module shape, file-existence checks for nav links. Never assert
from LLM recall.
- **Surgical edits only.** 1-3 line patches per drift, preserving
voice. Restructuring is deferred to the orchestrator post-pass,
never auto-applied by per-scope agents.
- **Single-writer interlock for git.** Subagents NEVER run
`git commit`, `git push`, or `gh pr <write>`. Orchestrator
commits per wave; pushes once per session.
- **Alignment loop (A8).** After waves return, orchestrator
re-greps the corpus for the patterns the agents claimed to fix.
Any residue triggers a targeted re-dispatch (max 2 redrafts) or
is escalated to maintainer.
## Roster (composition, not invention)
Reuse docs-sync's personas. Do NOT invent a one-off "grounding-
verifier" role; that's R3 EXTRACT in reverse.
| Role | Persona | Always active? |
|---|---|---|
| Per-scope verifier+editor | [python-architect](../../agents/python-architect.agent.md) (S7) and [doc-writer](../../agents/doc-writer.agent.md) (edits), bundled into one subagent prompt per scope | Yes -- one per page scope, parallel fan-out |
| Cross-corpus post-pass | orchestrator (deterministic greps via `scripts/scan-cross-corpus-drift.sh`) | Yes -- once after waves return |
| Alignment-loop checker | orchestrator (deterministic re-grep + targeted re-dispatch) | Yes -- once after post-pass |
| Voice pass (optional) | [editorial-owner](../../agents/editorial-owner.agent.md) | Only when >20 edits to keep tone coherent |
| Final synthesis | [cdo](../../agents/cdo.agent.md) | Once, for the PR summary comment |
The per-scope subagent prompt that composes `python-architect` +
`doc-writer` is in `assets/subagent-prompt-template.md` -- the
orchestrator substitutes scope + working dir + branch and dispatches
via the task tool.
## Process
```
1. PROBE (A9 SUPERVISED EXECUTION)
- Check working tree: docs/src/content/docs/ exists?
- Check working tree: packages/apm-guide/.apm/skills/apm-usage/
exists? (Rule-4 backfill target. If missing, the audit cannot
close Rule 4; ask maintainer before continuing.)
- Check `.apm/docs-index.yml` reachable.
- Verify on a working branch (not main).
2. RISK-TRIAGE (orchestrator, ~1 LLM call)
- Read .apm/docs-index.yml only (NOT the corpus body).
- Bucket pages by drift risk: HIGH (CLI ref, schemas, consumer
flows), MEDIUM (producer, enterprise policy), LOW (concepts,
contributing, troubleshooting, integrations).
- Decide wave order: HIGH first, MEDIUM next, LOW last.
3. WAVE-PLANNER (orchestrator, deterministic)
- Partition pages into 6-8 disjoint scopes per wave.
- Each agent gets ~9 pages, mixed surface types.
4. WAVE EXECUTION (parallel, one subagent per scope)
- Orchestrator dispatches one task per scope using the prompt
template in assets/subagent-prompt-template.md.
- Subagents read pages, extract claims, S7-verify, apply
surgical edits, return JSON per the docs-sync panelist
schema (mirrored at assets/panelist-return-schema.json).
- Validate every return against the schema; reject malformed
JSON.
5. CROSS-CORPUS POST-PASS (orchestrator, deterministic)
- Run scripts/scan-cross-corpus-drift.sh to grep