skill-auditor
The skill-auditor validates SKILL.md files against the AgentOps template through three sequential passes: structural hygiene via heal-skill, eight content-discipline checks covering description triggers and intent clarity, and a deterministic 0-30 productization rubric score. Use it for quality gate reviews, template compliance verification, or to identify structural and content gaps before merging skill contributions.
git clone --depth 1 https://github.com/boshu2/agentops /tmp/skill-auditor && cp -r /tmp/skill-auditor/images/gemini/skills/skill-auditor ~/.claude/skills/skill-auditorSKILL.md
# /skill-auditor — Three-pass skill quality audit Validates a skill's SKILL.md against the unified AgentOps template. Pass 1 wraps `heal-skill` for structural hygiene; Pass 2 adds 8 content-discipline checks not covered by heal; Pass 3 folds the 10-category Skill Quality Rubric (`docs/reference/skill-quality-rubric.md`) into the report as a deterministic 0-30 productization score (advisory). The report also includes an advisory Context Density Rule block for intent, boundary, evidence, decision, constraint, and next action coverage. ## ⚠️ Critical Constraints - **Auditor is read-only.** Reports findings; never modifies the target. **Why:** PR-002 (external validation) — the auditor must remain a separate gate from the implementer. To repair findings: use `heal-skill --fix` for Pass-1 issues, hand-edit for Pass-2 issues. - **Pass 1 delegates, never reimplements.** The auditor calls `heal-skill --check --strict <target>` and gates on its exit code; parsing output is report-only. **Why:** PR-006 (cross-layer consistency) — heal-skill's checks are the source of truth for structural hygiene; reimplementation creates drift. - **Pass 2 must accept AgentOps' existing conventions.** Specifically `description-has-triggers` accepts THREE valid forms (YAML `|` block scalar OR `Triggers:`/`Use when:` markers OR `metadata.triggers` array with 3+ items). **Why:** finding `f-2026-05-06-auditor-checks-must-fit-host-conventions` — auditor checks must validate against the host substrate's existing valid artifacts before promotion to required gate. - **Verdict aggregation rule:** any check returns `fail` → FAIL; otherwise any returns `warn` → WARN; otherwise PASS. **Why:** prevents silent severity downgrade. - **Density coverage is advisory-only.** Missing density fields never changes the PASS/WARN/FAIL verdict and does not satisfy packet-boundary enforcement in `soc-2c1p.1`. **Why:** the hard Context Density Rule belongs at execution packet boundaries; this skill only helps reviewers find low-signal prose. - **Pass 3 rubric is advisory-only.** The 0-30 rubric score never changes the PASS/WARN/FAIL verdict. **Why:** Pass 1+2 gate *template conformance* (does this ship); the rubric measures *market-facing maturity* (is this product-grade) — a low rubric score on a structurally-clean skill is a productization backlog signal, not a ship blocker (soc-ads5v). - **Pass 3 scoring is deterministic and rubric-sourced.** The 10 categories come verbatim from `docs/reference/skill-quality-rubric.md`; each gets a 0-3 score plus an explainable reason derived only from the skill directory contents. **Why:** an explainable, reproducible score is auditable; an LLM-graded one is not. ## What It Detects ### Pass 1 — delegated to heal-skill | heal.sh code | Check | |--------------|-------| | MISSING_NAME | frontmatter `name` present | | MISSING_DESC | frontmatter `description` present | | NAME_MISMATCH | frontmatter name matches directory | | UNLINKED_REF | `references/*.md` linked from SKILL.md | | DEAD_REF | linked references actually exist | | SCRIPT_REF_MISSING | scripts referenced exist | | CATALOG_MISSING | user-invocable skills in `using-agentops/` catalog | ### Pass 2 — 8 NEW checks | # | Check id | Severity | |---|----------|----------| | 1 | `description-has-triggers` | WARN (downgraded from FAIL after pilot) | | 2 | `constraints-frontloaded` | WARN | | 3 | `rationale-present` | WARN | | 4 | `verification-checkpoints` | WARN | | 5 | `output-spec-explicit` | FAIL | | 6 | `quality-rubric` | WARN | | 7 | `references-modularization` | WARN (conditional, only if SKILL.md > 400 lines) | | 8 | `trigger-clarity` | WARN (downgraded from FAIL after pilot) | Full check definitions and accepted forms in [references/audit-checks.md](references/audit-checks.md). ### Advisory density report The JSON report includes a separate `density` block with six report-only fields: `intent`, `boundary`, `evidence`, `decision`, `constraint`, and `next_action`. Read [references/context-density-checks.md](references/context-density-checks.md) for detection rules, limits, and false-positive handling. ### Pass 3 — rubric scoring (10 categories, advisory) `audit.sh` runs `scripts/score_agentops_skill.py --audit-block` and folds the result into `audit-report.json` under a `rubric` key. The 10 categories come verbatim from [`docs/reference/skill-quality-rubric.md`](../../docs/reference/skill-quality-rubric.md) (read it for the per-score `0/1/2/3` definitions): `trigger_quality`, `kernel_clarity`, `progressive_disclosure`, `helper_scripts`, `validation`, `self_test`, `assets_templates`, `subagents_roles`, `safety_boundaries`, `packaging`. Each category scores 0-3 (`0` missing/unsafe → `3` product-grade and mechanically validated) with an explainable `reason`. Total 0-30 maps to a rating band: `C` (0-10), `B` (11-20), `A` (21-26), `S` (27-30). The score is **advisory** — it never changes the PASS/WARN/FAIL verdict. Standalone (markdown) for picking the smallest productization patch: ```bash python3 skills/skill-auditor/scripts/score_agentops_skill.py skills/<name> --markdown ``` Use it to pick the smallest patch (`SELF-TEST.md`, linked references, helper scripts, assets, subagents, safety boundaries, or validation), then re-run this auditor and `heal-skill`. ## Execution Steps ### Step 1: Pass 1 (heal-skill delegation) ```bash bash skills/heal-skill/scripts/heal.sh --check --strict <target> ``` Gate on the exit code. Each stdout line `[CODE] <path>: <msg>` becomes one Pass-1 finding for the report. ### Step 2: Pass 2 (8 NEW checks) For each `check_*` function in `scripts/audit.sh`, run against `<target>/SKILL.md`. Each emits `pass`, `warn`, or `fail` to stdout. **Checkpoint:** Pass 2 must run independently of Pass 1 (no shared state); a heal strict failure does NOT short-circuit Pass 2, but it DOES force the aggregate verdict to FAIL. ### Step 3: Pass 3 (rubric scoring) ```bash python3 scripts/score_agent
Use Agent Mail from Codex for file leases, notifications, inboxes, and conflict prevention.
>-
>-
Use when converting markdown plans into br beads with dependencies for implementation or swarm execution.
Use when switching AI coding CLI accounts quickly to recover from subscription rate limits or OAuth friction.
>-
Use when starting non-trivial work, mining lessons, or preventing repeated mistakes with cm procedural memory.
Mine past agent sessions for working prompts, decisions, and patterns. Use when "what did I ask?", "find that prompt", session archaeology, or agent history.