Skip to main content
ClaudeWave
Skill389 repo starsupdated today

skill-auditor

The skill-auditor validates SKILL.md files against the AgentOps template through three sequential passes: structural hygiene via heal-skill, eight content-discipline checks covering description triggers and intent clarity, and a deterministic 0-30 productization rubric score. Use it for quality gate reviews, template compliance verification, or to identify structural and content gaps before merging skill contributions.

Install in Claude Code
Copy
git clone --depth 1 https://github.com/boshu2/agentops /tmp/skill-auditor && cp -r /tmp/skill-auditor/images/gemini/skills/skill-auditor ~/.claude/skills/skill-auditor
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# /skill-auditor — Three-pass skill quality audit

Validates a skill's SKILL.md against the unified AgentOps template. Pass 1
wraps `heal-skill` for structural hygiene; Pass 2 adds 8 content-discipline
checks not covered by heal; Pass 3 folds the 10-category Skill Quality Rubric
(`docs/reference/skill-quality-rubric.md`) into the report as a deterministic
0-30 productization score (advisory). The report also includes an advisory
Context Density Rule block for intent, boundary, evidence, decision,
constraint, and next action coverage.

## ⚠️ Critical Constraints

- **Auditor is read-only.** Reports findings; never modifies the target. **Why:** PR-002 (external validation) — the auditor must remain a separate gate from the implementer. To repair findings: use `heal-skill --fix` for Pass-1 issues, hand-edit for Pass-2 issues.
- **Pass 1 delegates, never reimplements.** The auditor calls `heal-skill --check --strict <target>` and gates on its exit code; parsing output is report-only. **Why:** PR-006 (cross-layer consistency) — heal-skill's checks are the source of truth for structural hygiene; reimplementation creates drift.
- **Pass 2 must accept AgentOps' existing conventions.** Specifically `description-has-triggers` accepts THREE valid forms (YAML `|` block scalar OR `Triggers:`/`Use when:` markers OR `metadata.triggers` array with 3+ items). **Why:** finding `f-2026-05-06-auditor-checks-must-fit-host-conventions` — auditor checks must validate against the host substrate's existing valid artifacts before promotion to required gate.
- **Verdict aggregation rule:** any check returns `fail` → FAIL; otherwise any returns `warn` → WARN; otherwise PASS. **Why:** prevents silent severity downgrade.
- **Density coverage is advisory-only.** Missing density fields never changes
  the PASS/WARN/FAIL verdict and does not satisfy packet-boundary enforcement
  in `soc-2c1p.1`. **Why:** the hard Context Density Rule belongs at execution
  packet boundaries; this skill only helps reviewers find low-signal prose.
- **Pass 3 rubric is advisory-only.** The 0-30 rubric score never changes the
  PASS/WARN/FAIL verdict. **Why:** Pass 1+2 gate *template conformance* (does
  this ship); the rubric measures *market-facing maturity* (is this
  product-grade) — a low rubric score on a structurally-clean skill is a
  productization backlog signal, not a ship blocker (soc-ads5v).
- **Pass 3 scoring is deterministic and rubric-sourced.** The 10 categories
  come verbatim from `docs/reference/skill-quality-rubric.md`; each gets a 0-3
  score plus an explainable reason derived only from the skill directory
  contents. **Why:** an explainable, reproducible score is auditable; an
  LLM-graded one is not.

## What It Detects

### Pass 1 — delegated to heal-skill

| heal.sh code | Check |
|--------------|-------|
| MISSING_NAME | frontmatter `name` present |
| MISSING_DESC | frontmatter `description` present |
| NAME_MISMATCH | frontmatter name matches directory |
| UNLINKED_REF | `references/*.md` linked from SKILL.md |
| DEAD_REF | linked references actually exist |
| SCRIPT_REF_MISSING | scripts referenced exist |
| CATALOG_MISSING | user-invocable skills in `using-agentops/` catalog |

### Pass 2 — 8 NEW checks

| # | Check id | Severity |
|---|----------|----------|
| 1 | `description-has-triggers` | WARN (downgraded from FAIL after pilot) |
| 2 | `constraints-frontloaded` | WARN |
| 3 | `rationale-present` | WARN |
| 4 | `verification-checkpoints` | WARN |
| 5 | `output-spec-explicit` | FAIL |
| 6 | `quality-rubric` | WARN |
| 7 | `references-modularization` | WARN (conditional, only if SKILL.md > 400 lines) |
| 8 | `trigger-clarity` | WARN (downgraded from FAIL after pilot) |

Full check definitions and accepted forms in [references/audit-checks.md](references/audit-checks.md).

### Advisory density report

The JSON report includes a separate `density` block with six report-only fields:
`intent`, `boundary`, `evidence`, `decision`, `constraint`, and `next_action`.
Read [references/context-density-checks.md](references/context-density-checks.md)
for detection rules, limits, and false-positive handling.

### Pass 3 — rubric scoring (10 categories, advisory)

`audit.sh` runs `scripts/score_agentops_skill.py --audit-block` and folds the
result into `audit-report.json` under a `rubric` key. The 10 categories come
verbatim from [`docs/reference/skill-quality-rubric.md`](../../docs/reference/skill-quality-rubric.md)
(read it for the per-score `0/1/2/3` definitions): `trigger_quality`,
`kernel_clarity`, `progressive_disclosure`, `helper_scripts`, `validation`,
`self_test`, `assets_templates`, `subagents_roles`, `safety_boundaries`,
`packaging`.

Each category scores 0-3 (`0` missing/unsafe → `3` product-grade and
mechanically validated) with an explainable `reason`. Total 0-30 maps to a
rating band: `C` (0-10), `B` (11-20), `A` (21-26), `S` (27-30). The score is
**advisory** — it never changes the PASS/WARN/FAIL verdict.

Standalone (markdown) for picking the smallest productization patch:

```bash
python3 skills/skill-auditor/scripts/score_agentops_skill.py skills/<name> --markdown
```

Use it to pick the smallest patch (`SELF-TEST.md`, linked references, helper
scripts, assets, subagents, safety boundaries, or validation), then re-run this
auditor and `heal-skill`.

## Execution Steps

### Step 1: Pass 1 (heal-skill delegation)

```bash
bash skills/heal-skill/scripts/heal.sh --check --strict <target>
```

Gate on the exit code. Each stdout line `[CODE] <path>: <msg>` becomes one Pass-1 finding for the report.

### Step 2: Pass 2 (8 NEW checks)

For each `check_*` function in `scripts/audit.sh`, run against `<target>/SKILL.md`. Each emits `pass`, `warn`, or `fail` to stdout.

**Checkpoint:** Pass 2 must run independently of Pass 1 (no shared state); a heal strict failure does NOT short-circuit Pass 2, but it DOES force the aggregate verdict to FAIL.

### Step 3: Pass 3 (rubric scoring)

```bash
python3 scripts/score_agent