verify-llm-artifacts

verify-llm-artifacts performs a second-pass verification of code artifact findings from review-llm-artifacts, prioritizing precision to prevent accidental deletion or refactoring of code still in use. Use this skill after running review-llm-artifacts, particularly for full-project scans or findings involving deletions and dead code, and before executing fix-llm-artifacts to reduce false positives and validate that flagged artifacts genuinely require removal.

Repository: beagle

Install in Claude Code

git clone --depth 1 https://github.com/existential-birds/beagle /tmp/verify-llm-artifacts && cp -r /tmp/verify-llm-artifacts/plugins/beagle-core/skills/verify-llm-artifacts ~/.claude/skills/verify-llm-artifacts

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Verify LLM Artifacts Findings

Second-pass verification for `.beagle/llm-artifacts-review.json`. The detection pass optimizes for recall; this pass optimizes for **precision** so agents do not remove or “clean” code that is still required.

## When to run

- After the [review-llm-artifacts](../review-llm-artifacts/SKILL.md) skill (especially full-project scans).
- Before the [fix-llm-artifacts](../fix-llm-artifacts/SKILL.md) skill when findings include **deletions**, **dead code**, or **High** risk.
- Whenever past runs flagged artifacts that should not have been removed.

## Inputs

- **Required:** `.beagle/llm-artifacts-review.json` from a completed review.
- **Optional:** `$ARGUMENTS` — `--priority-only` (verify `dead_code` and any `fix_action` of `delete` first; then others), `--id N` (single finding id).

If the review file is missing, exit with: `Run the review-llm-artifacts skill first.`
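
The inputs above could be handled along the lines of the following minimal sketch. It assumes `$ARGUMENTS` arrives as one whitespace-separated string and that finding ids are integers (as in the `Locked ids` example); `parse_arguments` and `require_review_file` are hypothetical helper names, not part of the skill.

```python
import argparse
import shlex
from pathlib import Path

REVIEW_PATH = Path(".beagle/llm-artifacts-review.json")


def parse_arguments(raw: str) -> argparse.Namespace:
    """Parse the optional skill arguments from a raw $ARGUMENTS string."""
    parser = argparse.ArgumentParser(prog="verify-llm-artifacts")
    parser.add_argument("--priority-only", action="store_true",
                        help="prioritize dead_code and fix_action=delete findings")
    # ASSUMPTION: ids are integers; switch to type=str if the report uses string ids.
    parser.add_argument("--id", type=int, default=None, metavar="N",
                        help="verify a single finding id")
    return parser.parse_args(shlex.split(raw))


def require_review_file(path: Path = REVIEW_PATH) -> Path:
    """Return the path if the review report exists, else exit with the documented message."""
    if not path.is_file():
        raise SystemExit("Run the review-llm-artifacts skill first.")
    return path


args = parse_arguments("--priority-only --id 3")
print(args.priority_only, args.id)  # True 3
```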

## Prerequisite skills

1. Load the [review-verification-protocol](../review-verification-protocol/SKILL.md) skill — general anti–false-positive discipline, including the **Anti-confabulation gate** (echo the artifact from a freshly read source before any verdict). The Load + ECHO gate in step 1 is this skill's concrete instance of that rule.
2. Load the [llm-artifacts-detection](../llm-artifacts-detection/SKILL.md) skill — category criteria for what counts as a real issue.

## Instructions

### Hard gates

Objective pass conditions before you claim verification is done:

1. **Input parse:** The JSON load command in step 1 exits 0 (no traceback). **Pass:** valid JSON on disk at `.beagle/llm-artifacts-review.json`.
2. **Echo before adjudicate:** Step 1 has printed the full finding table (one row per `findings[]` entry: id, file, line, category, description) sourced from the parsed JSON in **this** turn. **Pass:** the table exists in your output, its row count equals `len(findings)`, and no verdict precedes it.
3. **ID lock:** Step 1 has recorded the exact id set from `findings[]` and stated it explicitly. Every `results` entry maps 1:1 to a locked id — none added, none dropped. **Pass:** the locked id list is printed; if at any point an apparent finding has no matching locked id, you STOP (see step 1, ID lock).
4. **Evidence before verdict:** For each finding you adjudicate, you have applied [references/verification-checklist.md](references/verification-checklist.md) for its `category` (or documented why the category is N/A) and recorded matching strings in `checks_performed`. **Pass:** no `status` without at least one checklist-backed check or an explicit N/A note in `notes`.
5. **Output contract:** After writing `.beagle/llm-artifacts-verification.json`, the validate command in step 4 exits 0; `summary` counts equal the number of `results` entries by `status`; the `results` id set equals the **locked id set** from gate 3 exactly. **Pass:** schema-valid JSON and `results` ids == locked ids == source `findings[]` ids.
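
To make gate 5 concrete, here is a rough sketch of the kind of check it describes. It is not the validate command from step 4. It assumes `summary` is a mapping from status string to count and that each `results` entry carries `id` and `status`; `check_output_contract` is a hypothetical name, and `"confirmed"` is a placeholder status used only for illustration.

```python
from collections import Counter


def check_output_contract(verification: dict, locked_ids: set) -> list[str]:
    """Return a list of contract violations; an empty list means the gate passes."""
    problems = []
    results = verification.get("results", [])
    result_ids = [r["id"] for r in results]

    # Every locked id must appear exactly once: none added, none dropped.
    if len(result_ids) != len(set(result_ids)):
        problems.append("duplicate ids in results")
    missing = locked_ids - set(result_ids)
    extra = set(result_ids) - locked_ids
    if missing:
        problems.append(f"missing ids: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected ids: {sorted(extra)}")

    # ASSUMPTION: summary maps each status string to a count; zero counts are ignored.
    by_status = dict(Counter(r["status"] for r in results))
    summary = {k: v for k, v in verification.get("summary", {}).items() if v}
    if by_status != summary:
        problems.append(f"summary {summary} != counted {by_status}")
    return problems


report = {
    "summary": {"false_positive": 1, "confirmed": 1},  # "confirmed" is a placeholder
    "results": [
        {"id": 1, "status": "false_positive"},
        {"id": 2, "status": "confirmed"},
    ],
}
print(check_output_contract(report, {1, 2}))     # []
print(check_output_contract(report, {1, 2, 3}))  # ['missing ids: [3]']
```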

### 1. Load, ECHO, and lock ids

This is a two-part gate. **Parsing is not loading** — a `json.load` that exits 0 only proves the file is well-formed, not that you have the findings in context. You must echo the actual content before any adjudication.

**1a. Parse and echo the finding table.**

Print every finding from the parsed `findings[]` array — not from memory, not from the branch name, not from surrounding files:

```bash
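python3 - <<'PY'
import json

# Parse the review report; a traceback here fails the input-parse gate.
with open('.beagle/llm-artifacts-review.json') as fh:
    r = json.load(fh)
f = r['findings']

# Provenance line: report commit, scope, and the row count the table must match.
print(f"git_head={r.get('git_head')} scope={r.get('scope')} count={len(f)}")
print("| id | file | line | category | description |")
print("|----|------|------|----------|-------------|")
for x in f:
    # Escape pipes so a description cannot break the table, then cap at 80 chars.
    desc = (x.get('description') or '').replace('|', '\\|')[:80]
    print(f"| {x['id']} | {x.get('file')} | {x.get('line')} | {x.get('category')} | {desc} |")

# Id list consumed by the ID lock in 1b.
print("ids=" + ",".join(str(x['id']) for x in f))
PY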
```

**Pass:** the command exits 0 and the table (one row per finding) appears in your output.

> **The only source of findings is the parsed `findings[]` array.** Do not infer findings from the branch name, the working directory, or surrounding files. If your mental model of the findings differs from the echoed table, **the table wins** — discard the mental model and adjudicate only the rows above.

**1b. ID lock (hard gate, before any adjudication).**

Record the exact set of ids from the `ids=` line above and state it now, e.g. `Locked ids: {1, 2, 3, 4, 5, 6, 7}`. This is the **locked id set**. Every result you write in step 4 must map 1:1 to this set: no id added, none dropped. The output id-check in step 4 references this locked set, not a re-derived one.

If, while verifying, you find yourself about to adjudicate a finding whose id is **not** in the locked set — or about to write a result for a file that does not appear in any locked row — **STOP**. That is an agent error (you are reasoning from memory or context, not the report). Re-read `findings[]` via the echo command above and restart adjudication. Do **not** record such a finding as `false_positive` (see step 3, Status discipline).
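
The lock-and-stop rule above can be pictured as a small guard. This is only an illustrative sketch: `lock_ids` and `guard_locked` are hypothetical names, and the sample findings are made up.

```python
def lock_ids(findings: list[dict]) -> frozenset:
    """Freeze the id set taken from the parsed findings[] array."""
    return frozenset(f["id"] for f in findings)


def guard_locked(finding_id, locked: frozenset) -> None:
    """STOP (raise) if adjudication is about to touch an id outside the locked set."""
    if finding_id not in locked:
        raise RuntimeError(
            f"id {finding_id!r} is not in the locked set; re-read findings[] and restart"
        )


findings = [{"id": 1, "file": "a.py"}, {"id": 2, "file": "b.py"}]  # illustrative data
locked = lock_ids(findings)
guard_locked(2, locked)      # in the set: passes silently
try:
    guard_locked(9, locked)  # not in the set: treated as an agent error
except RuntimeError as err:
    print(err)
```

Raising instead of returning a status mirrors the rule that an out-of-set id is never recorded as `false_positive`.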

Record `git_head` and `scope` from the report (already printed by 1a). If the working tree no longer matches the recorded `git_head` (optional strict mode: compare it to the output of `git rev-parse HEAD`), warn that line numbers may have drifted.
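
The optional strict-mode comparison might look like the sketch below. It assumes `git_head` holds a full or abbreviated commit hash; `current_head` and `drift_warning` are hypothetical helpers, and the hashes in the usage lines are invented.

```python
import subprocess


def current_head() -> str:
    """Return the working tree's commit hash via `git rev-parse HEAD`."""
    out = subprocess.run(["git", "rev-parse", "HEAD"],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()


def drift_warning(report_head, head: str):
    """Return a warning string when the report's git_head differs from HEAD, else None."""
    if not report_head or not head:
        return "git_head unavailable; cannot check for drift"
    # ASSUMPTION: either side may be abbreviated, so compare by prefix.
    if head.startswith(report_head) or report_head.startswith(head):
        return None
    return (f"working tree is at {head[:12]}, report was made at "
            f"{report_head[:12]}; line numbers may drift")


# Literal hashes keep the example independent of any repository.
print(drift_warning("abc1234", "abc1234def5678"))  # None
print(drift_warning("abc1234", "fff0000aaa1111"))
```

In a real run, `head` would come from `current_head()`.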

### 2. Order findings

Default order:

1. `category == "dead_code"` or `fix_action == "delete"` or `risk == "High"`
2. Remaining findings by `(risk descending, id ascending)`

With `--priority-only`, stop after processing all `dead_code` findings and all findings with `fix_action: delete` (still write full output for those processed).
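
As a sketch of this ordering: only `"High"` is named in this section, so the `"Medium"` and `"Low"` risk levels below are assumptions, as is sorting the first tier by the same `(risk, id)` key. `order_findings` is a hypothetical helper and `verbose_comment` is a made-up category.

```python
# ASSUMPTION: "Medium" and "Low" are placeholder risk levels; unknown values sort last.
RISK_RANK = {"High": 0, "Medium": 1, "Low": 2}


def is_priority(f: dict) -> bool:
    """Tier 1: dead code, deletions, or High risk."""
    return (f.get("category") == "dead_code"
            or f.get("fix_action") == "delete"
            or f.get("risk") == "High")


def order_findings(findings: list[dict], priority_only: bool = False) -> list[dict]:
    """Return findings in verification order."""
    def key(f):
        # Risk descending (High first), then id ascending.
        return (RISK_RANK.get(f.get("risk"), len(RISK_RANK)), f["id"])

    first = sorted((f for f in findings if is_priority(f)), key=key)
    if priority_only:
        # --priority-only covers only dead_code and fix_action: delete findings.
        return [f for f in first
                if f.get("category") == "dead_code" or f.get("fix_action") == "delete"]
    rest = sorted((f for f in findings if not is_priority(f)), key=key)
    return first + rest


findings = [
    {"id": 1, "category": "verbose_comment", "risk": "Low"},
    {"id": 2, "category": "dead_code", "risk": "Medium", "fix_action": "delete"},
    {"id": 3, "category": "verbose_comment", "risk": "High"},
    {"id": 4, "category": "verbose_comment", "risk": "Medium"},
]
print([f["id"] for f in order_findings(findings)])                      # [3, 2, 4, 1]
print([f["id"] for f in order_findings(findings, priority_only=True)])  # [2]
```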

### 3. Verify each finding

For each finding, follow [references/verification-checklist.md](references/verification-checklist.md). Its **first** check for every category is the existence precondition: confirm the cited `file` exists at `source_git_head` before running any symbol/usage check.
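
One way to run the existence precondition is `git cat-file -e <rev>:<path>`, which exits 0 only when the object exists. The sketch below is an illustration, not the checklist's own procedure; `exists_at` is a hypothetical helper, and the `run` parameter exists only so the call can be swapped out in tests.

```python
import subprocess


def exists_at(rev: str, path: str, run=subprocess.run) -> bool:
    """True if `path` exists in commit `rev` (`git cat-file -e` exits 0)."""
    proc = run(["git", "cat-file", "-e", f"{rev}:{path}"],
               capture_output=True, text=True)
    return proc.returncode == 0


def precheck(finding: dict, rev: str, run=subprocess.run):
    """Return a skip reason when the cited file is absent at `rev`, else None."""
    path = finding.get("file")
    if not path:
        return "finding has no file"
    if not exists_at(rev, path, run=run):
        return f"{path} does not exist at {rev}"
    return None
```

A finding whose `precheck` returns a reason would not proceed to symbol or usage checks.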

**Minimum evidence per finding:**

- *