verify-llm-artifacts
Confirms or rejects findings from review-llm-artifacts before deletes or risky refactors. Loads review-verification-protocol-style checks per finding. Use after a review run, when the user wants to reduce false positives, before fix-llm-artifacts on dead code, or when validating a full-project scan.
git clone --depth 1 https://github.com/existential-birds/beagle /tmp/verify-llm-artifacts && cp -r /tmp/verify-llm-artifacts/plugins/beagle-core/skills/verify-llm-artifacts ~/.claude/skills/verify-llm-artifactsSKILL.md
# Verify LLM Artifacts Findings
Second-pass verification for `.beagle/llm-artifacts-review.json`. The detection pass optimizes for recall; this pass optimizes for **precision** so agents do not remove or “clean” code that is still required.
## When to run
- After the [review-llm-artifacts](../review-llm-artifacts/SKILL.md) skill (especially full-project scans).
- Before the [fix-llm-artifacts](../fix-llm-artifacts/SKILL.md) skill when findings include **deletions**, **dead code**, or **High** risk.
- Whenever past runs flagged artifacts that should not have been removed.
## Inputs
- **Required:** `.beagle/llm-artifacts-review.json` from a completed review.
- **Optional:** `$ARGUMENTS` — `--priority-only` (verify `dead_code` and any `fix_action` of `delete` first; then others), `--id N` (single finding id).
If the review file is missing, exit with: `Run the review-llm-artifacts skill first.`
## Prerequisite skills
1. Load the [review-verification-protocol](../review-verification-protocol/SKILL.md) skill — general anti–false-positive discipline, including the **Anti-confabulation gate** (echo the artifact from a freshly read source before any verdict). The Load + ECHO gate in step 1 is this skill's concrete instance of that rule.
2. Load the [llm-artifacts-detection](../llm-artifacts-detection/SKILL.md) skill — category criteria for what counts as a real issue.
## Instructions
### Hard gates
Objective pass conditions before you claim verification is done:
1. **Input parse:** The JSON load command in step 1 exits 0 (no traceback). **Pass:** valid JSON on disk at `.beagle/llm-artifacts-review.json`.
2. **Echo before adjudicate:** Step 1 has printed the full finding table (one row per `findings[]` entry: id, file, line, category, description) sourced from the parsed JSON in **this** turn. **Pass:** the table exists in your output and its row count equals `len(findings)` — you have not begun any verdict before it.
3. **ID lock:** Step 1 has recorded the exact id set from `findings[]` and stated it explicitly. Every `results` entry maps 1:1 to a locked id — none added, none dropped. **Pass:** the locked id list is printed; if at any point an apparent finding has no matching locked id, you STOP (see step 1, ID lock).
4. **Evidence before verdict:** For each finding you adjudicate, you have applied [references/verification-checklist.md](references/verification-checklist.md) for its `category` (or documented why the category is N/A) and recorded matching strings in `checks_performed`. **Pass:** no `status` without at least one checklist-backed check or an explicit N/A note in `notes`.
5. **Output contract:** After writing `.beagle/llm-artifacts-verification.json`, the validate command in step 4 exits 0; `summary` counts equal the number of `results` entries by `status`; the `results` id set equals the **locked id set** from gate 3 exactly. **Pass:** schema-valid JSON and `results` ids == locked ids == source `findings[]` ids.
### 1. Load, ECHO, and lock ids
This is a two-part gate. **Parsing is not loading** — a `json.load` that exits 0 only proves the file is well-formed, not that you have the findings in context. You must echo the actual content before any adjudication.
**1a. Parse and echo the finding table.**
Print every finding from the parsed `findings[]` array — not from memory, not from the branch name, not from surrounding files:
```bash
python3 - <<'PY'
import json
r = json.load(open('.beagle/llm-artifacts-review.json'))
f = r['findings']
print(f"git_head={r.get('git_head')} scope={r.get('scope')} count={len(f)}")
print("| id | file | line | category | description |")
print("|----|------|------|----------|-------------|")
for x in f:
desc = (x.get('description') or '').replace('|', '\\|')[:80]
print(f"| {x['id']} | {x.get('file')} | {x.get('line')} | {x.get('category')} | {desc} |")
print("ids=" + ",".join(str(x['id']) for x in f))
PY
```
**Pass:** the command exits 0 and the table (one row per finding) appears in your output.
> **The only source of findings is the parsed `findings[]` array.** Do not infer findings from the branch name, the working directory, or surrounding files. If your mental model of the findings differs from the echoed table, **the table wins** — discard the mental model and adjudicate only the rows above.
**1b. ID lock (hard gate, before any adjudication).**
Record the exact set of ids from the `ids=` line above and state it now, e.g. `Locked ids: {1, 2, 3, 4, 5, 6, 7}`. This is the **locked id set**. Every result you write in step 4 must map 1:1 to this set: no id added, none dropped. The output id-check in step 4 references this locked set, not a re-derived one.
If, while verifying, you find yourself about to adjudicate a finding whose id is **not** in the locked set — or about to write a result for a file that does not appear in any locked row — **STOP**. That is an agent error (you are reasoning from memory or context, not the report). Re-read `findings[]` via the echo command above and restart adjudication. Do **not** record such a finding as `false_positive` (see step 3, Status discipline).
Record `git_head` and `scope` from the report (already printed by 1a). If the working tree no longer matches (optional strict mode: compare to `git rev-parse HEAD`), warn that line numbers may drift.
### 2. Order findings
Default order:
1. `category == "dead_code"` or `fix_action == "delete"` or `risk == "High"`
2. Remaining findings by `(risk descending, id ascending)`
With `--priority-only`, stop after processing category `dead_code` and all `fix_action: delete` (still write full output for those processed).
### 3. Verify each finding
For each finding, follow [references/verification-checklist.md](references/verification-checklist.md). Its **first** check for every category is the existence precondition: confirm the cited `file` exists at `source_git_head` before running any symbol/usage check.
**Minimum evidence per finding:**
- *tag and push a release after the release PR is merged
create a release PR (auto-detects previous tag)
Guides architectural decisions for Deep Agents applications. Use when deciding between Deep Agents vs alternatives, choosing backend strategies, designing subagent systems, or selecting middleware approaches.
Reviews Deep Agents code for bugs, anti-patterns, and improvements. Use when reviewing code that uses create_deep_agent, backends, subagents, middleware, or human-in-the-loop patterns. Catches common configuration and usage mistakes.
Implements agents using Deep Agents. Use when building agents with create_deep_agent, configuring backends, defining subagents, adding middleware, or setting up human-in-the-loop workflows.
Guides architectural decisions for LangGraph applications. Use when deciding between LangGraph vs alternatives, choosing state management strategies, designing multi-agent systems, or selecting persistence and streaming approaches.
Reviews LangGraph code for bugs, anti-patterns, and improvements. Use when reviewing code that uses StateGraph, nodes, edges, checkpointing, or other LangGraph features. Catches common mistakes in state management, graph structure, and async patterns.
Implements stateful agent graphs using LangGraph. Use when building graphs, adding nodes/edges, defining state schemas, implementing checkpointing, handling interrupts, or creating multi-agent systems with LangGraph.