Skip to main content
ClaudeWave
Skill126 estrellas del repoactualizado 3d ago

proof-verify

Instalar en Claude Code
Copiar
git clone --depth 1 https://github.com/AnastasiyaW/claude-code-config /tmp/proof-verify && cp -r /tmp/proof-verify/skills/development/proof-verify ~/.claude/skills/proof-verify
Después abre una sesión nueva de Claude Code; el skill carga automáticamente.

SKILL.md

# Proof Verify

Plan-based verification: freeze acceptance criteria BEFORE building, verify AFTER with independent agents.

## When to Use

- After completing a feature/fix that was built from a plan
- When you need independent confirmation that work meets spec
- When the builder should NOT verify their own work
- Trigger phrases: "verify against plan", "check the implementation", "proof check", "independent review"

## The Pattern

```
PHASE 1: PLAN (before any code)
  Create .proof/PLAN.md with numbered acceptance criteria
  Each AC: testable, specific, has a verification command or check
  Plan is FROZEN - no changes during build

PHASE 2: BUILD (normal work)
  Implement against the plan
  Mark progress in .proof/PROGRESS.md
  Builder does NOT self-verify

PHASE 3: VERIFY (after build, independent agent)
  Fresh agent reads PLAN.md (never saw the build process)
  Walks through each AC, runs verification commands
  Writes .proof/VERDICT.md with PASS/FAIL per criterion
  If any FAIL → .proof/PROBLEMS.md with specific fixes

PHASE 4: FIX (if needed)
  Builder reads PROBLEMS.md, makes minimal fixes
  Back to PHASE 3 (re-verify)
  Loop until all PASS
```

## Phase 1: Create Plan

Create `.proof/PLAN.md` in the project root:

```markdown
# Verification Plan

**Created:** YYYY-MM-DD HH:MM
**Task:** [one-line description]
**Builder:** [session ID or "current"]
**Status:** FROZEN

## Acceptance Criteria

### AC1: [short name]
**Description:** [what must be true]
**Verify:** [exact command or check to run]
**Expected:** [what success looks like]

### AC2: [short name]
**Description:** [what must be true]
**Verify:** [exact command or check to run]
**Expected:** [what success looks like]

### AC3: [short name]
...

## Out of Scope
- [explicitly what this plan does NOT cover]

## Constraints
- [time, resource, or technical constraints]
```

Rules for good ACs:
- **Testable** - there is a command or check that produces PASS/FAIL
- **Specific** - "function returns correct value" not "code works"
- **Independent** - each AC can be verified without the others
- **3-8 ACs** - fewer than 3 = loopholes, more than 8 = checklist gaming
- **Frozen** - once written, do not modify during build

## Phase 2: Build

Normal implementation. The only additions:

1. Create `.proof/PROGRESS.md` as you work:

```markdown
# Build Progress

### AC1: [name]
- [x] Implemented in `src/foo.py:42`
- Files changed: `src/foo.py`, `tests/test_foo.py`

### AC2: [name]
- [x] Implemented in `src/bar.py:18`
- Files changed: `src/bar.py`
- Note: chose approach B because [reason]
```

2. After build is complete, write `.proof/EVIDENCE.md`:

```markdown
# Evidence

### AC1: [name]
**Command:** `pytest tests/test_foo.py -v`
**Output:**
\```
tests/test_foo.py::test_returns_correct PASSED
tests/test_foo.py::test_handles_edge PASSED
\```
**Result:** PASS

### AC2: [name]
**Command:** `grep -c "TODO" src/bar.py`
**Output:** `0`
**Result:** PASS
```

Builder collects evidence but does NOT write the verdict. That is the verifier's job.

## Phase 3: Verify (Independent Agent)

This is the critical phase. The verifier MUST be:
- A **fresh agent** (new session or subagent) that never saw the build
- Given ONLY: `PLAN.md` + access to the codebase
- NOT given: `PROGRESS.md`, `EVIDENCE.md`, or any build context

### Verifier prompt template

```
You are an independent verifier. Your job is to check whether
the implementation meets the acceptance criteria in .proof/PLAN.md.

Rules:
1. Read .proof/PLAN.md first. This is your ONLY specification.
2. For each AC, run the verification command yourself.
3. Do NOT read .proof/PROGRESS.md or .proof/EVIDENCE.md
   (those are the builder's claims - you verify independently).
4. Write your verdict to .proof/VERDICT.md in this format:

# Verification Verdict

**Verifier:** [your session ID]
**Date:** YYYY-MM-DD HH:MM
**Plan hash:** [first 8 chars of md5 of PLAN.md]

## Results

### AC1: [name]
**Status:** PASS | FAIL
**Evidence:** [what you saw when you ran the check]
**Notes:** [any observations]

### AC2: [name]
...

## Summary
- Total: N criteria
- Passed: X
- Failed: Y
- **Overall:** PASS | FAIL

5. If any AC fails, also create .proof/PROBLEMS.md:

# Problems

### AC2: [name]
**Expected:** [from PLAN.md]
**Actual:** [what you found]
**Suggested fix:** [smallest change that would fix it]
**Affected files:** [list]

6. Do NOT fix anything. You are read-only. Report only.
```

### How to spawn the verifier

**Option A: Subagent (same session)**
```
Agent({
  description: "Independent verification against plan",
  prompt: "[verifier prompt above]",
  mode: "plan"  // read-only first
})
```

**Option B: Fresh session (stronger isolation)**
Write handoff with instruction: "Start by reading .proof/PLAN.md and running verification."

**Option C: Multiple verifiers (highest confidence)**
Spawn 2-3 verifiers independently. If they disagree on any AC, that AC needs investigation.

## Phase 4: Fix Loop

If VERDICT.md shows any FAIL:

1. Builder reads `PROBLEMS.md`
2. Makes **minimal** fixes (not refactoring, not "while I'm here")
3. Updates `EVIDENCE.md` with new evidence for failed ACs
4. Verifier runs again (Phase 3)
5. Loop until all PASS

Typical: 1-2 fix rounds. If 3+ rounds on same AC → the AC itself might be wrong. Revisit PLAN.md.

## File Structure

```
.proof/
  PLAN.md        # frozen acceptance criteria (Phase 1)
  PROGRESS.md    # builder's notes (Phase 2)
  EVIDENCE.md    # builder's evidence (Phase 2)
  VERDICT.md     # verifier's verdict (Phase 3)
  PROBLEMS.md    # verifier's findings (Phase 3, if failures)
```

## Gotchas

- **Builder reads VERDICT, not the reverse.** Verifier never sees builder's evidence. This prevents confirmation bias.
- **"PASS with concerns" is FAIL.** Either it passes or it doesn't. No soft passes.
- **Plan hash in verdict.** If someone edited PLAN.md mid-build, the hash won't match. Catch.
- **Time limit.** If verification takes >30 min, the ACs are too vague. Rewrite them.