Skill · 116 repo stars · updated 5d ago

verify-before-claim

Iron rule — no completion claims without fresh verification evidence. Use whenever about to claim work is done, fixed, working, or passing. Run verification commands and show output before making any success statement.

Repository: third-brain-v5-skills

Install in Claude Code


git clone --depth 1 https://github.com/Mark393295827/third-brain-v5-skills /tmp/verify-before-claim && cp -r /tmp/verify-before-claim/skills/verify-before-claim ~/.claude/skills/verify-before-claim

Then open a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Verify Before Claim

**NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE.**

## Usage Template

**Prompt**
```text
Use verify-before-claim. Before saying this is done, identify the proof command, run it fresh, read the output, and state the actual status with evidence.
```

**Use Case**
- Preventing unsupported claims that code, docs, tests, or workflows are fixed, passing, complete, or working.

**Expected Result**
- The agent reports command output, exit code, pass/fail count, and any remaining unverified risk.

**Output Example**
- `Command: npm test`, `Exit code: 0`, `Result: 42 passed, 0 failed`, plus residual risks.

**Verification Case**
- No completion language appears unless fresh evidence is shown in the same response.

**Verified Effect**
- The agent shifts from confidence-based status claims to evidence-based status reports.
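
The report shape from **Expected Result** and **Output Example** can be sketched in Python as below. This is a minimal sketch assuming Python 3.9+; `EvidenceReport` and its field names are hypothetical, not part of the skill.

```python
from dataclasses import dataclass, field


@dataclass
class EvidenceReport:
    """Hypothetical container for one fresh verification run."""
    command: str
    exit_code: int
    result: str
    residual_risks: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Mirror the Output Example: command, exit code, result, then risks.
        lines = [
            f"Command: {self.command}",
            f"Exit code: {self.exit_code}",
            f"Result: {self.result}",
        ]
        if self.residual_risks:
            lines.append("Residual risks:")
            lines.extend(f"- {risk}" for risk in self.residual_risks)
        else:
            # State the absence of known risks explicitly instead of omitting it.
            lines.append("Residual risks: none identified")
        return "\n".join(lines)


report = EvidenceReport(
    command="npm test",
    exit_code=0,
    result="42 passed, 0 failed",
    residual_risks=["Not run against the production database"],
)
print(report.render())
```

Keeping residual risk as a required part of the rendered report means an empty risk list is stated out loud rather than silently dropped.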

## Success Metrics

- A fresh verification command or inspection is run in the current session.
- Output includes command, exit code or observable evidence, result, and residual risk.
- No completion claim is made when evidence is missing, stale, or failing.

## When to Use

- Before saying "done", "fixed", "complete", "passing", "working"
- Before committing or creating a PR
- Before expressing satisfaction with results
- When the user asks "is it working?"
- Any time the agent feels the urge to say "it should work"

## The Gate Function

```
BEFORE claiming any status:

1. IDENTIFY: What command proves this claim?
2. RUN: Execute the FULL command (fresh, complete)
3. READ: Full output, check exit code, count failures
4. VERIFY: Does output confirm the claim?
   - If NO: State actual status with evidence
   - If YES: State claim WITH evidence
5. ONLY THEN: Make the claim
```

**Skip any step = lying, not verifying.**
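
The five gate steps map onto a small Python helper, sketched below under stated assumptions: the hypothetical `gate` function treats exit code 0 as the only pass signal and keeps the last five output lines as evidence. Real test runners also need the failure count read from the output (step 3), which this sketch leaves out.

```python
import subprocess
import sys


def gate(claim: str, command: list[str]) -> str:
    """Run the proof command fresh and return a status backed by its output."""
    # 1. IDENTIFY: the caller supplies the command that would prove `claim`.
    # 2. RUN: execute it now, in full, capturing stdout and stderr.
    proc = subprocess.run(command, capture_output=True, text=True)
    # 3. READ: keep the exit code and the tail of the combined output.
    output = (proc.stdout + proc.stderr).strip()
    tail = "\n".join(output.splitlines()[-5:])
    # 4. VERIFY: a non-zero exit code means the claim is not supported.
    if proc.returncode != 0:
        return f"NOT VERIFIED: {claim}\nExit code: {proc.returncode}\n{tail}"
    # 5. ONLY THEN: state the claim, with the evidence attached.
    return f"VERIFIED: {claim}\nExit code: 0\n{tail}"


print(gate("interpreter runs", [sys.executable, "-c", "print('ok')"]))
```

Returning the evidence inside the status string keeps the claim and its proof from being reported separately.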

## Common Failures

| Claim | Requires | Not Sufficient |
|-------|----------|----------------|
| Tests pass | Test output: 0 failures | Previous run, "should pass" |
| Linter clean | Linter output: 0 errors | Partial check |
| Build succeeds | Build exit code 0 | "Looks good" |
| Bug fixed | Original reproduction rerun: symptom gone | "I changed the code" |
| Tests added | New tests exist + old tests still pass | "Tests should work" |
| Vendor/product claim | Official docs, changelog, or independent source; otherwise mark single-source | Keynote/demo statement alone |
| Agent can act for user | Mandate, tool log, preview/confirmation, receipt, rollback path | "User asked generally" |
| Generated media is safe/provenanced | Source/prompt/edit record and disclosure or watermark path | "Looks AI-generated" |

## Red Flags — STOP

- Using "should", "probably", "seems to" about completion
- Expressing satisfaction before verification ("Great!", "Perfect!", "Done!")
- About to commit/push/PR without verification
- Trusting agent success reports without independent check
- Relying on partial verification
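
The wording red flags can be caught mechanically before a status message goes out. A minimal sketch, assuming the phrase list below (drawn from the bullets above and the Quality Gates, and deliberately incomplete); `red_flags` is a hypothetical helper, and a hit means "stop and verify", not that the message is necessarily wrong.

```python
import re

# Phrases from the Red Flags list and Quality Gates; illustrative, not exhaustive.
RED_FLAG_PATTERNS = [
    r"\bshould\b",
    r"\bprobably\b",
    r"\bseems to\b",
    r"\bI think\b",
    r"\bgreat!",
    r"\bperfect!",
    r"\bdone!",
]
RED_FLAG_RE = re.compile("|".join(RED_FLAG_PATTERNS), re.IGNORECASE)


def red_flags(draft: str) -> list[str]:
    """Return every red-flag phrase found in a draft status message, in order."""
    return [match.group(0) for match in RED_FLAG_RE.finditer(draft)]


print(red_flags("Done! The fix should work now."))
```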

## Expected Value Thinking (From Poker Psychology)

> "The biggest bluff is convincing yourself you have control when you don't." — Maria Konnikova

**Apply to verification:**

| Poker Concept | Verification Application |
|---------------|-------------------------|
| **Expected Value (EV)** | Don't just check if it works now; check if it will work reliably |
| **Controllable vs Uncontrollable** | Focus on what you can verify; acknowledge what you can't |
| **Tilt Control** | Don't let excitement about "it works!" skip verification |
| **Process over Results** | Good verification process > lucky outcome |

**The verification mindset:**

```
1. Separate what you CAN verify from what you CAN'T
   - CAN: Code compiles, tests pass, output matches expected
   - CAN'T: Edge cases, production behavior, user experience

2. Focus on EXPECTED VALUE, not single results
   - "Tests pass once" ≠ "Tests will pass reliably"
   - "Works in my environment" ≠ "Works in all environments"

3. Be skeptical of success
   - Good decisions can have bad outcomes
   - Bad decisions can have good outcomes
   - KEY: Evaluate the PROCESS, not just the result
```
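
Point 2 ("Tests pass once" ≠ "Tests will pass reliably") can be approximated by rerunning the same command and measuring how often it passes. A minimal Python sketch; the hypothetical `pass_rate` helper and the default of 5 runs are assumptions, and repeated runs on one machine still say nothing about other environments.

```python
import subprocess
import sys


def pass_rate(command: list[str], runs: int = 5) -> float:
    """Run the same command several times and return the fraction that exit 0."""
    passes = 0
    for _ in range(runs):
        # Each run is a fresh process, so earlier results cannot leak in.
        result = subprocess.run(command, capture_output=True)
        if result.returncode == 0:
            passes += 1
    return passes / runs


rate = pass_rate([sys.executable, "-c", "pass"], runs=3)
print(f"pass rate: {rate:.0%}")
```

Anything below 100% is a flaky result and should be reported as such, not rounded up to "passing".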

## Single-Source Product Claims

Keynotes, launch posts, demos, and roadmap statements can be useful evidence, but they are not enough for operational guidance. When the only source is a vendor statement:

```text
Status: single-source
Source type: keynote / launch post / docs / independent test
Operational risk:
What must be verified next:
```

Do not turn a demo into a guarantee. Treat availability, scale numbers, benchmark claims, payment protocols, security properties, and adoption figures as unverified until checked.
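
The template above can be filled in programmatically so that vendor-only claims default to `single-source`. A minimal Python sketch; `ProductClaim`, the `Claim:` line, and the exact source labels are assumptions, with corroborating sources taken from the Common Failures row for vendor/product claims (official docs, changelog, independent source).

```python
from dataclasses import dataclass, field

# Source types treated as corroboration; the labels are assumptions.
CORROBORATING = {"docs", "changelog", "independent test"}


@dataclass
class ProductClaim:
    claim: str
    sources: list[str] = field(default_factory=list)  # e.g. ["keynote"]
    operational_risk: str = ""
    next_check: str = ""

    @property
    def status(self) -> str:
        # Vendor statements alone (keynote, launch post, demo) stay single-source.
        if any(source in CORROBORATING for source in self.sources):
            return "corroborated"
        return "single-source"

    def render(self) -> str:
        # Field order follows the template above, with a Claim line added.
        return "\n".join([
            f"Claim: {self.claim}",
            f"Status: {self.status}",
            f"Source type: {' / '.join(self.sources) or 'none'}",
            f"Operational risk: {self.operational_risk or 'not assessed'}",
            f"What must be verified next: {self.next_check or 'nothing listed'}",
        ])


claim = ProductClaim(
    claim="Hypothetical: vendor agent can pay invoices",
    sources=["keynote"],
    operational_risk="Payments could fail silently",
    next_check="Find official docs or an independent test",
)
print(claim.render())
```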

**The Biggest Lesson from Poker:**
> "Knowing when to fold is more important than knowing when to play."

In verification, knowing when to say "I'm not sure" is better than claiming "it works" without evidence. Treat **any wording that implies success without a verification run** as a red flag.

## Rationalization Prevention

| Excuse | Reality |
|--------|---------|
| "Should work now" | RUN the verification |
| "I'm confident" | Confidence ≠ evidence |
| "Just this once" | No exceptions |
| "Different context so rule doesn't apply" | Spirit over letter |

## SOP TDD Context

This skill is the verification component of the TDD-for-SOPs approach. Before writing a new SOP:

1. **Define failure**: What will the agent do wrong without the SOP?
2. **Watch it fail**: Verify the baseline error exists (RED)
3. **Write the SOP**: Address those specific failure modes
4. **Watch it pass**: Verify the agent now complies (GREEN)
5. **Refactor**: Close loopholes while maintaining compliance
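
The RED/GREEN steps amount to one rule: the check must fail before the change and pass after it, otherwise it proves nothing. A minimal Python sketch; `red_green`, `agent_cites_evidence`, and `adopt_sop` are hypothetical names, and the compliance check (does the reply contain an exit code?) is a stand-in for whatever failure mode step 1 defines.

```python
from typing import Callable


def red_green(check: Callable[[], bool], apply_change: Callable[[], None]) -> str:
    """Confirm that `check` fails before the change (RED) and passes after it (GREEN)."""
    if check():
        # Baseline already passes: the check cannot show the change mattered.
        return "INVALID: baseline did not fail"
    apply_change()
    if not check():
        return "RED -> RED: change did not fix the failure"
    return "RED -> GREEN"


# Hypothetical stand-in for the agent's latest reply.
responses = {"current": "Done! It should work now."}


def agent_cites_evidence() -> bool:
    # Compliance check for this example: does the reply show an exit code?
    return "Exit code:" in responses["current"]


def adopt_sop() -> None:
    # Stand-in for rerunning the agent with the new SOP loaded.
    responses["current"] = "Command: npm test\nExit code: 0\nResult: 42 passed, 0 failed"


print(red_green(agent_cites_evidence, adopt_sop))
```

The same shape covers the "Bug fixed" row: the reproduction must fail before the fix and pass after it.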

## Quality Gates

- [ ] Verification command identified and run
- [ ] Full output shown (exit code, failure count)
- [ ] Claim only made WITH evidence
- [ ] No "should", "probably", "I think" on completion status
- [ ] Vendor/demo/product claims are marked single-source unless independently verified
- [ ] Delegated user actions have mandate, receipt, and rollback evidence
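
The checklist above can be encoded as a gate that lists every unmet item. A minimal Python sketch; the `Evidence` fields are hypothetical booleans that a reviewer or harness would fill in, and an empty result is the only state in which the completion claim may be made.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    command_run: bool
    output_shown: bool          # exit code and failure count are visible
    hedged_wording: bool        # "should", "probably", "I think" on status
    vendor_claims_marked: bool  # single-source unless independently verified
    delegated_action: bool = False
    mandate_receipt_rollback: bool = False


def failed_gates(e: Evidence) -> list[str]:
    """Return the unmet quality gates; an empty list means the claim may be made."""
    failures = []
    if not e.command_run:
        failures.append("verification command not run")
    if not e.output_shown:
        failures.append("full output not shown")
    if e.hedged_wording:
        failures.append("hedged wording on completion status")
    if not e.vendor_claims_marked:
        failures.append("vendor claims not marked single-source")
    # Delegated actions need all three: mandate, receipt, and rollback path.
    if e.delegated_action and not e.mandate_receipt_rollback:
        failures.append("delegated action lacks mandate/receipt/rollback")
    return failures


draft = Evidence(
    command_run=True,
    output_shown=False,
    hedged_wording=True,
    vendor_claims_marked=True,
)
print(failed_gates(draft))
```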

From the same repository

daily-okr (Skill)

Execute a daily knowledge compound closed loop — 7 Key Results from input to feedback with scoring. Use when the user wants to do a daily review, plan their day, or run a knowledge workflow.

session-learn (Skill)

Extract reusable knowledge from a work session and save concepts, entities, corrections, patterns, ideas, decisions, and gaps to the wiki. Use when ending a session or when the user says to extract knowledge.

token-cost-tracker (Slash Command)

Estimate and track token usage and cost across the knowledge pipeline. Run before expensive tasks to budget, after tasks to log actuals.

wiki-lint (Skill)

Health-check the knowledge wiki — find orphans, broken links, missing frontmatter, contradictions, stale content, and statistical drift. Use when the user says "lint the wiki", "health check", or periodically for maintenance.

agent-teams-command (Skill)

Command multi-agent work with bounded roles, ownership, integration gates, and verification loops. Use when the user needs Claude Code Agent Teams, parallel agents, delegation strategy, or multi-agent orchestration.

agentic-engineering (Skill)

Design or refactor agent skills, workflows, and operating loops for model-native Agentic Engineering. Use when making skills more autonomous, concise, verifiable, long-horizon capable, token-efficient, and lower-friction for human-LLM collaboration.

ai-six-sigma-property-os (Skill)

Design an AI Six Sigma Black Belt operating model for property service, maintenance dispatch, environmental testing, quote generation, CRM follow-up, and workflow quality dashboards. Use when the user needs a Property Agent OS, AI + Ontology + DMAIC management system, CTQ metrics, agent-team roles, work-order states, or MVP roadmap for operations quality.

anthropic-os (Skill)

Improve a personal or team operating system with self-evolving loops, CASH allocation, 3B creativity, predictive coding, and diagnostics. Use when the user wants to redesign a work method, learning loop, or cognitive operating system.