Skill126 repo starsupdated 3mo ago

strict-enforcement

The strict-enforcement skill provides a verification methodology for validating code quality and implementation integrity when running claudikins-kernel:verify commands or enforcing cross-command gates. Use this skill to establish evidence-based pass/fail verdicts by running automated checks (tests, linting, type checking, builds), capturing actual output evidence (screenshots, API responses, CLI output), detecting flaky tests, and preventing code shipment until human review confirms working functionality rather than relying solely on passing tests.

View source Repository: claudikins-kernel

Install in Claude Code

Copy

git clone --depth 1 https://github.com/elb-pr/claudikins-kernel /tmp/strict-enforcement && cp -r /tmp/strict-enforcement/skills/strict-enforcement ~/.claude/skills/strict-enforcement

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Strict Enforcement Verification Methodology

## When to use this skill

Use this skill when you need to:

- Run the `claudikins-kernel:verify` command
- Validate implementation before shipping
- Decide pass/fail verdicts
- Check code integrity after changes
- Enforce cross-command gates

## Core Philosophy

> "Evidence before assertions. Always." - Verification philosophy

Never claim code works without seeing it work. Tests passing is not enough. Claude must SEE the output.

### The Three Laws

1. **See it working** - Screenshots, curl responses, CLI output. Actual evidence.
2. **Human checkpoint** - No auto-shipping. Human reviews evidence and decides.
3. **Exit code 2 gates** - Verification failures block claudikins-kernel:ship. No exceptions.

## Verification Phases

### Phase 1: Automated Quality Checks

Run the automated checks first. Fast feedback.

| Check | Command Pattern                      | What It Catches                     |
| ----- | ------------------------------------ | ----------------------------------- |
| Tests | `npm test` / `pytest` / `cargo test` | Logic errors, regressions           |
| Lint  | `npm run lint` / `ruff` / `clippy`   | Style issues, common bugs           |
| Types | `tsc` / `mypy` / `cargo check`       | Type mismatches, interface drift    |
| Build | `npm run build` / `cargo build`      | Compilation errors, bundling issues |

**Flaky Test Detection (C-12):**

```
Test fails?
├── Re-run failed tests
├── Pass 2nd time?
│   └── Yes → STOP: [Accept flakiness] [Fix tests] [Abort]
└── Fail 2nd time?
    ├── Run isolated
    └── Still fail? → STOP: [Fix] [Skip] [Abort]
```

### Phase 2: Output Verification (catastrophiser)

This is the feedback loop that makes Claude's code actually work.

| Project Type | Verification Method                  | Evidence                       |
| ------------ | ------------------------------------ | ------------------------------ |
| Web app      | Start server, screenshot, test flows | Screenshots, console logs      |
| API          | Curl endpoints, check responses      | Status codes, response bodies  |
| CLI          | Run commands, capture output         | stdout, stderr, exit codes     |
| Library      | Run examples, check results          | Output values, test coverage   |
| Service      | Check logs, verify health endpoint   | Log patterns, health responses |

**Fallback Hierarchy (A-3):**

If primary method unavailable, fall back:

1. Start server + screenshot (preferred for web)
2. Curl endpoints (preferred for API)
3. Run CLI commands (preferred for CLI)
4. Run tests only (fallback)
5. Code review only (last resort)

**Timeout:** 30 seconds per verification method (CMD-30).

### Phase 3: Code Simplification (Optional)

After verification passes, optionally run cynic for polish.

**Prerequisites:**

- Phase 2 (catastrophiser) must PASS
- Human approves: "Run cynic for polish pass?"

**cynic Rules:**

- Preserve exact behaviour (tests MUST still pass)
- Remove unnecessary abstraction
- Improve naming clarity
- Delete dead code
- Flatten nested conditionals

**If tests fail after simplification:**

- Log failure reasons
- Show human
- Proceed anyway (A-5) with caveat

See [cynic-rollback.md](references/cynic-rollback.md) for recovery patterns.

### Phase 4: Klaus Escalation

If stuck during verification:

```
Is mcp__claudikins-klaus available? (E-16)
├── No →
│   Offer: [Manual review] [Ask Claude differently] (E-17)
│   Fallback: [Accept with uncertainty] [Max retries, abort] (E-18)
└── Yes →
    Spawn klaus via SubagentStop hook
```

### Phase 5: Human Checkpoint

The final gate. Present comprehensive evidence.

```
Verification Report
-------------------
Tests:  ✓ 47/47 passed
Lint:   ✓ 0 issues
Types:  ✓ 0 errors
Build:  ✓ success

Evidence:
- Screenshot: .claude/evidence/login-flow.png
- API test: POST /api/auth → 200 OK
- CLI test: mycli --help → exit 0

[Ready to Ship] [Needs Work] [Accept with Caveats]
```

Human decides. If approved, set `unlock_ship = true`.

## Rationalizations to Resist

Agents under pressure find excuses. These are all violations:

| Excuse                                     | Reality                                                               |
| ------------------------------------------ | --------------------------------------------------------------------- |
| "Tests pass, that's good enough"           | Tests aren't enough. SEE it working. Screenshots, curl, output.       |
| "I'll verify after shipping"               | Verify BEFORE ship. That's the whole point.                           |
| "The type checker caught everything"       | Types don't catch runtime issues. Get evidence.                       |
| "Screenshot failed but it probably works"  | "Probably" isn't evidence. Fix the screenshot or use fallback.        |
| "Human checkpoint is just a formality"     | Human checkpoint is the gate. No auto-shipping.                       |
| "Code review is enough for this change"    | Code review is last resort fallback. Try harder.                      |
| "Tests are flaky, I'll ignore the failure" | Flaky tests hide real failures. Fix or explicitly accept with caveat. |
| "Exit code 2 is too strict"                | Exit code 2 exists to block bad ships. Pass properly.                 |

**All of these mean: Get evidence. Human decides. No shortcuts.**

## Red Flags — STOP and Reassess

If you're thinking any of these, you're about to violate the methodology:

- "It should work because..."
- "The tests pass so..."
- "I'm confident that..."
- "It worked before..."
- "The types check so..."
- "I'll just skip verification this once"
- "Human will approve anyway"
- "Evidence isn't necessary for this change"

**All of these mean: STOP. Get evidence. Present to human. Let them decide.**

## Exit Code 2 Pattern (CRITICAL)

The verify-gate.sh hook enforces the gate:

```bash
# Both conditions MUST be true
ALL_PASSED=$(jq -r '.all_checks_passed' "$STATE")
HUMAN_APPROVED=$(j