Skip to main content
ClaudeWave
Skill542 repo starsupdated 2d ago

nw-formal-verification-tlaplus

This Claude Code skill provides decision heuristics for when formal verification with TLA+ adds value to system design, along with core patterns and state management techniques. Use it when evaluating whether to apply formal methods to distributed systems, consensus protocols, safety-critical state machines, or concurrent systems where subtle interleaving bugs could cause data loss or financial impact.

Install in Claude Code
Copy
git clone --depth 1 https://github.com/nWave-ai/nWave /tmp/nw-formal-verification-tlaplus && cp -r /tmp/nw-formal-verification-tlaplus/nWave/skills/nw-formal-verification-tlaplus ~/.claude/skills/nw-formal-verification-tlaplus
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# Formal Verification with TLA+

## When to Recommend Formal Verification

### Decision Tree

```
Is the system distributed or concurrent?
|
+-- No --> Complex state machine with high failure cost?
|          +-- No --> NOT cost-effective. Use property-based testing.
|          +-- Yes --> CONSIDER TLA+
|
+-- Yes --> Consensus, coordination, or distributed transactions?
|           +-- Yes --> RECOMMEND TLA+
|           +-- No --> Could concurrency bug cause data loss or safety issues?
|                      +-- Yes --> RECOMMEND TLA+
|                      +-- No --> OFFER as option
```

### Strong Indicators (Recommend)

| Domain | Why TLA+ Adds Value | Evidence |
|--------|-------------------|----------|
| Distributed consensus (Paxos, Raft) | Subtle interleaving bugs in leader election | Raft TLA+ spec ~400 lines, found implementation bugs |
| Financial distributed transactions | Atomicity violations cause monetary loss | AWS DynamoDB replication verified |
| Leader election, distributed locking | Split-brain, deadlock, stale-lock | AWS lock manager verified |
| Eventual consistency / CRDTs | Convergence proofs required | TLA+ CRDT framework verifies SEC |
| Safety-critical state machines | Regulatory requirements | DO-178C, CENELEC recognize formal methods |
| Multi-party coordination (sagas, 2PC) | Compensation ordering, partial failure | 2PC is canonical TLA+ example |
| Data replication protocols | Ordering, consistency under failure | Elasticsearch, MongoDB, Cosmos DB verified |

### When NOT to Use

- Simple CRUD (bugs are in implementation, not design)
- Single-process without complex state machines
- Prototypes/MVPs (design will change before verification completes)
- Performance optimization (TLA+ models correctness, not performance)

### Cost-Benefit Reference

- Learning curve: 2-3 weeks to useful results (AWS engineers, all levels)
- Typical spec effort: 2-4 weeks part-time for a distributed protocol
- ROI highest when: bug cost is high, system is long-lived, protocol is novel, concurrency testing is impractical

## Core Concepts for Architects

### What TLA+ Specifies

TLA+ describes **what** a system should do (allowed behaviors), not **how** to implement it. Specifications are mathematical objects checked for correctness before any code exists.

### Safety vs. Liveness

| Property Type | Meaning | Expression | Example |
|--------------|---------|------------|---------|
| Safety | Nothing bad happens | Invariant: predicate true in every reachable state | "Two processes never hold same lock" |
| Liveness | Something good eventually happens | Temporal: `<>` (eventually), `[]<>` (infinitely often) | "Every request eventually gets response" |

Safety violations produce counterexample traces (the debugging artifact). Liveness requires fairness conditions.

### PlusCal vs. Raw TLA+

PlusCal compiles to TLA+ with programming-like syntax. Start with PlusCal for first 2-3 specs, then learn raw TLA+ for cases PlusCal cannot express.

Key PlusCal constructs: `variables` (state) | `labels` (atomic action boundaries) | `either/or` (nondeterministic choice) | `await` (blocking) | `process \in 1..N` (concurrent processes) | `fair process` (weak fairness)

Labels define concurrency granularity: everything between two labels is one atomic step. Two processes interleave only at label boundaries.

## State Explosion Management

State space grows exponentially: `(states per node)^(nodes) x (message permutations)`.

### Containment Strategies

| Strategy | Technique | Impact |
|----------|-----------|--------|
| Bound parameters | Start with 2-3 nodes, 2-4 messages | Most bugs appear at small N |
| Symmetry reduction | `SYMMETRY Permutations(Nodes)` | Up to N! reduction |
| Reduce labels | Merge labels where fine-grained atomicity unnecessary | Orders of magnitude |
| State constraints | `CONSTRAINT Len(log[n]) < MaxLogLength` | Prune uninteresting states |
| Abstraction | Model protocol not implementation (TCP -> message set) | Dramatic reduction |
| Decomposition | Multiple focused specs, not one monolith | Each independently checkable |
| Progressive refinement | 2 nodes -> 3 nodes -> add failures -> add liveness | Incremental verification |
| Simulation mode | `java -jar tla2tools.jar -simulate -depth 100` | Trades completeness for speed |

### Memory and Time Budgets

| Unique States | Expected Time | Memory | Approach |
|--------------|---------------|--------|----------|
| < 10K | Seconds | < 1 GB | Exhaustive, single thread |
| 10K - 1M | Minutes | 1-4 GB | Exhaustive, `-workers auto` |
| 1M - 100M | Hours | 4-32 GB | Exhaustive with constraints |
| 100M - 1B | Days | 32-64 GB | Large instance or simulation |
| > 1B | Weeks | 60+ GB | Simulation, TLAPS, or decompose |

### Estimation Before Running

1. Count distinct variable values in model
2. Multiply domains together for baseline
3. Start TLC with smallest parameters, observe state count
4. Extrapolate: doubling a parameter typically squares or cubes the space

## Key Specification Patterns

### Two-Phase Commit (2PC)
- Variables: rmState, tmState, tmPrepared, msgs
- Safety: no RM commits while another aborts (`Consistency`)
- State space: 3 RMs ~718 states, 5 RMs ~21,488 states
- Common mistake: not modeling RM spontaneous abort or unreliable network

### Distributed Consensus (Raft)
- Variables: currentTerm, votedFor, log, state, votesGranted, msgs
- Safety: at most one leader per term (`ElectionSafety`)
- Safety: logs with same index+term are identical (`LogMatching`)
- State space: 3 nodes, MaxTerm=2 ~10K-100K states

### Saga (Compensating Transactions)
- Variables: stepState, sagaState, compensateIdx
- Safety: steps execute in order, compensations in reverse (`OrderInvariant`)
- Safety: no completed steps remain after abort (`CompensationComplete`)
- Common mistake: not enforcing reverse compensation order

### Distributed Lock with Lease
- Variables: lockHolder, leaseExpiry, clock, nodeState
- Safety: at most one holder (
nw-ab-critique-dimensionsSkill

Review dimensions for validating agent quality - template compliance, safety, testing, and priority validation

nw-abr-critique-dimensionsSkill

Review dimensions for validating agent quality - template compliance, safety, testing, and priority validation

nw-ad-critique-dimensionsSkill

Review dimensions for acceptance test quality - happy path bias, GWT compliance, business language purity, coverage completeness, walking skeleton user-centricity, priority validation, observable behavior assertions, traceability coverage, and walking skeleton boundary proof

nw-agent-creation-workflowSkill

Detailed 5-phase workflow for creating agents - from requirements analysis through validation and iterative refinement

nw-agent-testingSkill

5-layer testing approach for agent validation including adversarial testing, security validation, and prompt injection resistance

nw-architectural-styles-tradeoffsSkill

Architectural style selection decision matrices, trade-off analysis, structural enforcement rules, and combination patterns. Load when choosing or evaluating architecture styles.

nw-architecture-patternsSkill

Comprehensive architecture patterns, methodologies, quality frameworks, and evaluation methods for solution architects. Load when designing system architecture or selecting patterns.

nw-at-completeness-checkSkill

Canonical AT completeness gate — research-anchored 7-category taxonomy (C1-C7) + 15-item mechanical checklist. Paradigm-neutral. Drives acceptance-designer reviewer verdict deterministically.