nw-formal-verification-tlaplus
This Claude Code skill provides decision heuristics for when formal verification with TLA+ adds value to system design, along with core patterns and state management techniques. Use it when evaluating whether to apply formal methods to distributed systems, consensus protocols, safety-critical state machines, or concurrent systems where subtle interleaving bugs could cause data loss or financial impact.
git clone --depth 1 https://github.com/nWave-ai/nWave /tmp/nw-formal-verification-tlaplus && cp -r /tmp/nw-formal-verification-tlaplus/nWave/skills/nw-formal-verification-tlaplus ~/.claude/skills/nw-formal-verification-tlaplusSKILL.md
# Formal Verification with TLA+ ## When to Recommend Formal Verification ### Decision Tree ``` Is the system distributed or concurrent? | +-- No --> Complex state machine with high failure cost? | +-- No --> NOT cost-effective. Use property-based testing. | +-- Yes --> CONSIDER TLA+ | +-- Yes --> Consensus, coordination, or distributed transactions? | +-- Yes --> RECOMMEND TLA+ | +-- No --> Could concurrency bug cause data loss or safety issues? | +-- Yes --> RECOMMEND TLA+ | +-- No --> OFFER as option ``` ### Strong Indicators (Recommend) | Domain | Why TLA+ Adds Value | Evidence | |--------|-------------------|----------| | Distributed consensus (Paxos, Raft) | Subtle interleaving bugs in leader election | Raft TLA+ spec ~400 lines, found implementation bugs | | Financial distributed transactions | Atomicity violations cause monetary loss | AWS DynamoDB replication verified | | Leader election, distributed locking | Split-brain, deadlock, stale-lock | AWS lock manager verified | | Eventual consistency / CRDTs | Convergence proofs required | TLA+ CRDT framework verifies SEC | | Safety-critical state machines | Regulatory requirements | DO-178C, CENELEC recognize formal methods | | Multi-party coordination (sagas, 2PC) | Compensation ordering, partial failure | 2PC is canonical TLA+ example | | Data replication protocols | Ordering, consistency under failure | Elasticsearch, MongoDB, Cosmos DB verified | ### When NOT to Use - Simple CRUD (bugs are in implementation, not design) - Single-process without complex state machines - Prototypes/MVPs (design will change before verification completes) - Performance optimization (TLA+ models correctness, not performance) ### Cost-Benefit Reference - Learning curve: 2-3 weeks to useful results (AWS engineers, all levels) - Typical spec effort: 2-4 weeks part-time for a distributed protocol - ROI highest when: bug cost is high, system is long-lived, protocol is novel, concurrency testing is impractical ## Core Concepts for Architects ### What TLA+ Specifies TLA+ describes **what** a system should do (allowed behaviors), not **how** to implement it. Specifications are mathematical objects checked for correctness before any code exists. ### Safety vs. Liveness | Property Type | Meaning | Expression | Example | |--------------|---------|------------|---------| | Safety | Nothing bad happens | Invariant: predicate true in every reachable state | "Two processes never hold same lock" | | Liveness | Something good eventually happens | Temporal: `<>` (eventually), `[]<>` (infinitely often) | "Every request eventually gets response" | Safety violations produce counterexample traces (the debugging artifact). Liveness requires fairness conditions. ### PlusCal vs. Raw TLA+ PlusCal compiles to TLA+ with programming-like syntax. Start with PlusCal for first 2-3 specs, then learn raw TLA+ for cases PlusCal cannot express. Key PlusCal constructs: `variables` (state) | `labels` (atomic action boundaries) | `either/or` (nondeterministic choice) | `await` (blocking) | `process \in 1..N` (concurrent processes) | `fair process` (weak fairness) Labels define concurrency granularity: everything between two labels is one atomic step. Two processes interleave only at label boundaries. ## State Explosion Management State space grows exponentially: `(states per node)^(nodes) x (message permutations)`. ### Containment Strategies | Strategy | Technique | Impact | |----------|-----------|--------| | Bound parameters | Start with 2-3 nodes, 2-4 messages | Most bugs appear at small N | | Symmetry reduction | `SYMMETRY Permutations(Nodes)` | Up to N! reduction | | Reduce labels | Merge labels where fine-grained atomicity unnecessary | Orders of magnitude | | State constraints | `CONSTRAINT Len(log[n]) < MaxLogLength` | Prune uninteresting states | | Abstraction | Model protocol not implementation (TCP -> message set) | Dramatic reduction | | Decomposition | Multiple focused specs, not one monolith | Each independently checkable | | Progressive refinement | 2 nodes -> 3 nodes -> add failures -> add liveness | Incremental verification | | Simulation mode | `java -jar tla2tools.jar -simulate -depth 100` | Trades completeness for speed | ### Memory and Time Budgets | Unique States | Expected Time | Memory | Approach | |--------------|---------------|--------|----------| | < 10K | Seconds | < 1 GB | Exhaustive, single thread | | 10K - 1M | Minutes | 1-4 GB | Exhaustive, `-workers auto` | | 1M - 100M | Hours | 4-32 GB | Exhaustive with constraints | | 100M - 1B | Days | 32-64 GB | Large instance or simulation | | > 1B | Weeks | 60+ GB | Simulation, TLAPS, or decompose | ### Estimation Before Running 1. Count distinct variable values in model 2. Multiply domains together for baseline 3. Start TLC with smallest parameters, observe state count 4. Extrapolate: doubling a parameter typically squares or cubes the space ## Key Specification Patterns ### Two-Phase Commit (2PC) - Variables: rmState, tmState, tmPrepared, msgs - Safety: no RM commits while another aborts (`Consistency`) - State space: 3 RMs ~718 states, 5 RMs ~21,488 states - Common mistake: not modeling RM spontaneous abort or unreliable network ### Distributed Consensus (Raft) - Variables: currentTerm, votedFor, log, state, votesGranted, msgs - Safety: at most one leader per term (`ElectionSafety`) - Safety: logs with same index+term are identical (`LogMatching`) - State space: 3 nodes, MaxTerm=2 ~10K-100K states ### Saga (Compensating Transactions) - Variables: stepState, sagaState, compensateIdx - Safety: steps execute in order, compensations in reverse (`OrderInvariant`) - Safety: no completed steps remain after abort (`CompensationComplete`) - Common mistake: not enforcing reverse compensation order ### Distributed Lock with Lease - Variables: lockHolder, leaseExpiry, clock, nodeState - Safety: at most one holder (
Review dimensions for validating agent quality - template compliance, safety, testing, and priority validation
Review dimensions for validating agent quality - template compliance, safety, testing, and priority validation
Review dimensions for acceptance test quality - happy path bias, GWT compliance, business language purity, coverage completeness, walking skeleton user-centricity, priority validation, observable behavior assertions, traceability coverage, and walking skeleton boundary proof
Detailed 5-phase workflow for creating agents - from requirements analysis through validation and iterative refinement
5-layer testing approach for agent validation including adversarial testing, security validation, and prompt injection resistance
Architectural style selection decision matrices, trade-off analysis, structural enforcement rules, and combination patterns. Load when choosing or evaluating architecture styles.
Comprehensive architecture patterns, methodologies, quality frameworks, and evaluation methods for solution architects. Load when designing system architecture or selecting patterns.
Canonical AT completeness gate — research-anchored 7-category taxonomy (C1-C7) + 15-item mechanical checklist. Paradigm-neutral. Drives acceptance-designer reviewer verdict deterministically.