Skill139 repo starsupdated 5mo ago

counterexample-to-test-generator

This Claude Code skill automatically converts abstract counterexample traces from model checkers like SPIN, CBMC, NuSMV, and TLA+ into executable test cases. It maps symbolic states to concrete values, generates test framework code with proper setup and assertion phases, and creates regression tests that reproduce property violations. Use this when you need to translate formal verification failures into runnable tests for validation, bug reproduction, or regression testing.

View source Repository: Skills-4-SE

Install in Claude Code

Copy

git clone --depth 1 https://github.com/ArabelaTso/Skills-4-SE /tmp/counterexample-to-test-generator && cp -r /tmp/counterexample-to-test-generator/skills/counterexample-to-test-generator ~/.claude/skills/counterexample-to-test-generator

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Counterexample To Test Generator

## Overview

This skill transforms counterexample traces from model checkers into executable test cases that reliably reproduce property violations. It bridges formal verification and testing by mapping abstract states to concrete values and generating runnable tests with clear traceability from counterexample steps to test logic.

## Workflow

### Step 1: Analyze Inputs

Gather and understand the counterexample trace and program:

1. **Identify the model checker format**: Determine which tool produced the counterexample (SPIN, CBMC, NuSMV, TLA+, JPF, etc.). See [references/model_checker_formats.md](references/model_checker_formats.md) for format details.

2. **Extract key information**:
   - Initial state values
   - Sequence of transitions/steps
   - Variable values at each step
   - Property violation point
   - Error condition or assertion failure

3. **Understand the program structure**:
   - Entry points and function signatures
   - Input parameters and their types
   - State variables involved in the trace
   - Control flow relevant to the counterexample

### Step 2: Map Abstract States to Concrete Values

Translate the counterexample's abstract representation into concrete test inputs:

1. **Determine concrete values** for abstract state variables:
   - Map symbolic values to concrete instances
   - Resolve non-deterministic choices to specific values
   - Handle ranges and constraints from the model

2. **Identify input sequences**:
   - Extract the sequence of function calls or operations
   - Determine parameter values for each call
   - Identify timing or ordering constraints

3. **Handle environment assumptions**:
   - External inputs or system calls
   - Concurrency or scheduling decisions
   - Resource states (files, network, memory)

### Step 3: Generate Test Structure

Create the test case framework in the target language:

1. **Choose test framework** based on the program language:
   - C/C++: Google Test, CUnit, Check
   - Java: JUnit, TestNG
   - Python: pytest, unittest
   - C#: NUnit, xUnit

2. **Structure the test**:
   - Setup phase: Initialize state to match counterexample start
   - Execution phase: Replay the counterexample sequence
   - Assertion phase: Verify the property violation occurs

3. **Add traceability comments**: Map each test step to counterexample steps for debugging and maintenance.

### Step 4: Implement Test Logic

Write the executable test code:

1. **Setup code**:
   ```
   // Initialize variables to counterexample initial state
   // Set up test fixtures or mocks
   // Configure environment (if needed)
   ```

2. **Execution sequence**:
   ```
   // Step 1 (CE line X): [description]
   // Execute operation with concrete values

   // Step 2 (CE line Y): [description]
   // Execute next operation
   ```

3. **Assertions**:
   ```
   // Verify property violation (CE line Z)
   // Assert expected failure condition
   // Check final state matches counterexample
   ```

### Step 5: Generate Output

Produce the complete test case with documentation:

1. **Test file**: Complete, compilable/runnable test code
2. **Mapping document**: Table linking counterexample steps to test lines
3. **Execution instructions**: How to compile and run the test
4. **Expected behavior**: What the test should demonstrate (failure reproduction)

## Example Workflow

**Input**: SPIN counterexample showing a deadlock in a concurrent system

**Output**:
```c
// test_deadlock.c - Reproduces deadlock from SPIN counterexample trail
#include <pthread.h>
#include <assert.h>

// Counterexample mapping:
// CE Step 1-2: Thread 1 acquires lock A
// CE Step 3-4: Thread 2 acquires lock B
// CE Step 5: Thread 1 waits for lock B (blocks)
// CE Step 6: Thread 2 waits for lock A (deadlock)

void* thread1_func(void* arg) {
    pthread_mutex_lock(&lock_a);  // CE Step 1
    sleep(1);                      // CE Step 2 (timing)
    pthread_mutex_lock(&lock_b);  // CE Step 5 (blocks)
    // ... rest of test
}

void test_deadlock_scenario() {
    // Setup: Initialize locks (CE initial state)
    pthread_mutex_init(&lock_a, NULL);
    pthread_mutex_init(&lock_b, NULL);

    // Execute: Create threads in counterexample order
    pthread_create(&t1, NULL, thread1_func, NULL);
    pthread_create(&t2, NULL, thread2_func, NULL);

    // This test will hang, demonstrating the deadlock
    pthread_join(t1, NULL);  // Will timeout
}
```

## Best Practices

1. **Minimize test complexity**: Generate the simplest test that reproduces the violation
2. **Preserve causality**: Maintain the exact sequence from the counterexample
3. **Make violations obvious**: Use clear assertions and error messages
4. **Add context**: Include comments explaining the property being violated
5. **Handle non-determinism**: Document any assumptions made when concretizing values
6. **Test the test**: Verify the generated test actually fails as expected

## Common Challenges

**Challenge**: Counterexample uses symbolic values without concrete bounds
**Solution**: Use representative values from the domain, document the choice

**Challenge**: Trace involves complex timing or scheduling
**Solution**: Use synchronization primitives or explicit delays to enforce ordering

**Challenge**: Program state is partially specified in counterexample
**Solution**: Initialize unspecified variables to default/neutral values

**Challenge**: Counterexample is very long
**Solution**: Identify the minimal prefix that still triggers the violation

## References

- [references/model_checker_formats.md](references/model_checker_formats.md): Detailed format specifications for common model checkers
- [assets/test_templates/](assets/test_templates/): Test framework templates for various languages

More from this repository

abstract-domain-explorerSkill

Applies abstract interpretation using different abstract domains (intervals, octagons, polyhedra, sign, congruence) to statically analyze program variables and infer invariants, value ranges, and relationships. Use when analyzing program properties, inferring loop invariants, detecting potential errors, or understanding variable relationships through static analysis.

abstract-invariant-generatorSkill

Uses abstract interpretation to automatically infer loop invariants, function preconditions, and postconditions for formal verification. Generates invariants that capture program behavior and support correctness proofs in Dafny, Isabelle, Coq, and other verification systems. Use when adding formal specifications to code, generating verification conditions, inferring contracts for functions, or discovering loop invariants for proofs.

abstract-state-analyzerSkill

Performs abstract interpretation over source code to infer possible program states, variable ranges, and data properties without executing the program. Reports potential runtime errors including out-of-bounds accesses, null dereferences, type inconsistencies, division by zero, and integer overflows. Use when analyzing code for potential runtime errors, performing static analysis, checking safety properties, or verifying program behavior without execution.

abstract-trace-summarizerSkill

Performs abstract interpretation to produce summarized execution traces and high-level program behavior representations. Highlights key control flow paths, variable relationships, loop invariants, function summaries, and potential runtime states using abstract domains (intervals, signs, nullness, etc.). Use when analyzing program behavior, understanding execution paths, computing loop invariants, tracking variable ranges, detecting potential runtime errors, or generating program summaries without concrete execution.

acsl-annotation-assistantSkill

Create ACSL (ANSI/ISO C Specification Language) formal annotations for C/C++ programs. Use this skill when working with formal verification, adding function contracts (requires/ensures), loop invariants, assertions, memory safety annotations, or any ACSL specifications. Supports Frama-C verification and generates comprehensive formal specifications for C/C++ code.

agent-browserSkill

CLI-based browser automation with persistent page state using ref-based element interaction. Use when users ask to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information from web pages.

ambiguity-detectorSkill

Detects and analyzes ambiguous language in software requirements and user stories. Use when reviewing requirements documents, user stories, specifications, or any software requirement text to identify vague quantifiers, unclear scope, undefined terms, missing edge cases, subjective language, and incomplete specifications. Provides detailed analysis with clarifying questions and suggested improvements.

api-design-assistantSkill

Design and review APIs with suggestions for endpoints, parameters, return types, and best practices. Use when designing new APIs from requirements, reviewing existing API designs, generating API documentation, or getting implementation guidance. Supports REST APIs with focus on endpoint structure, request/response schemas, authentication, pagination, filtering, versioning, and OpenAPI specifications. Triggers when users ask to design, review, document, or improve APIs.