Skip to main content
ClaudeWave
Skill85 estrellas del repoactualizado 3mo ago

counterexample-generator

Generate concrete counterexamples when formal verification, assertions, or specifications fail. Use this skill when debugging failed proofs, understanding why verification fails, creating minimal reproducing examples, analyzing assertion violations, investigating invariant breaks, or diagnosing specification mismatches. Produces concrete input values, execution traces, and state information that demonstrate the failure.

Instalar en Claude Code
Copiar
git clone --depth 1 https://github.com/ArabelaTso/Skills-4-SE /tmp/counterexample-generator && cp -r /tmp/counterexample-generator/skills/counterexample-generator ~/.claude/skills/counterexample-generator
Después abre una sesión nueva de Claude Code; el skill carga automáticamente.

SKILL.md

# Counterexample Generator

Systematically generate counterexamples that demonstrate why verification fails. Provides concrete input values, execution traces, and diagnostic information to help understand and fix specification or implementation issues.

## Core Capabilities

### 1. Precondition Violation Detection

Generate inputs that violate preconditions:
- **Invalid parameter values** - Out-of-range inputs
- **Null/undefined references** - Missing required objects
- **State violations** - Invalid object states
- **Type mismatches** - Incorrect types
- **Constraint violations** - Broken business rules

### 2. Postcondition Failure Analysis

Find executions where postconditions fail:
- **Return value violations** - Wrong return values
- **State inconsistencies** - Incorrect final states
- **Invariant breaks** - Invariants violated after execution
- **Side effect errors** - Unexpected modifications
- **Exception failures** - Wrong or missing exceptions

### 3. Invariant Violation Discovery

Identify cases where invariants break:
- **Class invariant violations** - Object consistency broken
- **Loop invariant violations** - Invariant not maintained
- **Data structure violations** - Consistency broken
- **Temporal violations** - Time-based properties fail
- **Concurrency violations** - Race conditions exposed

### 4. Execution Trace Generation

Produce detailed execution paths:
- **Step-by-step traces** - Line-by-line execution
- **State snapshots** - Variable values at each step
- **Branch decisions** - Which paths taken
- **Call stacks** - Function call hierarchy
- **Symbolic execution** - Constraint-based paths

## Counterexample Generation Workflow

### Step 1: Identify the Failure

Understand what verification failed:

**From verification tools:**
```
Verification failed: Postcondition violated
Function: withdraw
Line: 15
Property: account.balance == old(account.balance) - amount
```

**From assertions:**
```python
AssertionError: assert result >= 0
  in function: sqrt
  with input: x = -1
```

**From test failures:**
```
Test failed: test_binary_search
Expected: 2
Actual: -1
```

**Analysis questions:**
- What property failed?
- Where did it fail?
- What was being verified?
- What are the inputs/state?

### Step 2: Analyze the Specification

Understand what should be true:

**Example specification:**
```python
def withdraw(account: Account, amount: float) -> float:
    """
    Preconditions:
        - account is not None
        - amount > 0
        - account.balance >= amount

    Postconditions:
        - account.balance == old(account.balance) - amount
        - result == account.balance
        - account.balance >= 0
    """
```

**Identify constraints:**
- Precondition: `account.balance >= amount`
- Postcondition: `account.balance == old(account.balance) - amount`
- Invariant: `account.balance >= 0`

### Step 3: Generate Counterexample Inputs

Find concrete values that cause failure:

**Manual generation:**
```python
# Try to violate postcondition
# If amount = 100, old balance = 50
# Then: 50 - 100 = -50 (violates balance >= 0 invariant!)

counterexample = {
    'account': Account(balance=50),
    'amount': 100
}
```

**Systematic generation:**
1. Start with boundary values
2. Try edge cases
3. Use constraint solving
4. Employ random testing
5. Apply mutation testing

### Step 4: Execute and Trace

Run the counterexample and capture details:

**Execution trace:**
```python
Initial state:
  account.balance = 50
  amount = 100

Step 1: Enter withdraw function
  Precondition check: account.balance >= amount?
  50 >= 100 → False

  Should raise error, but code doesn't check!

Step 2: Execute: account.balance -= amount
  account.balance = 50 - 100 = -50

Step 3: Return account.balance
  result = -50

Postcondition check:
  account.balance == old(account.balance) - amount?
  -50 == 50 - 100 → True (passes)

  account.balance >= 0?
  -50 >= 0 → False (FAILS!)

COUNTEREXAMPLE FOUND:
  Invariant violation: balance became negative
```

### Step 5: Present the Counterexample

Format for clarity and debugging:

**Counterexample report:**
```markdown
## Counterexample: Invariant Violation in withdraw()

### Failed Property
Invariant: account.balance >= 0

### Inputs
- account.balance = 50
- amount = 100

### Execution Trace
1. Initial: account.balance = 50
2. Check: balance >= amount → 50 >= 100 → False
   (Missing precondition enforcement!)
3. Execute: balance -= amount → balance = -50
4. Return: -50

### Why It Fails
The function doesn't enforce the precondition that
`account.balance >= amount`. When amount > balance,
the withdrawal proceeds anyway, violating the
invariant that balance must be non-negative.

### Root Cause
Missing validation:
```python
if account.balance < amount:
    raise InsufficientFundsError()
```

### Fix
Add precondition check before withdrawal.
```

## Counterexample Patterns

### Pattern 1: Boundary Value Violation

**Specification:**
```python
def sqrt(x: float) -> float:
    """
    Precondition: x >= 0
    Postcondition: result >= 0 and abs(result * result - x) < 1e-10
    """
    return x ** 0.5
```

**Verification failure:**
```
Postcondition violated when x < 0
```

**Counterexample:**
```python
# Counterexample
input: x = -1

# Execution:
result = (-1) ** 0.5 = (1.5707963267948966e-308+6.123233995736766e-17j)
# Returns complex number, not float!

# Why it fails:
# Python returns complex for sqrt of negative
# Violates postcondition: result should be float

# Trace:
Initial state: x = -1
Step 1: Compute (-1) ** 0.5
Step 2: Result is complex number
Step 3: Return complex (type violation)

# Fix needed:
if x < 0:
    raise ValueError("Cannot compute sqrt of negative number")
```

### Pattern 2: Off-By-One Error

**Specification:**
```python
def binary_search(arr: List[int], target: int) -> int:
    """
    Precondition: arr is sorted
    Postcondition:
        If result >= 0: arr[result] == target
        If result == -1: target not in arr
    """
    left,
abstract-domain-explorerSkill

Applies abstract interpretation using different abstract domains (intervals, octagons, polyhedra, sign, congruence) to statically analyze program variables and infer invariants, value ranges, and relationships. Use when analyzing program properties, inferring loop invariants, detecting potential errors, or understanding variable relationships through static analysis.

abstract-invariant-generatorSkill

Uses abstract interpretation to automatically infer loop invariants, function preconditions, and postconditions for formal verification. Generates invariants that capture program behavior and support correctness proofs in Dafny, Isabelle, Coq, and other verification systems. Use when adding formal specifications to code, generating verification conditions, inferring contracts for functions, or discovering loop invariants for proofs.

abstract-state-analyzerSkill

Performs abstract interpretation over source code to infer possible program states, variable ranges, and data properties without executing the program. Reports potential runtime errors including out-of-bounds accesses, null dereferences, type inconsistencies, division by zero, and integer overflows. Use when analyzing code for potential runtime errors, performing static analysis, checking safety properties, or verifying program behavior without execution.

abstract-trace-summarizerSkill

Performs abstract interpretation to produce summarized execution traces and high-level program behavior representations. Highlights key control flow paths, variable relationships, loop invariants, function summaries, and potential runtime states using abstract domains (intervals, signs, nullness, etc.). Use when analyzing program behavior, understanding execution paths, computing loop invariants, tracking variable ranges, detecting potential runtime errors, or generating program summaries without concrete execution.

acsl-annotation-assistantSkill

Create ACSL (ANSI/ISO C Specification Language) formal annotations for C/C++ programs. Use this skill when working with formal verification, adding function contracts (requires/ensures), loop invariants, assertions, memory safety annotations, or any ACSL specifications. Supports Frama-C verification and generates comprehensive formal specifications for C/C++ code.

agent-browserSkill

CLI-based browser automation with persistent page state using ref-based element interaction. Use when users ask to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information from web pages.

ambiguity-detectorSkill

Detects and analyzes ambiguous language in software requirements and user stories. Use when reviewing requirements documents, user stories, specifications, or any software requirement text to identify vague quantifiers, unclear scope, undefined terms, missing edge cases, subjective language, and incomplete specifications. Provides detailed analysis with clarifying questions and suggested improvements.

api-design-assistantSkill

Design and review APIs with suggestions for endpoints, parameters, return types, and best practices. Use when designing new APIs from requirements, reviewing existing API designs, generating API documentation, or getting implementation guidance. Supports REST APIs with focus on endpoint structure, request/response schemas, authentication, pagination, filtering, versioning, and OpenAPI specifications. Triggers when users ask to design, review, document, or improve APIs.