Skill139 repo starsupdated 5mo ago

counterexample-debugger

The counterexample-debugger skill analyzes proof failures in formal verification systems by interpreting counterexamples generated from Nitpick (Isabelle/HOL) or QuickChick (Coq). Use it when automated proofs fail to identify missing preconditions, specification errors, incorrect quantifier ordering, or incorrect theorem statements. The skill systematically verifies counterexamples, determines root causes of failures, and recommends specific corrections with code examples for both proof assistants.

View source Repository: Skills-4-SE

Install in Claude Code

Copy

git clone --depth 1 https://github.com/ArabelaTso/Skills-4-SE /tmp/counterexample-debugger && cp -r /tmp/counterexample-debugger/skills/counterexample-debugger ~/.claude/skills/counterexample-debugger

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Counterexample-Guided Proof Debugger

Analyze counterexamples from Nitpick or QuickChick to explain proof failures and suggest corrections to specifications or proofs.

## Workflow

### 1. Receive Counterexample Information

Identify what information is provided:
- **Counterexample output**: From Nitpick or QuickChick
- **Failed theorem**: The statement that couldn't be proven
- **Proof attempt**: Any partial proof or tactics tried
- **Context**: Definitions and lemmas involved

### 2. Choose Target System

Determine which proof assistant is being used:
- **Isabelle/HOL with Nitpick**: Finite model finder
- **Coq with QuickChick**: Property-based testing
- **Both**: Provide analysis for both systems

### 3. Analyze the Counterexample

Examine the counterexample systematically:

**Verify the counterexample**:
- Manually compute the result for the given values
- Confirm it actually violates the theorem
- Check if it's a genuine counterexample or tool limitation

**Identify the violation**:
- Which part of the theorem fails?
- What values cause the failure?
- Is it an edge case or fundamental issue?

**Determine the root cause**:
- Missing precondition?
- Incorrect specification?
- Wrong quantifier order?
- Implementation bug?
- Off-by-one error?
- Type constraint issue?

### 4. Explain the Failure

Provide clear explanation:

**What went wrong**:
- Describe why the counterexample violates the theorem
- Show the computation step-by-step
- Highlight the specific point of failure

**Why it happened**:
- Explain the underlying cause
- Identify the conceptual error
- Note any common patterns (empty list, boundary values, etc.)

**Impact assessment**:
- Is the theorem fundamentally wrong?
- Does it need preconditions?
- Is the specification incomplete?

### 5. Suggest Corrections

Provide actionable fixes based on the root cause:

**For missing preconditions**:
```isabelle
(* Before *)
lemma "hd xs ∈ set xs"

(* After *)
lemma "xs ≠ [] ⟹ hd xs ∈ set xs"
```

**For incorrect specifications**:
```coq
(* Before: uses < instead of <= *)
x < y && is_sorted (y :: ys)

(* After *)
x <= y && is_sorted (y :: ys)
```

**For quantifier order**:
```isabelle
(* Before *)
"∃y. ∀x. P x y"

(* After *)
"∀x. ∃y. P x y"
```

**For incomplete specifications**:
```coq
(* Before: only checks sortedness *)
is_sorted (sort l)

(* After: also checks permutation *)
is_sorted (sort l) && permutation l (sort l)
```

### 6. Recommend Next Steps

Guide the user on what to do:

**Retest with fix**:
- Run Nitpick/QuickChick again
- Verify no counterexample found
- Check if fix is sufficient

**Identify additional issues**:
- Are there other edge cases?
- Do other lemmas need fixing?
- Is the specification now complete?

**Proceed with proof**:
- If no counterexample, attempt proof
- Suggest proof strategy
- Identify needed helper lemmas

## Counterexample Analysis Patterns

### Pattern 1: Empty Structures

**Symptom**: Counterexample is `[]`, `{}`, or `None`

**Common causes**:
- Missing non-empty precondition
- Undefined behavior on empty input
- Base case not handled

**Fix**: Add precondition or handle empty case explicitly

### Pattern 2: Boundary Values

**Symptom**: Counterexample is `0`, `1`, or type limits

**Common causes**:
- Off-by-one errors
- Boundary condition not considered
- Edge case in arithmetic

**Fix**: Adjust bounds or add special case handling

### Pattern 3: Duplicate Elements

**Symptom**: Counterexample has repeated values like `[0, 0]`

**Common causes**:
- Using `<` instead of `≤`
- Assuming distinctness
- Set vs. multiset confusion

**Fix**: Use appropriate comparison or add distinctness assumption

### Pattern 4: Small Counterexamples

**Symptom**: Very small counterexample (1-2 elements)

**Common causes**:
- Fundamental specification error
- Wrong base case
- Incorrect inductive step

**Fix**: Review base definitions and inductive structure

### Pattern 5: Type-Specific Values

**Symptom**: Counterexample at type boundaries

**Common causes**:
- Type constraints not considered
- Overflow/underflow issues
- Finite vs. infinite types

**Fix**: Add type constraints or adjust specification

## Tool-Specific Guidance

### Nitpick (Isabelle/HOL)

For detailed Nitpick usage and interpretation:
- See [nitpick_guide.md](references/nitpick_guide.md)

Key points:
- Searches for finite models
- Configurable cardinality bounds
- May miss counterexamples beyond bounds
- "No counterexample" ≠ proof

### QuickChick (Coq)

For detailed QuickChick usage and interpretation:
- See [quickchick_guide.md](references/quickchick_guide.md)

Key points:
- Random testing with shrinking
- Configurable test count
- May miss rare counterexamples
- "Success" ≠ proof

## Common Root Causes

### Specification Errors

**Symptoms**:
- Counterexample shows spec doesn't match intent
- Multiple unrelated counterexamples
- Spec too weak or too strong

**Fixes**:
- Strengthen postconditions
- Add completeness requirements
- Review specification against intent

### Missing Preconditions

**Symptoms**:
- Counterexample is edge case
- Empty structures or boundary values
- Undefined behavior

**Fixes**:
- Add non-empty constraints
- Add type bounds
- Add well-formedness conditions

### Quantifier Issues

**Symptoms**:
- Counterexample shows wrong order
- Existential/universal confusion
- Skolem constant issues

**Fixes**:
- Swap quantifier order
- Review logical structure
- Check variable dependencies

### Implementation Bugs

**Symptoms**:
- Counterexample shows function doesn't work
- Output doesn't match specification
- Logic error in definition

**Fixes**:
- Fix the implementation
- Review algorithm correctness
- Test implementation separately

## Debugging Checklist

When analyzing a counterexample:

1. **Verify manually**: Compute the result for the counterexample values
2. **Identify violation**: Which part of the theorem fails?
3. **Find root cause**: Why does it fail? (precondition, spec, impl, quantifiers)
4. **Suggest fix**:

More from this repository

abstract-domain-explorerSkill

Applies abstract interpretation using different abstract domains (intervals, octagons, polyhedra, sign, congruence) to statically analyze program variables and infer invariants, value ranges, and relationships. Use when analyzing program properties, inferring loop invariants, detecting potential errors, or understanding variable relationships through static analysis.

abstract-invariant-generatorSkill

Uses abstract interpretation to automatically infer loop invariants, function preconditions, and postconditions for formal verification. Generates invariants that capture program behavior and support correctness proofs in Dafny, Isabelle, Coq, and other verification systems. Use when adding formal specifications to code, generating verification conditions, inferring contracts for functions, or discovering loop invariants for proofs.

abstract-state-analyzerSkill

Performs abstract interpretation over source code to infer possible program states, variable ranges, and data properties without executing the program. Reports potential runtime errors including out-of-bounds accesses, null dereferences, type inconsistencies, division by zero, and integer overflows. Use when analyzing code for potential runtime errors, performing static analysis, checking safety properties, or verifying program behavior without execution.

abstract-trace-summarizerSkill

Performs abstract interpretation to produce summarized execution traces and high-level program behavior representations. Highlights key control flow paths, variable relationships, loop invariants, function summaries, and potential runtime states using abstract domains (intervals, signs, nullness, etc.). Use when analyzing program behavior, understanding execution paths, computing loop invariants, tracking variable ranges, detecting potential runtime errors, or generating program summaries without concrete execution.

acsl-annotation-assistantSkill

Create ACSL (ANSI/ISO C Specification Language) formal annotations for C/C++ programs. Use this skill when working with formal verification, adding function contracts (requires/ensures), loop invariants, assertions, memory safety annotations, or any ACSL specifications. Supports Frama-C verification and generates comprehensive formal specifications for C/C++ code.

agent-browserSkill

CLI-based browser automation with persistent page state using ref-based element interaction. Use when users ask to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information from web pages.

ambiguity-detectorSkill

Detects and analyzes ambiguous language in software requirements and user stories. Use when reviewing requirements documents, user stories, specifications, or any software requirement text to identify vague quantifiers, unclear scope, undefined terms, missing edge cases, subjective language, and incomplete specifications. Provides detailed analysis with clarifying questions and suggested improvements.

api-design-assistantSkill

Design and review APIs with suggestions for endpoints, parameters, return types, and best practices. Use when designing new APIs from requirements, reviewing existing API designs, generating API documentation, or getting implementation guidance. Supports REST APIs with focus on endpoint structure, request/response schemas, authentication, pagination, filtering, versioning, and OpenAPI specifications. Triggers when users ask to design, review, document, or improve APIs.