Skill · 389 repo stars · updated 19d ago

adversarial-stress-testing

The adversarial-stress-testing Claude Code skill systematically tests the logical robustness of claims, hypotheses, and research designs by applying falsificationist methods including boundary value analysis, counterexample generation, and critical case selection. Use this skill when validating theoretical artifacts, experimental designs, or conceptual frameworks to identify logical breakpoints and map the conditions under which claims remain valid.

Repository: de-anthropocentric-research-engine

Install in Claude Code

git clone --depth 1 https://github.com/yogsoth-ai/de-anthropocentric-research-engine /tmp/adversarial-stress-testing && cp -r /tmp/adversarial-stress-testing/skills/adversarial-stress-testing ~/.claude/skills/adversarial-stress-testing

Then open a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Adversarial Stress Testing

**Core Question:** Does this artifact collapse under logical limits and boundary conditions?

## Methodology Sources

- Lakatos (1976) — Proofs and Refutations: counterexample-driven refinement
- Dutilh Novaes (2016) — Adversarial argumentation as dialogical practice
- Clarke BVA — Boundary Value Analysis for systematic edge testing
- Flyvbjerg (2006) — Critical case methodology: most-likely/least-likely selection
- Popper (1959) — Falsificationism: seek conditions where claims break

## Strategy Routing

| Artifact Type | Primary Strategy | Rationale |
|---|---|---|
| claim, hypothesis | assumption-negation | Direct logical attack |
| gap, research-question | lakatos-heuristics | Counterexample refinement |
| idea, approach | boundary-enumeration | Parameter space testing |
| experiment-design | critical-case-design | Decisive test selection |
| any (synthesis) | stress-test-validity-envelope-mapping | Comprehensive envelope |
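
The routing table above can be read as a simple lookup from artifact type to primary strategy. The following is a minimal sketch under that reading, not the skill's actual dispatch logic; `ROUTES` and `route` are hypothetical names, and the synthesis row (which applies to any artifact type) is left out.

```python
# Hypothetical lookup mirroring the Strategy Routing table.
# The "any (synthesis)" row is intentionally omitted from this sketch.
ROUTES = {
    "claim": "assumption-negation",
    "hypothesis": "assumption-negation",
    "gap": "lakatos-heuristics",
    "research-question": "lakatos-heuristics",
    "idea": "boundary-enumeration",
    "approach": "boundary-enumeration",
    "experiment-design": "critical-case-design",
}


def route(artifact_type: str) -> str:
    """Return the primary strategy name for a given artifact type."""
    key = artifact_type.strip().lower()
    if key not in ROUTES:
        raise ValueError(f"no primary strategy for artifact type: {artifact_type!r}")
    return ROUTES[key]


print(route("hypothesis"))
print(route("experiment-design"))
```

Raising on an unknown type, rather than falling back silently, keeps a mistyped artifact label from being stress-tested with the wrong strategy.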

## Budget Table

| Resource | S | M | L |
|---|---|---|---|
| Negation derivation chains | 3 | 6 | 10 |
| Counterexamples/boundary cases | 5 | 12 | 25 |
| Parameter dimensions | 3 | 6 | 10 |
| Validity envelope dimensions | 2 | 4 | 6 |
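
One way to make the budget tiers checkable is to encode the table as data and compare running counts against it. This is an illustrative sketch only; the `Budget` class, its field names, and `within_budget` are assumptions, and the numbers are copied from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    """Per-tier limits, one field per row of the Budget Table (names are hypothetical)."""
    negation_chains: int
    counterexamples: int
    parameter_dimensions: int
    envelope_dimensions: int


BUDGETS = {
    "S": Budget(3, 5, 3, 2),
    "M": Budget(6, 12, 6, 4),
    "L": Budget(10, 25, 10, 6),
}


def within_budget(tier: str, counterexamples_used: int) -> bool:
    """True while the counterexample/boundary-case count has not exceeded the tier limit."""
    return counterexamples_used <= BUDGETS[tier].counterexamples


print(within_budget("M", 12))
print(within_budget("S", 6))
```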

## Tactics

- contradiction-derivation — Negate, derive, detect contradiction
- boundary-probing — Map parameter space, test extremes, find breakpoints
- counterexample-heuristics — Generate monsters, bar or incorporate
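
As an illustration of the boundary-probing tactic, the sketch below generates the usual Boundary Value Analysis probe points for a single numeric parameter and reports where a claim's truth value flips. It assumes the claim can be expressed as a boolean predicate over one parameter; `boundary_values` and `find_breakpoints` are hypothetical helpers, not part of the skill.

```python
from typing import Callable


def boundary_values(low: float, high: float, step: float) -> list[float]:
    """Probe just outside, at, and just inside each limit, plus a nominal midpoint."""
    nominal = (low + high) / 2
    return [low - step, low, low + step, nominal, high - step, high, high + step]


def find_breakpoints(
    claim: Callable[[float], bool], values: list[float]
) -> list[tuple[float, float]]:
    """Return adjacent probe pairs between which the claim flips between True and False."""
    ordered = sorted(values)
    results = [claim(v) for v in ordered]
    return [
        (a, b)
        for a, b, ra, rb in zip(ordered, ordered[1:], results, results[1:])
        if ra != rb
    ]


def claim_holds(x: float) -> bool:
    """Toy claim: 'the method is valid for inputs between 0 and 100 inclusive'."""
    return 0.0 <= x <= 100.0


probes = boundary_values(0.0, 100.0, 1.0)
print(find_breakpoints(claim_holds, probes))
```

Each reported pair brackets a breakpoint; narrowing the `step` and probing again tightens the estimate of where the claim stops holding.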

## Context Management

- Persist derivation chains and counterexamples across rounds
- Track which negations produced genuine contradictions vs. benign outcomes
- Accumulate validity envelope boundaries incrementally

## Output

Produces `AdversarialStressReport` containing: identified breakpoints, validity envelope, surviving refined claims, and confidence assessment.
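
The source names the four contents of the report but not its schema. A possible shape, purely as an assumption for illustration (all field names and types below are hypothetical), might look like this:

```python
from dataclasses import dataclass, field


@dataclass
class AdversarialStressReport:
    """Hypothetical container for the four report contents listed above."""
    # Conditions under which the artifact was found to break.
    breakpoints: list[str] = field(default_factory=list)
    # Assumed representation: dimension name -> (lower bound, upper bound) where the claim holds.
    validity_envelope: dict[str, tuple[float, float]] = field(default_factory=dict)
    # Claims that survived testing, possibly in refined form.
    surviving_claims: list[str] = field(default_factory=list)
    # Free-text confidence assessment; the real format is not specified in the source.
    confidence: str = "unassessed"


report = AdversarialStressReport()
report.breakpoints.append("claim fails when sample size drops below the tested range")
report.validity_envelope["sample_size"] = (30.0, 10000.0)
print(report)
```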

<!-- BEGIN available-tables (generated) -->

## Available Strategies

Optional, no fixed order; the final leaf is always an SOP.

| Strategy | When to use |
| --- | --- |
| assumption-negation | Classic reductio ad absurdum: negate the core claim, derive logical consequences, seek contradiction or absurdity. |
| boundary-enumeration | Systematic Boundary Value Analysis: identify parameter boundaries, test at and beyond limits, detect breakpoints. |
| critical-case-design | Flyvbjerg critical case methodology: select most-likely and least-likely cases to maximize inferential power. |
| lakatos-heuristics | Proofs and Refutations method: generate counterexamples, attempt monster-barring, incorporate surviving counterexamples as lemma refinements. |
| stress-test-validity-envelope-mapping | Map the complete validity envelope of a claim across all relevant dimensions, synthesizing breakpoints into a bounded region. |

## Available Tactics

Optional, no fixed order; the final leaf is always an SOP.

| Tactic | When to use |
| --- | --- |
| boundary-probing | Map parameter space, generate extreme values, test at boundaries, detect breakpoints, synthesize validity envelope. |
| contradiction-derivation | Negate a claim, derive logical consequences step by step, detect whether a genuine contradiction or absurdity emerges. |
| counterexample-heuristics | Generate counterexamples (monsters), attempt monster-barring, incorporate surviving counterexamples as lemma refinements (Lakatos method). |

## Available SOPs

Optional, no fixed order; the final leaf is always an SOP.

| SOP | When to use |
| --- | --- |
| context-checkpoint | Append research process and results to the current Phase's context file. Each append MUST contain >=500 lines of markdown covering both process and results. Use this skill at plan-designated checkpoint points — typically after each strategy completes or at key decision nodes within a research Phase. |
| context-init | Create a new context file for a research Phase. Called once at Phase start to initialize the file that subsequent context-checkpoint calls will append to. Use this skill whenever a new research Phase begins and a fresh context file is needed. |
| mitigation-proposal | Proposes concrete mitigation strategies for identified weaknesses. Generates prevention, detection, and response measures with feasibility assessment. |
| stress-test-saturation-detection | Determines whether validation has reached saturation — no new weaknesses or failure modes being discovered. Used by all 5 campaigns as termination signal. |
| verdict-synthesis | Synthesizes findings from a completed campaign into typed verdict reports. Produces DebateVerdict, RedTeamReport, FailureAnticipationReport, CounterfactualMap, or AdversarialStressReport depending on campaign. Also supports cross-campaign StressTestSummary. |
| weakness-classification | Classifies discovered weaknesses into severity tiers (fatal/major/minor/cosmetic) with structured justification and exploitability assessment. |

<!-- END available-tables (generated) -->
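
The stress-test-saturation-detection SOP is described as signalling when no new weaknesses or failure modes are being discovered. One simple way to approximate that signal, offered as a sketch rather than the SOP's actual rule, is to stop once several consecutive rounds add nothing new; `is_saturated` and its `patience` parameter are assumptions.

```python
def is_saturated(rounds: list[set[str]], patience: int = 2) -> bool:
    """True if the last `patience` rounds found no weakness that was not already seen."""
    if len(rounds) < patience:
        return False
    seen: set[str] = set()
    novel_counts: list[int] = []
    for found in rounds:
        novel_counts.append(len(found - seen))  # weaknesses new in this round
        seen |= found
    return all(count == 0 for count in novel_counts[-patience:])


history = [
    {"unstated-assumption"},
    {"unstated-assumption", "breaks-at-zero"},
    {"breaks-at-zero"},
    {"unstated-assumption"},
]
print(is_saturated(history))
print(is_saturated(history[:3]))
```

A larger `patience` makes termination more conservative at the cost of extra rounds.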

From the same repository

formated-result · Skill

Experiment-specific: summarizes the DARE executor's research design into a clean research_result report, which must be written back into the spec file produced by formated-specs.

formated-specs · Skill

Experiment-specific: replaces writing-specs and emits DARE's 4-layer call plan as a clean research_graph schema. The last step forces a load of formated-result.

injection-fidelity · Skill

loss-1 judge: reads a sample's full dialogue and decides whether the user simulator semantically enacted its Policy Card. check-blind.

ladder-quality-order · Skill

loss-2 judge: pairwise quality comparison across the n rungs within one topic; decides monotonicity and endpoint separation. check-blind, D1-D5 only.

abductive-hypothesis-generation · Skill

Strategy: Inference to the best explanation in the face of anomalies

ablation-brainstorm · Skill

Remove components one by one, observe system changes to reveal hidden

ablation-component-mapping · Skill

Map system architecture to ablatable units for ablation studies

ablation-design · Skill

Design ablation studies to isolate component contributions in ML systems