adversarial-stress-testing
The adversarial-stress-testing Claude Code skill systematically tests the logical robustness of claims, hypotheses, and research designs by applying falsificationist methods including boundary value analysis, counterexample generation, and critical case selection. Use this skill when validating theoretical artifacts, experimental designs, or conceptual frameworks to identify logical breakpoints and map the conditions under which claims remain valid.
git clone --depth 1 https://github.com/yogsoth-ai/de-anthropocentric-research-engine /tmp/adversarial-stress-testing && cp -r /tmp/adversarial-stress-testing/skills/adversarial-stress-testing ~/.claude/skills/adversarial-stress-testing

SKILL.md
# Adversarial Stress Testing

**Core Question:** Does this artifact collapse under logical limits and boundary conditions?

## Methodology Sources

- Lakatos (1976), Proofs and Refutations: counterexample-driven refinement
- Dutilh Novaes (2016): adversarial argumentation as dialogical practice
- Clarke BVA: Boundary Value Analysis for systematic edge testing
- Flyvbjerg (2006), critical case methodology: most-likely/least-likely selection
- Popper (1959), falsificationism: seek conditions where claims break

## Strategy Routing

| Artifact Type | Primary Strategy | Rationale |
|---|---|---|
| claim, hypothesis | assumption-negation | Direct logical attack |
| gap, research-question | lakatos-heuristics | Counterexample refinement |
| idea, approach | boundary-enumeration | Parameter space testing |
| experiment-design | critical-case-design | Decisive test selection |
| any (synthesis) | validity-envelope-mapping | Comprehensive envelope |

## Budget Table

| Resource | S | M | L |
|---|---|---|---|
| Negation derivation chains | 3 | 6 | 10 |
| Counterexamples/boundary cases | 5 | 12 | 25 |
| Parameter dimensions | 3 | 6 | 10 |
| Validity envelope dimensions | 2 | 4 | 6 |

## Tactics

- contradiction-derivation: negate, derive, detect contradiction
- boundary-probing: map parameter space, test extremes, find breakpoints
- counterexample-heuristics: generate monsters, then bar or incorporate them

## Context Management

- Persist derivation chains and counterexamples across rounds
- Track which negations produced genuine contradictions vs. benign outcomes
- Accumulate validity envelope boundaries incrementally

## Output

Produces `AdversarialStressReport` containing: identified breakpoints, validity envelope, surviving refined claims, and confidence assessment.
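The Strategy Routing and Budget tables above can be read as two lookups: artifact type to primary strategy, and budget size to resource limits. The following is a minimal Python sketch of that reading, not the skill's actual implementation. The function names `route` and `plan`, the dictionary keys, and the `synthesis` flag are all hypothetical names chosen for illustration; only the table values come from SKILL.md.

```python
# Hypothetical sketch: the skill itself is a prompt-level SKILL.md, not this code.
# Table values are copied from "Strategy Routing" and "Budget Table".

STRATEGY_BY_ARTIFACT = {
    "claim": "assumption-negation",
    "hypothesis": "assumption-negation",
    "gap": "lakatos-heuristics",
    "research-question": "lakatos-heuristics",
    "idea": "boundary-enumeration",
    "approach": "boundary-enumeration",
    "experiment-design": "critical-case-design",
}

# The "any (synthesis)" row applies regardless of artifact type.
SYNTHESIS_STRATEGY = "validity-envelope-mapping"

# Key names are assumptions; the numbers are the S/M/L columns of the table.
BUDGETS = {
    "S": {"negation_chains": 3, "counterexamples": 5,
          "parameter_dimensions": 3, "envelope_dimensions": 2},
    "M": {"negation_chains": 6, "counterexamples": 12,
          "parameter_dimensions": 6, "envelope_dimensions": 4},
    "L": {"negation_chains": 10, "counterexamples": 25,
          "parameter_dimensions": 10, "envelope_dimensions": 6},
}


def route(artifact_type: str, synthesis: bool = False) -> str:
    """Return the primary strategy for an artifact type."""
    if synthesis:
        return SYNTHESIS_STRATEGY
    if artifact_type not in STRATEGY_BY_ARTIFACT:
        raise ValueError(f"unknown artifact type: {artifact_type!r}")
    return STRATEGY_BY_ARTIFACT[artifact_type]


def plan(artifact_type: str, size: str = "M", synthesis: bool = False) -> dict:
    """Combine the routed strategy with the resource limits for a budget size."""
    if size not in BUDGETS:
        raise ValueError(f"unknown budget size: {size!r}")
    return {"strategy": route(artifact_type, synthesis), "budget": dict(BUDGETS[size])}


if __name__ == "__main__":
    print(plan("hypothesis", size="S"))
```

Keeping the synthesis row as a flag rather than a dictionary entry reflects that it overrides the per-type routing instead of naming a separate artifact type; that choice is an interpretation of the table, not something the source states.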
Experiment-specific: summarizes the DARE executor's research design into a clean research_result report, which must be written back into the spec file produced by formated-specs.
Experiment-specific: replaces writing-specs and emits DARE's 4-layer call plan as a clean research_graph schema. The last step forces loading formated-result.
loss-1 judge: reads a sample's full dialogue and decides whether the user simulator semantically enacted its Policy Card. check-blind.
loss-2 judge: pairwise quality comparison across the n rungs within one topic; decides monotonicity and endpoint separation. check-blind, D1-D5 only.
Strategy: inference to the best explanation when confronting anomalies
Remove components one by one, observe system changes to reveal hidden dependencies and generate ideas from structural gaps.
Map system architecture to ablatable units for ablation studies
Design ablation studies to isolate component contributions in ML systems
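
The ablation descriptions above (remove components one by one, observe how the system changes) can be illustrated with a leave-one-out loop. This is a minimal Python sketch under stated assumptions: `leave_one_out_ablation`, `toy_score`, the component names, and the integer scores are all hypothetical and invented for illustration; they do not come from any of the listed skills.

```python
from typing import Callable, Dict, FrozenSet, Iterable


def leave_one_out_ablation(
    components: Iterable[str],
    evaluate: Callable[[FrozenSet[str]], int],
) -> Dict[str, int]:
    """Return the score drop caused by removing each component on its own."""
    names = list(components)
    baseline = evaluate(frozenset(names))
    drops = {}
    for name in names:
        remaining = frozenset(c for c in names if c != name)
        drops[name] = baseline - evaluate(remaining)
    return drops


def toy_score(active: FrozenSet[str]) -> int:
    """Hypothetical scoring function with one hidden dependency."""
    score = 50
    if "pretraining" in active:
        score += 20
    if "dropout" in active:
        score += 5
    # Hidden dependency: augmentation only helps when pretraining is present.
    if "augmentation" in active and "pretraining" in active:
        score += 10
    return score


if __name__ == "__main__":
    drops = leave_one_out_ablation(
        ["pretraining", "dropout", "augmentation"], toy_score
    )
    print(drops)  # {'pretraining': 30, 'dropout': 5, 'augmentation': 10}
```

In this toy setup, removing `pretraining` costs 30 points even though its direct contribution is 20, because the `augmentation` bonus disappears with it. A drop larger than the component's own contribution is the kind of signal that points to a hidden dependency.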