adversarial-escalation
The adversarial-escalation skill implements a structured debate framework that progressively intensifies intellectual challenges against a defended position. It begins with surface-level critiques of claims and evidence, then escalates to attacks on logical structure and coherence, and finally targets foundational assumptions and paradigm fit. The system measures defender resilience through confidence calibration to determine whether escalation proceeds or terminates. Use this skill when stress-testing arguments, validating reasoning robustness, or identifying conceptual vulnerabilities through systematic pressure.
git clone --depth 1 https://github.com/yogsoth-ai/de-anthropocentric-research-engine /tmp/adversarial-escalation && cp -r /tmp/adversarial-escalation/skills/adversarial-escalation ~/.claude/skills/adversarial-escalation
SKILL.md
# Adversarial Escalation Strategy
Progressive pressure: escalate attack sophistication based on defender performance.
## Method
1. **debate-architect** designs escalation ladder (surface → structural → foundational)
2. Level 1: **debate-critic** probes surface claims and evidence quality
3. **confidence-calibration** measures defender resilience
4. Level 2: **debate-critic** attacks structural coherence and logical dependencies
5. Level 3: **debate-critic** challenges foundational assumptions and paradigm fit
6. Each level only reached if defender survives previous level
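
The ladder described above can be sketched as plain data. This is a minimal illustration, not the skill's actual implementation: the names `Level`, `Rung`, `LADDER`, and `highest_level_reached` are hypothetical, and the attack targets are copied from the steps listed above.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Sequence, Tuple


class Level(IntEnum):
    """Escalation levels, ordered from least to most fundamental."""
    SURFACE = 1
    STRUCTURAL = 2
    FOUNDATIONAL = 3


@dataclass(frozen=True)
class Rung:
    """One rung of the ladder: a level and what the critic attacks there."""
    level: Level
    targets: Tuple[str, ...]


# Hypothetical encoding of the ladder that debate-architect designs.
LADDER = (
    Rung(Level.SURFACE, ("surface claims", "evidence quality")),
    Rung(Level.STRUCTURAL, ("structural coherence", "logical dependencies")),
    Rung(Level.FOUNDATIONAL, ("foundational assumptions", "paradigm fit")),
)


def highest_level_reached(survived: Sequence[bool]) -> Level:
    """Return the last level that was attacked.

    survived[i] says whether the defender survived rung i. Escalation
    stops at the first rung the defender fails (step 6 above).
    """
    reached = LADDER[0].level
    for rung, ok in zip(LADDER, survived):
        reached = rung.level
        if not ok:
            break
    return reached


# A defender that survives Level 1 but collapses at Level 2 never sees Level 3.
print(highest_level_reached([True, False, True]).name)
```

Using an ordered enum keeps the "only reached if the previous level was survived" rule a simple early exit from the loop.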
## Budget Table
| Parameter | S | M | L |
|---|---|---|---|
| Debate rounds | 4 | 8 | 12 |
| Participating agents | 3 | 5 | 8 |
| Coverage dimensions | 3 | 5 | 7 |
| External evidence searches | 2 | 5 | 10 |
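
One way to carry the table into code is a small lookup keyed by tier. This is a sketch under assumptions: `Budget`, `BUDGETS`, and `get_budget` are hypothetical names, and S/M/L are treated as opaque tier labels because the source does not define them further.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    """Limits for one budget tier, mirroring the rows of the table above."""
    debate_rounds: int
    participating_agents: int
    coverage_dimensions: int
    external_evidence_searches: int


# Values copied from the Budget Table; the dictionary layout is an assumption.
BUDGETS = {
    "S": Budget(4, 3, 3, 2),
    "M": Budget(8, 5, 5, 5),
    "L": Budget(12, 8, 7, 10),
}


def get_budget(tier: str) -> Budget:
    """Look up a tier, failing loudly on anything other than S, M, or L."""
    try:
        return BUDGETS[tier.upper()]
    except KeyError:
        raise ValueError(f"unknown budget tier: {tier!r}") from None


print(get_budget("m").debate_rounds)
```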
## Orchestration
```
debate-architect → [design escalation ladder]
  → [for each level]:
      debate-critic (level-appropriate attack)
      → debate-defender → debate-judge
      → confidence-calibration
      → (escalate if survived, terminate if collapsed)
  → debate-transcript-analysis → verdict-synthesis
```
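
The flow above can be sketched as a loop over pluggable callables. Everything here is an assumption made for illustration: the function `run_debate`, the callable signatures, and the `collapse_below` threshold of 0.5 are hypothetical, and the source does not say how confidence-calibration scores resilience. The final debate-transcript-analysis and verdict-synthesis stages are left out; the returned transcript is what they would consume.

```python
from typing import Callable, Dict, List

LEVELS = ("surface", "structural", "foundational")


def run_debate(
    position: str,
    critic: Callable[[str, str], str],      # (level, position) -> attack
    defender: Callable[[str], str],         # attack -> defense
    judge: Callable[[str, str], float],     # (attack, defense) -> ruling
    calibrate: Callable[[float], float],    # ruling -> defender confidence
    collapse_below: float = 0.5,            # hypothetical collapse threshold
) -> List[Dict[str, object]]:
    """Run critic → defender → judge → calibration for each level in order.

    Escalates while the defender's confidence stays at or above the
    threshold and terminates at the first level where it collapses.
    """
    transcript: List[Dict[str, object]] = []
    for level in LEVELS:
        attack = critic(level, position)
        defense = defender(attack)
        ruling = judge(attack, defense)
        confidence = calibrate(ruling)
        transcript.append(
            {"level": level, "attack": attack,
             "defense": defense, "confidence": confidence}
        )
        if confidence < collapse_below:
            break  # defender collapsed: do not escalate further
    return transcript


# Stub agents standing in for the real subagents.
def stub_critic(level: str, position: str) -> str:
    return f"[{level}] attack on: {position}"


def stub_defender(attack: str) -> str:
    return f"defense against {attack}"


def stub_judge(attack: str, defense: str) -> float:
    # The stub defender only holds up against surface-level attacks.
    return 0.9 if attack.startswith("[surface]") else 0.3


transcript = run_debate(
    "Remote work raises productivity",
    stub_critic, stub_defender, stub_judge, calibrate=lambda ruling: ruling,
)
print([entry["level"] for entry in transcript])
```

Passing the agents in as callables keeps the escalation rule separate from how each subagent is actually invoked.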
## Subagents
- debate-architect (escalation design)
- debate-critic (multi-level attacks)
- debate-defender (responses)
- debate-judge (level adjudication)
- confidence-calibration (escalation trigger)