adversarial-roleplay
The adversarial-roleplay Claude Code skill systematically tests artifacts for vulnerabilities by constructing detailed hostile personas with specific motivations and expertise domains, then executing coordinated attacks from each persona's perspective while tracking successful attack vectors. Use this skill when conducting thorough security assessments or red-team evaluations that require identifying convergent weaknesses across multiple adversarial approaches and threat models.
git clone --depth 1 https://github.com/yogsoth-ai/de-anthropocentric-research-engine /tmp/adversarial-roleplay && cp -r /tmp/adversarial-roleplay/skills/adversarial-roleplay ~/.claude/skills/adversarial-roleplay

SKILL.md
# Adversarial Roleplay Tactic

Deploy constructed hostile personas to attack the artifact from distinct motivational frames.

## Orchestration

1. **persona-construction** builds detailed adversary profile:
   - Background and expertise domain
   - Motivation for attacking (career incentive, resource competition, ideological)
   - Known blind spots and biases of this persona type
   - Preferred attack patterns
2. **attack-vector-generation** generates vectors specific to persona's expertise and motivation
3. **probe-execution** executes attacks while maintaining persona consistency
4. Successful attack paths recorded with persona attribution
5. Process repeats for each persona (budget-limited)
6. **finding-aggregation** cross-references findings across personas for convergent vulnerabilities

## Subagents Dispatched

- persona-construction (1 call per persona)
- attack-vector-generation (1 call per persona)
- probe-execution (N calls per persona, budget-limited)
- finding-aggregation (1 call at end, cross-persona)

## Termination Conditions

- All budgeted personas deployed and exhausted
- Convergent vulnerability found by 2+ personas (high-confidence finding)
- Single persona finds critical vulnerability (early report)
- Budget exhausted (report per-persona findings separately)
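
The orchestration loop and termination conditions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the skill's actual implementation: `Finding`, `run_roleplay`, the callback signatures, and the persona and vector names are all hypothetical, the budget is modeled as a cap on total probe calls, and the sketch assumes that a critical or convergent finding stops the run immediately.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Finding:
    """A successful attack path, attributed to the persona that found it."""
    persona: str
    vulnerability: str
    critical: bool = False


def run_roleplay(
    personas: List[str],
    generate_vectors: Callable[[str], List[str]],
    probe: Callable[[str, str], Optional[Finding]],
    budget: int,
) -> dict:
    """Run each persona's attacks until a termination condition is met."""
    findings: List[Finding] = []
    seen_by = defaultdict(set)  # vulnerability -> personas that found it
    calls = 0  # probe calls spent so far (assumed budget unit)

    def report(reason: str) -> dict:
        # Aggregation step: vulnerabilities found by two or more personas.
        convergent = sorted(v for v, who in seen_by.items() if len(who) >= 2)
        return {"reason": reason, "findings": findings, "convergent": convergent}

    for persona in personas:
        for vector in generate_vectors(persona):
            if calls >= budget:
                return report("budget exhausted")
            calls += 1
            finding = probe(persona, vector)
            if finding is None:
                continue  # attack failed, try the next vector
            findings.append(finding)
            seen_by[finding.vulnerability].add(persona)
            if finding.critical:
                return report("critical vulnerability")
            if len(seen_by[finding.vulnerability]) >= 2:
                return report("convergent vulnerability")
    return report("all personas exhausted")


# Hypothetical stand-ins for the attack-vector-generation and
# probe-execution subagents.
def demo_vectors(persona: str) -> List[str]:
    return ["weak-baseline", "cherry-picked-eval"]


def demo_probe(persona: str, vector: str) -> Optional[Finding]:
    # Every persona succeeds only with the "weak-baseline" vector.
    if vector == "weak-baseline":
        return Finding(persona, "baseline too weak")
    return None


result = run_roleplay(
    ["rival-author", "skeptical-reviewer"], demo_vectors, demo_probe, budget=10
)
print(result["reason"])
```

In this toy run both personas independently hit the same weakness, so the loop stops on the convergent-vulnerability condition. Keying `seen_by` on the vulnerability rather than on the persona is what makes the cross-persona check cheap.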
Experiment-specific - summarizes the DARE executor's research design into a clean research_result report and is forced to write it back into the spec file produced by formated-specs.
Experiment-specific - replaces writing-specs and emits DARE's 4-layer call plan as a clean research_graph schema. The last step forces a load of formated-result.
loss-1 judge - read a sample's full dialogue and decide whether the user simulator semantically enacted its Policy Card. check-blind.
loss-2 judge - pairwise quality comparison across the n rungs within one topic; decide monotonicity and endpoint separation. check-blind, D1-D5 only.
Strategy: inference to the best explanation when facing anomalies
Remove components one by one and observe how the system changes, to reveal hidden dependencies and generate ideas from structural gaps.
Map system architecture to ablatable units for ablation studies
Design ablation studies to isolate component contributions in ML systems
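
The one-at-a-time ablation idea in the entries above can be illustrated with a minimal Python sketch. It assumes the system is already mapped to a list of named units and that a caller-supplied `evaluate` function scores any subset of them. `ablation_study`, the component names, and the toy scores are hypothetical and do not come from these skills.

```python
from typing import Callable, Dict, List


def ablation_study(
    components: List[str],
    evaluate: Callable[[List[str]], float],
) -> Dict[str, float]:
    """Return the score drop caused by removing each component alone."""
    baseline = evaluate(components)
    contributions = {}
    for name in components:
        # Rebuild the system without this one unit and re-score it.
        ablated = [c for c in components if c != name]
        contributions[name] = baseline - evaluate(ablated)
    return contributions


# Toy scoring function: each unit adds a fixed, made-up amount.
TOY_SCORES = {"retriever": 30, "reranker": 10, "cache": 0}


def toy_evaluate(active: List[str]) -> float:
    return sum(TOY_SCORES[c] for c in active)


drops = ablation_study(list(TOY_SCORES), toy_evaluate)
print(drops)
```

A drop of zero marks a unit the score does not depend on. Because the toy scorer is purely additive, it cannot show interactions between units. A real system may need pairwise ablations to expose those.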