attack-resilience-scoring
This Claude Code skill computes a quantitative resilience score (0.0-1.0) for artifacts evaluated through red team testing by analyzing aggregated vulnerabilities, coverage metrics, and severity distribution across logical, empirical, methodological, and practical dimensions. Use it when you need an objective, bias-independent assessment of how well a system withstands identified attacks relative to test coverage.
git clone --depth 1 https://github.com/yogsoth-ai/de-anthropocentric-research-engine /tmp/attack-resilience-scoring && cp -r /tmp/attack-resilience-scoring/skills/attack-resilience-scoring ~/.claude/skills/attack-resilience-scoring
SKILL.md
# Attack Resilience Scoring

Computes a quantitative resilience score for the artifact based on red team results.

## Execution

Subagent, spawned via subagent-spawning/spawn-agent.

## Why Subagent

Scoring requires calibrated judgment independent of attack or defense bias. The scorer must weigh findings objectively against coverage.

## Input

- **aggregated_findings**: Deduplicated vulnerability report from finding-aggregation
- **coverage_data**: What percentage of threat surfaces were tested, at what depth

## Output

- **resilience_score**: 0.0-1.0 overall score
- **dimension_scores**: Per-dimension breakdown (logical, empirical, methodological, practical)
- **confidence_in_score**: How much to trust the score given coverage gaps
- **verdict**: Pass/conditional-pass/fail with justification
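
The output fields listed above can be sketched as a small data structure. The following is a minimal illustration only, not the skill's actual scoring method: the skill delegates scoring to a subagent's judgment and the source gives no formula. The `ResilienceReport` class, the `build_report` helper, the unweighted mean, the use of coverage as the confidence value, and all thresholds are hypothetical assumptions chosen for the example.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Dict

# The four dimensions named in the Output section.
DIMENSIONS = ("logical", "empirical", "methodological", "practical")


@dataclass
class ResilienceReport:
    resilience_score: float            # 0.0-1.0 overall score
    dimension_scores: Dict[str, float]  # per-dimension breakdown
    confidence_in_score: float         # trust in the score given coverage gaps
    verdict: str                       # "pass", "conditional-pass", or "fail"
    justification: str


def build_report(dimension_scores, coverage_fraction,
                 pass_threshold=0.8, fail_threshold=0.5, min_coverage=0.5):
    """Assemble a report. All thresholds are hypothetical defaults."""
    for name in DIMENSIONS:
        value = dimension_scores.get(name)
        if value is None or not 0.0 <= value <= 1.0:
            raise ValueError(f"dimension {name!r} needs a score in [0.0, 1.0]")
    if not 0.0 <= coverage_fraction <= 1.0:
        raise ValueError("coverage_fraction must be in [0.0, 1.0]")

    scores = {name: dimension_scores[name] for name in DIMENSIONS}
    # Assumption: the overall score is the unweighted mean of the dimensions.
    overall = fmean([scores[name] for name in DIMENSIONS])
    # Assumption: confidence is taken directly from the tested coverage.
    confidence = coverage_fraction

    if overall < fail_threshold:
        verdict = "fail"
        reason = f"overall score {overall:.2f} is below {fail_threshold}"
    elif overall >= pass_threshold and coverage_fraction >= min_coverage:
        verdict = "pass"
        reason = f"overall score {overall:.2f} with coverage {coverage_fraction:.2f}"
    else:
        verdict = "conditional-pass"
        reason = "score or coverage is too low for an unconditional pass"

    return ResilienceReport(overall, scores, confidence, verdict, reason)


if __name__ == "__main__":
    report = build_report(
        {"logical": 1.0, "empirical": 0.75, "methodological": 0.5, "practical": 0.75},
        coverage_fraction=0.5,
    )
    print(report.verdict, report.resilience_score)
```

In this sketch a high score earned on thin coverage is downgraded to a conditional pass, which is one simple way to reflect the requirement that findings be weighed against coverage.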
Experiment-specific - summarizes the DARE executor's research design into a clean research_result report, which must be written back into the spec file produced by formated-specs.
Experiment-specific - replaces writing-specs and emits DARE's 4-layer call plan as a clean research_graph schema. The last step forces loading of formated-result.
loss-1 judge - read a sample's full dialogue and decide whether the user simulator semantically enacted its Policy Card. check-blind.
loss-2 judge - pairwise quality comparison across the n rungs within one topic; decide monotonicity and endpoint separation. check-blind, D1-D5 only.
Strategy: Inference to the best explanation in the face of anomalies
Remove components one by one, observe system changes to reveal hidden dependencies and generate ideas from structural gaps.
Map system architecture to ablatable units for ablation studies
Design ablation studies to isolate component contributions in ML systems