benchmark-challenge
This skill identifies unstated assumptions embedded in industry benchmarks and best practices, then systematically negates them to expose hidden constraints and reveal unexplored possibilities. Use it when established standards may be limiting innovation or when you need to understand what foundational beliefs underpin conventional approaches in your field.
git clone --depth 1 https://github.com/yogsoth-ai/de-anthropocentric-research-engine /tmp/benchmark-challenge && cp -r /tmp/benchmark-challenge/skills/benchmark-challenge ~/.claude/skills/benchmark-challenge
SKILL.md
# Benchmark Challenge

Identify and negate benchmark assumptions.

## Execution

Subagent, spawned via the subagent-spawning/spawn-agent skill.

## Why Subagent

Benchmark challenge requires deep analysis of why a best practice exists, what assumptions it encodes, and what happens when those assumptions are violated. It benefits from dedicated critical focus.
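The identify-then-negate step can be pictured as a small data structure plus one transformation. The following is a minimal sketch, assuming each unstated assumption is recorded as a (benchmark, statement, negation) triple and that the critical subagent receives one negation-driven question per assumption. `Assumption`, `Challenge`, and `challenge` are hypothetical names, not part of the skill, and the accuracy example is illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assumption:
    """One unstated belief encoded by a benchmark or best practice (hypothetical schema)."""
    benchmark: str   # the practice being examined
    statement: str   # the belief it silently relies on
    negation: str    # the same belief, deliberately violated


@dataclass(frozen=True)
class Challenge:
    """A question for the critical subagent: what changes if the assumption fails?"""
    assumption: Assumption
    question: str


def challenge(assumptions: list[Assumption]) -> list[Challenge]:
    """Turn each identified assumption into a negation-driven question."""
    return [
        Challenge(
            assumption=a,
            question=(
                f"{a.benchmark} assumes that {a.statement}. "
                f"What becomes possible, and what constraint is exposed, if {a.negation}?"
            ),
        )
        for a in assumptions
    ]


if __name__ == "__main__":
    # Illustrative assumption only; a real run would extract these by analysis.
    found = [
        Assumption(
            benchmark="Top-1 accuracy on a fixed test set",
            statement="the test distribution matches deployment",
            negation="deployment data drifts away from the test set",
        ),
    ]
    for c in challenge(found):
        print(c.question)
```

Keeping the negation as an explicit field forces the analysis to state the violated condition concretely instead of leaving it as a vague "what if this were wrong?".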
Experiment-specific - summarize the DARE executor's research design into a clean research_result report, which must be written back into the spec file produced by formated-specs.
Experiment-specific - replaces writing-specs and emits DARE's 4-layer call plan as a clean research_graph schema. The last step force-loads formated-result.
loss-1 judge - read a sample's full dialogue and decide whether the user simulator semantically enacted its Policy Card. check-blind.
loss-2 judge - pairwise quality comparison across the n rungs within one topic; decide monotonicity and endpoint separation. check-blind, D1-D5 only.
Strategy: inference to the best explanation when facing anomalies
Remove components one at a time and observe how the system changes, to reveal hidden dependencies and generate ideas from structural gaps.
Map system architecture to ablatable units for ablation studies
Design ablation studies to isolate component contributions in ML systems
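The ablation entries above (removing components one at a time, mapping an architecture to ablatable units, and isolating each unit's contribution) can be sketched as a leave-one-out loop. This is a minimal illustration under stated assumptions: the system is modelled as an ordered dict of named stages, "removing" a stage means replacing it with the identity function, and a unit's contribution is the baseline score minus the score with that unit ablated. `run`, `ablate`, and the toy pipeline are hypothetical names, not part of any skill's interface.

```python
from typing import Callable

# An ablatable unit: a named stage that transforms a value (hypothetical model).
Component = Callable[[float], float]


def run(pipeline: dict[str, Component], x: float) -> float:
    """Apply the ablatable units in insertion order."""
    for stage in pipeline.values():
        x = stage(x)
    return x


def ablate(
    pipeline: dict[str, Component],
    evaluate: Callable[[dict[str, Component]], float],
) -> dict[str, float]:
    """Leave-one-out ablation: swap each unit for identity and record the score drop."""
    baseline = evaluate(pipeline)
    drops: dict[str, float] = {}
    for name in pipeline:
        ablated = {k: (lambda v: v) if k == name else f for k, f in pipeline.items()}
        drops[name] = baseline - evaluate(ablated)
    return drops


if __name__ == "__main__":
    toy = {"scale": lambda v: v * 2, "shift": lambda v: v + 3, "noop": lambda v: v}
    print(ablate(toy, lambda p: run(p, 1.0)))
```

A unit whose drop is zero (here `noop`) contributes nothing on its own, which is exactly the kind of structural gap worth probing. Leave-one-out does not capture interactions between units; pairwise or group ablations are needed for that.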