anti-benchmark
The anti-benchmark Claude Code skill deconstructs industry best practices and performance standards to reveal their hidden assumptions and unexamined constraints. Use this skill when evaluating whether accepted benchmarks truly serve your goals or merely reflect inherited conventions, particularly before adopting standard methodologies or metrics in product development, research, or organizational processes.
git clone --depth 1 https://github.com/yogsoth-ai/de-anthropocentric-research-engine /tmp/anti-benchmark && cp -r /tmp/anti-benchmark/skills/anti-benchmark ~/.claude/skills/anti-benchmarkSKILL.md
# Anti-Benchmark Challenge industry best practices' hidden assumptions. ## State Ledger | Resource | Target | Current | % | |----------|--------|---------|---| | web-search | 25 | 0 | 0% | | web-research | 10 | 0 | 0% | | paper-overview | 25 | 0 | 0% | | paper-search | 15 | 0 | 0% | | paper-research | 5 | 0 | 0% | ## HARD-GATE Cannot exit strategy until ≥80% of each budget line is consumed OR yield targets are met with justification for remaining budget. ## Available Tactics | Tactic | Role | |--------|------| | assumption-enumeration | Surface assumptions hidden in benchmarks | ## Available SOPs | SOP | Role | |-----|------| | benchmark-challenge | Identify and negate benchmark assumptions | | sacred-cow-identification | Find unquestioned beliefs behind best practices | | assumption-perturbation | Test what happens when benchmark assumptions fail | | constructive-rebellion | Build alternatives that violate benchmarks constructively | | destruction-synthesis | Synthesize anti-benchmark outputs | ## Execution Guidance 1. **Identify benchmarks**: Catalog the industry best practices and standards in the domain 2. **Deconstruct**: Use benchmark-challenge to expose hidden assumptions in each 3. **Surface sacred cows**: Use sacred-cow-identification for deeper unquestioned beliefs 4. **Perturb**: Use assumption-perturbation to test "what if this standard is wrong?" 5. **Research alternatives**: Search for domains that succeed WITHOUT these benchmarks 6. **Build**: Use constructive-rebellion to form solutions that violate conventions productively 7. **Synthesize**: Produce structured output via destruction-synthesis
Experiment-specific - summarize the DARE executor's research design into a clean research_result report, forced to write back into the spec file produced by formated-specs.
Experiment-specific - replaces writing-specs, emits DARE's 4-layer call plan as a clean research_graph schema. Last step forces load formated-result.
loss-1 judge - read a sample's full dialogue and decide whether the user simulator semantically enacted its Policy Card. check-blind.
loss-2 judge - pairwise quality comparison across the n rungs within one topic; decide monotonicity and endpoint separation. check-blind, D1-D5 only.
Strategy: 面对异常的最佳解释推理
Remove components one by one, observe system changes to reveal hidden dependencies and generate ideas from structural gaps.
Map system architecture to ablatable units for ablation studies
Design ablation studies to isolate component contributions in ML systems