Skill · 389 repo stars · updated 20d ago

benchmark-challenge

This skill identifies unstated assumptions embedded in industry benchmarks and best practices, then systematically negates them to expose hidden constraints and reveal unexplored possibilities. Use it when established standards may be limiting innovation or when you need to understand what foundational beliefs underpin conventional approaches in your field.

Repository: de-anthropocentric-research-engine

Install in Claude Code

git clone --depth 1 https://github.com/yogsoth-ai/de-anthropocentric-research-engine /tmp/benchmark-challenge && cp -r /tmp/benchmark-challenge/skills/benchmark-challenge ~/.claude/skills/benchmark-challenge

Then open a new Claude Code session; the skill loads automatically.
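
A quick post-install check can confirm the copy landed where the command above puts it. The sketch below is a minimal shell helper, not part of the repository: `check_skill` is a hypothetical name, and it assumes the skill's `SKILL.md` sits at the top level of the copied directory.

```shell
# check_skill DIR: succeed when DIR contains a SKILL.md file.
# Hypothetical helper for illustration; assumes SKILL.md lives at the
# top level of the skill directory.
check_skill() {
  skill_dir="$1"
  if [ -f "$skill_dir/SKILL.md" ]; then
    # Report the directory that passed the check.
    echo "ok: $skill_dir"
  else
    # Report the missing file on stderr and signal failure to the caller.
    echo "missing: $skill_dir/SKILL.md" >&2
    return 1
  fi
}
```

After running the install command, it could be called as `check_skill ~/.claude/skills/benchmark-challenge`; a non-zero exit status means the file was not found at that path.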

Definition

SKILL.md

# Benchmark Challenge

Identify and negate benchmark assumptions.

## Execution

Subagent: spawned via the subagent-spawning/spawn-agent skill.

## Why Subagent

A benchmark challenge requires deep analysis of why a best practice exists, what assumptions it encodes, and what happens when those assumptions are violated. It benefits from the dedicated critical focus of a subagent.

<!-- BEGIN available-tables (generated) -->

## Available SOPs

Optional, in no fixed order; the final leaf is always an SOP.

| SOP | When to use |
| --- | --- |
| spawn-agent | Spawn a customized CC subagent with full MCP tool access. Used by SOPs that declare execution: subagent. |

<!-- END available-tables (generated) -->

From the same repository

formated-result (Skill)

Experiment-specific - summarize the DARE executor's research design into a clean research_result report, forced to write back into the spec file produced by formated-specs.

formated-specs (Skill)

Experiment-specific - replaces writing-specs, emits DARE's 4-layer call plan as a clean research_graph schema. Last step forces load formated-result.

injection-fidelity (Skill)

loss-1 judge - read a sample's full dialogue and decide whether the user simulator semantically enacted its Policy Card. check-blind.

ladder-quality-order (Skill)

loss-2 judge - pairwise quality comparison across the n rungs within one topic; decide monotonicity and endpoint separation. check-blind, D1-D5 only.

abductive-hypothesis-generation (Skill)

Strategy: Inference to the best explanation in the face of anomalies

ablation-brainstorm (Skill)

Remove components one by one, observe system changes to reveal hidden

ablation-component-mapping (Skill)

Map system architecture to ablatable units for ablation studies

ablation-design (Skill)

Design ablation studies to isolate component contributions in ML systems