benchmark-sweep
Benchmark Sweep systematically catalogs known solutions across a domain, maps their properties against problem dimensions in a matrix format, and identifies unexplored gaps where no solution currently exists. Use this skill when you need comprehensive coverage analysis of a solution space, want to discover unaddressed problem areas, or need to find opportunities for novel approaches within established domains.
git clone --depth 1 https://github.com/yogsoth-ai/de-anthropocentric-research-engine /tmp/benchmark-sweep && cp -r /tmp/benchmark-sweep/skills/benchmark-sweep ~/.claude/skills/benchmark-sweep
SKILL.md
# Benchmark Sweep

Systematically scan all known solutions in a domain, catalog their properties, and identify gaps where no solution exists.

## State Ledger

| Resource | Target | Current | % |
|----------|--------|---------|---|
| web-search | 30 | 0 | 0% |
| web-research | 10 | 0 | 0% |
| paper-overview | 30 | 0 | 0% |
| paper-search | 20 | 0 | 0% |
| paper-research | 8 | 0 | 0% |

## HARD-GATE

Cannot exit strategy until ≥80% of each budget line is consumed OR yield targets are met with justification for remaining budget.

## Available Tactics

| Tactic | Role |
|--------|------|
| coverage-analysis | Inventory → crossing → intersection evaluation pipeline |
| evaluation-filtering | Score and filter generated gap-filling ideas |

## Available SOPs

| SOP | Role |
|-----|------|
| benchmark-inventory | Catalog all known solutions with performance/applicability/limitations |
| method-problem-crossing | Build cross-reference matrix from inventory |
| intersection-evaluation | Annotate matrix cells as explored/partial/unexplored |
| enumeration-synthesis | Synthesize sweep findings into structured report |

## Execution Guidance

1. **Inventory**: Run benchmark-inventory to catalog all known methods
2. **Structure**: Use method-problem-crossing to organize into matrix form
3. **Evaluate**: Run intersection-evaluation to find gaps
4. **Generate**: For each gap, brainstorm potential solutions
5. **Filter**: Apply evaluation-filtering to rank gap-filling ideas
6. **Synthesize**: Produce final report via enumeration-synthesis
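
The HARD-GATE rule can be read as a simple predicate over the State Ledger. The following is a minimal sketch of that reading, not part of the skill itself: the names `BudgetLine` and `can_exit`, and the `yield_met` / `justification` parameters, are hypothetical and introduced only for illustration. It assumes "consumed" means `Current / Target` and that an empty justification does not satisfy the second branch of the rule.

```python
from dataclasses import dataclass


@dataclass
class BudgetLine:
    """One row of the State Ledger (hypothetical representation)."""
    resource: str
    target: int
    current: int = 0


def can_exit(ledger, yield_met=False, justification="", min_pct=80):
    """Return (allowed, under_consumed) for the HARD-GATE rule.

    Exit is allowed when every budget line is at least min_pct consumed,
    or when yield targets are met AND a non-empty justification is given
    for the remaining budget.
    """
    # Integer arithmetic avoids floating-point edge cases at exactly 80%.
    under = [
        line.resource
        for line in ledger
        if 100 * line.current < min_pct * line.target
    ]
    if not under:
        return True, []
    if yield_met and justification.strip():
        return True, under
    return False, under


# Targets copied from the State Ledger table; all counters start at 0.
ledger = [
    BudgetLine("web-search", 30),
    BudgetLine("web-research", 10),
    BudgetLine("paper-overview", 30),
    BudgetLine("paper-search", 20),
    BudgetLine("paper-research", 8),
]

allowed, under = can_exit(ledger)
print(allowed, under)
```

With the ledger in its initial state the gate stays closed and every resource is reported as under-consumed. Returning the under-consumed lines alongside the decision is a design choice that makes it easy to state which budget the justification has to cover.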
Experiment-specific - summarizes the DARE executor's research design into a clean research_result report, which must be written back into the spec file produced by formated-specs.
Experiment-specific - replaces writing-specs and emits DARE's 4-layer call plan as a clean research_graph schema. The last step forces loading formated-result.
loss-1 judge - read a sample's full dialogue and decide whether the user simulator semantically enacted its Policy Card. check-blind.
loss-2 judge - pairwise quality comparison across the n rungs within one topic; decide monotonicity and endpoint separation. check-blind, D1-D5 only.
Strategy: inference to the best explanation in the face of anomalies
Remove components one by one and observe how the system changes, to reveal hidden dependencies and generate ideas from structural gaps.
Map system architecture to ablatable units for ablation studies
Design ablation studies to isolate component contributions in ML systems