Skill · 389 repo stars · updated 20d ago

benchmark-sweep

Benchmark Sweep systematically catalogs known solutions across a domain, maps their properties against problem dimensions in a matrix format, and identifies unexplored gaps where no solution currently exists. Use this skill when you need comprehensive coverage analysis of a solution space, want to discover unaddressed problem areas, or need to find opportunities for novel approaches within established domains.

View source · Repository: de-anthropocentric-research-engine

Install in Claude Code

git clone --depth 1 https://github.com/yogsoth-ai/de-anthropocentric-research-engine /tmp/benchmark-sweep && cp -r /tmp/benchmark-sweep/skills/benchmark-sweep ~/.claude/skills/benchmark-sweep

Then open a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Benchmark Sweep

Systematically scan all known solutions in a domain, catalog their properties, and identify gaps where no solution exists.

## State Ledger

| Resource | Target | Current | % |
|----------|--------|---------|---|
| web-search | 30 | 0 | 0% |
| web-research | 10 | 0 | 0% |
| paper-overview | 30 | 0 | 0% |
| paper-search | 20 | 0 | 0% |
| paper-research | 8 | 0 | 0% |

## HARD-GATE

Do not exit this strategy until at least 80% of every budget line is consumed, OR the yield targets are met and the unused budget is explicitly justified.
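
The gate can be read as a simple predicate over the State Ledger. The following is a minimal sketch, not the skill's actual implementation: the `BudgetLine` class, the `can_exit` function, and the `yield_met` / `justification` parameters are hypothetical names introduced for illustration. Only the resource names, targets, and the 80% threshold come from the ledger and gate above.

```python
from dataclasses import dataclass
from typing import Dict, List

# Targets copied from the State Ledger table.
LEDGER_TARGETS: Dict[str, int] = {
    "web-search": 30,
    "web-research": 10,
    "paper-overview": 30,
    "paper-search": 20,
    "paper-research": 8,
}


@dataclass
class BudgetLine:
    """One row of the ledger (hypothetical structure)."""
    resource: str
    target: int
    current: int = 0

    @property
    def pct(self) -> float:
        # Percentage of the target consumed so far.
        return 100.0 * self.current / self.target if self.target else 100.0


def new_ledger() -> List[BudgetLine]:
    """Fresh ledger with every counter at zero."""
    return [BudgetLine(name, target) for name, target in LEDGER_TARGETS.items()]


def can_exit(ledger: List[BudgetLine], yield_met: bool = False,
             justification: str = "", threshold: float = 80.0) -> bool:
    """Assumed reading of the HARD-GATE: exit is allowed when every line
    has reached the threshold, or when yield targets are met AND a
    non-empty justification for the unused budget is supplied."""
    budget_ok = all(line.pct >= threshold for line in ledger)
    justified = yield_met and bool(justification.strip())
    return budget_ok or justified
```

Under this reading, a single line below 80% blocks the exit unless the yield path is taken, and the yield path requires both conditions, not just one.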

## Available Tactics

| Tactic | Role |
|--------|------|
| coverage-analysis | Inventory → crossing → intersection evaluation pipeline |
| evaluation-filtering | Score and filter generated gap-filling ideas |

## Available SOPs

| SOP | Role |
|-----|------|
| benchmark-inventory | Catalog all known solutions with performance/applicability/limitations |
| method-problem-crossing | Build cross-reference matrix from inventory |
| intersection-evaluation | Annotate matrix cells as explored/partial/unexplored |
| enumeration-synthesis | Synthesize sweep findings into structured report |
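
To make the matrix-related SOPs concrete, here is a small sketch of how an inventory could be crossed into a method×problem matrix and each cell annotated as explored, partial, or unexplored. It rests on assumptions: the evidence-count representation, the `explored_at` cutoff, and all function names are hypothetical, since the SOPs themselves do not define how a cell's status is decided.

```python
from enum import Enum
from typing import Dict, List, Tuple

Cell = Tuple[str, str]  # (method, problem)


class Status(Enum):
    EXPLORED = "explored"
    PARTIAL = "partial"
    UNEXPLORED = "unexplored"


def build_matrix(methods: List[str], problems: List[str],
                 evidence: Dict[Cell, int]) -> Dict[Cell, int]:
    """method-problem-crossing: one cell per (method, problem) pair.
    `evidence` maps a cell to the number of known applications found
    during the inventory (a hypothetical representation)."""
    return {(m, p): evidence.get((m, p), 0) for m in methods for p in problems}


def annotate(matrix: Dict[Cell, int], explored_at: int = 2) -> Dict[Cell, Status]:
    """intersection-evaluation: label each cell.
    The cutoff `explored_at` is an assumption, not part of the skill."""
    annotated = {}
    for cell, count in matrix.items():
        if count >= explored_at:
            annotated[cell] = Status.EXPLORED
        elif count > 0:
            annotated[cell] = Status.PARTIAL
        else:
            annotated[cell] = Status.UNEXPLORED
    return annotated


def find_gaps(annotated: Dict[Cell, Status]) -> List[Cell]:
    """Cells where no known solution was found, in sorted order."""
    return sorted(c for c, s in annotated.items() if s is Status.UNEXPLORED)
```

Keeping the matrix as a dictionary keyed by `(method, problem)` makes missing combinations explicit: every pair gets a cell, so an absent application shows up as a zero rather than being silently omitted.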

## Execution Guidance

1. **Inventory**: Run benchmark-inventory to catalog all known methods
2. **Structure**: Use method-problem-crossing to organize into matrix form
3. **Evaluate**: Run intersection-evaluation to find gaps
4. **Generate**: For each gap, brainstorm potential solutions
5. **Filter**: Apply evaluation-filtering to rank gap-filling ideas
6. **Synthesize**: Produce final report via enumeration-synthesis
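
Steps 4 and 5 above can be sketched as a score-then-filter pass over brainstormed ideas. This is only an illustration under stated assumptions: the skill does not say how evaluation-filtering scores ideas, so the `novelty` and `feasibility` fields, their product as the score, and the `min_score` cutoff are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class GapIdea:
    """A brainstormed idea for one unexplored (method, problem) cell."""
    gap: Tuple[str, str]
    description: str
    novelty: float      # assumed scale: 0.0 to 1.0
    feasibility: float  # assumed scale: 0.0 to 1.0

    @property
    def score(self) -> float:
        # Hypothetical scoring rule: an idea must be both novel and feasible.
        return self.novelty * self.feasibility


def filter_and_rank(ideas: List[GapIdea], min_score: float = 0.25) -> List[GapIdea]:
    """Drop ideas below the cutoff, then rank the rest best-first."""
    kept = [idea for idea in ideas if idea.score >= min_score]
    return sorted(kept, key=lambda idea: idea.score, reverse=True)
```

A multiplicative score is one possible choice; it penalizes ideas that are strong on one axis and weak on the other, which an additive score would let through.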

<!-- BEGIN available-tables (generated) -->

## Available Tactics

Optional, no fixed order; the final leaf is always a sop.

| Tactic | When to use |
| --- | --- |
| coverage-analysis | Systematic coverage evaluation pipeline — benchmark inventory, method-problem crossing, and intersection evaluation to map explored vs unexplored solution space. |

## Available SOPs

Optional, no fixed order; the final leaf is always a sop.

| SOP | When to use |
| --- | --- |
| creative-ideation-benchmark-inventory | Catalog all known solutions/methods in a domain with performance, applicability, and limitations. |
| enumeration-synthesis | Synthesize all systematic enumeration outputs into a structured idea report with prioritized recommendations. |
| intersection-evaluation | Evaluate exploration status of each cell in a method×problem matrix, annotating as explored, partial, or unexplored. |
| method-problem-crossing | Build method×problem cross-reference matrix showing which methods have been applied to which problems. |

<!-- END available-tables (generated) -->

From the same repository

formated-result (Skill)

Experiment-specific - summarize the DARE executor's research design into a clean research_result report, forced to write back into the spec file produced by formated-specs.

formated-specs (Skill)

Experiment-specific - replaces writing-specs, emits DARE's 4-layer call plan as a clean research_graph schema. Last step forces load formated-result.

injection-fidelity (Skill)

loss-1 judge - read a sample's full dialogue and decide whether the user simulator semantically enacted its Policy Card. check-blind.

ladder-quality-order (Skill)

loss-2 judge - pairwise quality comparison across the n rungs within one topic; decide monotonicity and endpoint separation. check-blind, D1-D5 only.

abductive-hypothesis-generation (Skill)

Strategy: Inference to the best explanation in the face of anomalies

ablation-brainstorm (Skill)

Remove components one by one, observe system changes to reveal hidden

ablation-component-mapping (Skill)

Map system architecture to ablatable units for ablation studies

ablation-design (Skill)

Design ablation studies to isolate component contributions in ML systems