Skill · 389 repo stars · updated 19d ago

ladder-quality-order

The ladder-quality-order skill performs pairwise quality comparisons of research designs within a topic using five substantive dimensions (D1–D5: meaningfulness, skill-research value, DARE usability, layer respect, and prerequisite firmness). It ranks n samples against an intended quality order, outputs a Kendall tau correlation score and endpoint stability metrics, and flags when top and bottom designs become indistinguishable. Use this to validate whether interpolated research quality gradients maintain monotonic separation and resist framing-based confounds.

Repository: de-anthropocentric-research-engine

Install in Claude Code

git clone --depth 1 https://github.com/yogsoth-ai/de-anthropocentric-research-engine /tmp/ladder-quality-order && cp -r /tmp/ladder-quality-order/self-iteration/2026-06-06-probe-pretrain/skills/ladder-quality-order ~/.claude/skills/ladder-quality-order

Then open a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# ladder-quality-order (loss-2)

You receive: the n samples under one topic, each with (research_graph, research_result),
plus their intended_order (the id order from the interpolator: id0 should be best ->
idN-1 should be worst).

## Task (pairwise ranking, no absolute scores)
1. Enumerate all i<j pairs; for each pair ask: **in the D1-D5 sense, which research design
   is more substantive?** (D1 more meaningful / D2 more skill-research value / D3 more
   usable to DARE / D4 better respects the 4 layers / D5 firmer prerequisites). Output
   winner + a one-line reason.
2. Aggregate into an induced order; compute Kendall tau against intended_order.
3. Endpoints: directly compare id0 vs idN-1; across K repeats, check whether id0 wins stably.
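
A minimal sketch of step 2, under stated assumptions: the skill does not say how pairwise winners are aggregated, so this uses a win-count (Copeland-style) ranking with ties broken by id, and a plain tau-a over two tie-free orderings. The names `induced_order` and `kendall_tau` are hypothetical, and ids are assumed to be the integers 0..n-1 with n >= 2.

```python
from itertools import combinations

def induced_order(n, pairwise_log):
    """Rank ids by number of pairwise wins (assumed aggregation rule)."""
    wins = [0] * n
    for entry in pairwise_log:
        wins[entry["winner"]] += 1
    # Most wins first; equal win counts fall back to id order.
    # Note: this tie-break leans toward the intended order, so a
    # stricter variant may prefer to report ties explicitly.
    return sorted(range(n), key=lambda i: (-wins[i], i))

def kendall_tau(order_a, order_b):
    """Kendall tau-a between two orderings of the same ids (no ties)."""
    pos_a = {x: r for r, x in enumerate(order_a)}
    pos_b = {x: r for r, x in enumerate(order_b)}
    concordant = discordant = 0
    for x, y in combinations(order_a, 2):
        sign = (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    total = len(order_a) * (len(order_a) - 1) // 2
    return (concordant - discordant) / total

# Illustrative log for n=4: every pair follows the intended order
# except (1, 2), where id2 beats id1.
log = [
    {"i": 0, "j": 1, "winner": 0, "reason": "example"},
    {"i": 0, "j": 2, "winner": 0, "reason": "example"},
    {"i": 0, "j": 3, "winner": 0, "reason": "example"},
    {"i": 1, "j": 2, "winner": 2, "reason": "example"},
    {"i": 1, "j": 3, "winner": 1, "reason": "example"},
    {"i": 2, "j": 3, "winner": 2, "reason": "example"},
]
intended = [0, 1, 2, 3]
induced = induced_order(4, log)
tau = kendall_tau(induced, intended)
```

With one swapped adjacent pair out of six, five pairs are concordant and one is discordant, so tau is (5 - 1) / 6, which falls below the 0.7 threshold used in the output section.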

## Output (JSON)
{"tau": float, "monotonicity_pass": bool,   // tau>=0.7 and no endpoint inversion
 "endpoint_separation_pass": bool,          // id0 wins >= K-allowance of K repeats
 "rigor_floor_flag": bool,                  // if id0 ~ idN-1 endpoints collapse (feed risk register)
 "pairwise_log": [{i,j,winner,reason}]}
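
The output fields above could be assembled roughly as follows. This is a sketch, not the skill's own implementation: `build_report`, `allowance`, and `collapse_margin` are hypothetical names, "endpoint inversion" is read as idN-1 ranking above id0 in the induced order, and "endpoints collapse" is read as id0's win share over the K repeats sitting within `collapse_margin` of 0.5. Both readings are assumptions; the source does not define them.

```python
def build_report(tau, induced, endpoint_wins, k,
                 allowance=1, collapse_margin=0.2, pairwise_log=()):
    """Assemble the JSON-style report from already-computed pieces.

    tau            Kendall tau of the induced order vs. intended_order
    induced        induced order as a list of ids (ids assumed 0..n-1)
    endpoint_wins  how many of the k id0-vs-idN-1 repeats id0 won
    """
    first, last = 0, len(induced) - 1  # intended best / worst ids
    # Assumed reading of "endpoint inversion": worst id ranked above best id.
    endpoint_inverted = induced.index(last) < induced.index(first)
    # Assumed reading of "collapse": the endpoint duel is close to a coin flip.
    win_share = endpoint_wins / k
    return {
        "tau": tau,
        "monotonicity_pass": tau >= 0.7 and not endpoint_inverted,
        "endpoint_separation_pass": endpoint_wins >= k - allowance,
        "rigor_floor_flag": abs(win_share - 0.5) <= collapse_margin,
        "pairwise_log": list(pairwise_log),
    }

# A clean ladder: high tau, id0 wins all 5 endpoint repeats.
clean = build_report(0.8, [0, 1, 2, 3], endpoint_wins=5, k=5)
# A shaky ladder: tau below threshold, id0 wins only 3 of 5.
shaky = build_report(2 / 3, [0, 2, 1, 3], endpoint_wins=3, k=5)
```

Under these assumptions the clean ladder passes both checks without raising the rigor floor flag, while the shaky one fails both and raises it.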

## check-blind contract (hard constraint)
- The judge prompt may use **only** D1-D5 wording.
- **Forbidden**: 32-check vocabulary, 6-primitive, "pseudo-good/novel-good" categories,
  any detection signature.
- z-perp-C: on the B1 confound triplet (same substance, different framing) your order
  **must stay invariant**; if it varies with framing -> you were dragged by the confound,
  tighten back to D1-D5 substance.
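
One way to check the z-perp-C invariance mechanically, assuming the judge is run once per framing variant of the B1 triplet and each run yields an induced order. The function name `framing_drift` and the framing labels are hypothetical; the skill itself only states that the order must not change.

```python
def framing_drift(orders_by_framing):
    """Return the framings whose induced order differs from the first one.

    orders_by_framing maps a framing label to the induced order (list of ids)
    obtained when the same substance is presented under that framing.
    An empty result means the order stayed invariant.
    """
    items = list(orders_by_framing.items())
    if not items:
        return []
    _, reference = items[0]
    return [label for label, order in items[1:]
            if list(order) != list(reference)]

# Illustrative triplet: the third framing flips ids 1 and 2.
drift = framing_drift({
    "framing_a": [0, 1, 2, 3],
    "framing_b": [0, 1, 2, 3],
    "framing_c": [0, 2, 1, 3],
})
```

Any non-empty result indicates the order moved with framing, which is the condition the contract treats as being dragged by the confound.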

From the same repository

formated-result (Skill)

Experiment-specific - summarize the DARE executor's research design into a clean research_result report, forced to write back into the spec file produced by formated-specs.

formated-specs (Skill)

Experiment-specific - replaces writing-specs, emits DARE's 4-layer call plan as a clean research_graph schema. Last step forces load formated-result.

injection-fidelity (Skill)

loss-1 judge - read a sample's full dialogue and decide whether the user simulator semantically enacted its Policy Card. check-blind.

abductive-hypothesis-generation (Skill)

Strategy: Inference to the best explanation in the face of anomalies

ablation-brainstorm (Skill)

Remove components one by one, observe system changes to reveal hidden

ablation-component-mapping (Skill)

Map system architecture to ablatable units for ablation studies

ablation-design (Skill)

Design ablation studies to isolate component contributions in ML systems

ablation-execution (Skill)

Remove components one by one from a system, record the response/impact