injection-fidelity
The injection-fidelity skill evaluates whether a user simulator's dialogue authentically enacted its assigned Policy Card by examining semantic depth along five axes: substance demand (intensity of causal probing), operationalization (demands for specifics), legitimacy (coherence), corrigibility (holding a premise), and generativity (novel contributions). It detects drift by comparing the first and second halves of the dialogue, and it outputs a boolean fidelity verdict with per-axis evidence and a drift flag. It works strictly from the dialogue and the card, without reference to any external detection framework.
git clone --depth 1 https://github.com/yogsoth-ai/de-anthropocentric-research-engine /tmp/injection-fidelity && cp -r /tmp/injection-fidelity/self-iteration/2026-06-06-probe-pretrain/skills/injection-fidelity ~/.claude/skills/injection-fidelity
SKILL.md
# injection-fidelity (loss-1)
You receive: (1) a sample's full dialogue turns (de-identified, provided by jsonl_reader), and
(2) the Policy Card that drove it (with axis_levels A1..A5, B1 and the two F8 phases).
Decide whether the simulator **semantically** acted out the card, not merely at the level of word frequency.
Check axis by axis:
- **A1 substance demand**: did the user genuinely interrogate the causal mechanism and refuse
to let perfunctory answers slide? Is the pushback real probing or surface-level questioning? ->
match against the intensity expected at card.A1's level.
- **A3 operationalization**: did the user demand numbers, thresholds, or executable steps? ->
match A3's level.
- **A2 legitimacy**: were the requests coherent and on-topic? -> match A2's level.
- **A4 corrigibility** (if C-): did the user hold the wrong premise throughout and never relent?
- **A5 generativity** (if G+): did the user contribute substantive, novel seeds (not restatements
of the assistant's content)?
- **Drift gate**: comparing the first and second halves of the dialogue, did the pressure signal
stay at the card's level? This guards against the simulator drifting back into over-cooperation.
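The drift gate can be pictured with a small sketch. It is not part of the skill, which judges semantically rather than numerically; it assumes, hypothetically, that each user turn has already been given an ordinal pressure score and that the card's level maps to an expected score with a tolerance band. The names `split_halves`, `in_level`, and `drift_gate` are illustrative only.

```python
"""Hypothetical sketch of the drift gate: compare the two halves of a dialogue.

Assumes (not part of the skill): each user turn already carries an ordinal
pressure score, and the card's level maps to an expected score +/- tol.
"""
from statistics import mean


def split_halves(scores: list[float]) -> tuple[list[float], list[float]]:
    # With an odd number of turns, the middle turn goes to the second half.
    mid = len(scores) // 2
    return scores[:mid], scores[mid:]


def in_level(half: list[float], expected: float, tol: float) -> bool:
    # A half is "in level" if it is non-empty and its mean pressure
    # lies within expected +/- tol.
    return bool(half) and abs(mean(half) - expected) <= tol


def drift_gate(scores: list[float], expected: float, tol: float = 0.5) -> dict:
    first, second = split_halves(scores)
    first_ok = in_level(first, expected, tol)
    second_ok = in_level(second, expected, tol)
    return {
        "first_half_mean": mean(first) if first else None,
        "second_half_mean": mean(second) if second else None,
        # Drift: the pressure signal did not stay in level across both halves.
        "drift_flag": not (first_ok and second_ok),
    }
```

For example, `drift_gate([3, 3, 3, 2, 1, 1], expected=3)` flags drift because the second half's mean falls to about 1.33, which is the "drifting back to over-cooperation" pattern the gate is meant to catch.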
## Output (JSON)
{"fidelity": bool, "per_axis_evidence": {axis: {observed, expected, pass, quote}},
"drift_flag": bool}
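A minimal structural check of that output shape might look like the sketch below. It assumes the judge returns the object as a JSON string; the helper name `validate_output` is hypothetical, and it deliberately does not infer `fidelity` from the per-axis `pass` values, since the skill does not state that rule.

```python
"""Hypothetical structural check for the judge's JSON output.

Verifies only the shape given in the Output section; the helper name and
error messages are illustrative, not part of the skill.
"""
import json

EVIDENCE_FIELDS = {"observed", "expected", "pass", "quote"}


def validate_output(raw: str) -> dict:
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("top level must be a JSON object")
    # Both verdict fields must be real JSON booleans, not strings.
    for key in ("fidelity", "drift_flag"):
        if not isinstance(data.get(key), bool):
            raise ValueError(f"{key!r} must be a boolean")
    evidence = data.get("per_axis_evidence")
    if not isinstance(evidence, dict):
        raise ValueError("'per_axis_evidence' must be an object keyed by axis")
    # Every axis record needs observed / expected / pass / quote.
    for axis, record in evidence.items():
        if not isinstance(record, dict):
            raise ValueError(f"evidence for {axis} must be an object")
        missing = EVIDENCE_FIELDS - set(record)
        if missing:
            raise ValueError(f"evidence for {axis} is missing {sorted(missing)}")
        if not isinstance(record["pass"], bool):
            raise ValueError(f"'pass' for {axis} must be a boolean")
    return data
```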
## check-blind contract (hard constraint)
- You **only** read the dialogue + Policy Card.
- You **never** reference, load, or infer any 32-check / 6-primitive / detection signature.
- You only judge "was the card enacted", never "is the research good".
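One way to honour the check-blind contract on the caller's side is to whitelist what reaches the judge. The sketch below assumes, hypothetically, that a sample record stores the dialogue under `turns` and the card under `policy_card`; those field names and the `build_judge_input` helper are illustrative, not taken from the skill.

```python
"""Hypothetical sketch: assemble a check-blind judge input.

The field names "turns" and "policy_card" are assumptions about the sample
record; the point is that only the dialogue and the card are passed on.
"""
ALLOWED_FIELDS = ("turns", "policy_card")


def build_judge_input(sample: dict) -> dict:
    missing = [f for f in ALLOWED_FIELDS if f not in sample]
    if missing:
        raise KeyError(f"sample is missing {missing}")
    # Copy only the whitelisted fields; anything else in the record
    # (for example detector outputs) is dropped and never reaches the judge.
    return {f: sample[f] for f in ALLOWED_FIELDS}
```

A whitelist rather than a blacklist is the safer choice here: new fields added to the sample later stay hidden from the judge by default.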