Skip to main content
ClaudeWave
Skill329 repo starsupdated 5d ago

injection-fidelity

The injection-fidelity skill evaluates whether a user simulator's dialogue authentically enacted its assigned Policy Card by examining semantic depth across five axes: substance demand (causal probing intensity), operationalization (specificity demands), legitimacy (coherence), corrigibility (premise holding), and generativity (novel contributions). It detects dialogue drift by comparing first and second halves, outputting a boolean fidelity verdict with per-axis evidence and a drift flag, operating strictly from dialogue and card content without reference to external detection frameworks.

Install in Claude Code
Copy
git clone --depth 1 https://github.com/yogsoth-ai/de-anthropocentric-research-engine /tmp/injection-fidelity && cp -r /tmp/injection-fidelity/self-iteration/2026-06-06-probe-pretrain/skills/injection-fidelity ~/.claude/skills/injection-fidelity
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# injection-fidelity (loss-1)

You receive: (1) a sample's full dialogue turns (de-identified, provided by jsonl_reader),
(2) the Policy Card that drove it (with axis_levels A1..A5,B1 + the two F8 phases).

Decide whether the simulator **semantically** acted out the card (not just word-frequency).
Check axis by axis:

- **A1 substance demand**: did the user genuinely interrogate causal mechanism (and refuse
  to let perfunctory answers slide)? Is the pushback real probing or surface questioning ->
  match against the expected intensity of card.A1's level.
- **A3 operationalization**: did the user demand numbers/thresholds/executable steps ->
  match A3's level.
- **A2 legitimacy**: were the requests coherent and on-topic -> match A2's level.
- **A4 corrigibility** (if C-): did the user hold the wrong premise throughout, never relent.
- **A5 generativity** (if G+): did the user throw out substantive novel seeds (not a
  restatement of the assistant's content).
- **Drift gate**: first half vs second half of the dialogue, did the pressure signal stay
  in-level (guard against the simulator drifting back to over-cooperation).

## Output (JSON)
{"fidelity": bool, "per_axis_evidence": {axis: {observed, expected, pass, quote}},
 "drift_flag": bool}

## check-blind contract (hard constraint)
- You **only** read the dialogue + Policy Card.
- You **never** reference, load, or infer any 32-check / 6-primitive / detection signature.
- You only judge "was the card enacted", never "is the research good".