appropriateness-bounding
Appropriateness-bounding establishes acceptability boundaries for specific scenarios using structured expert consensus methods, primarily the RAND/UCLA Appropriateness Method or Consensus Conference protocols. It conducts two-round rating cycles where panelists rate indications on a 1–9 scale, discuss disagreements, and re-rate to classify outcomes as appropriate, uncertain, or inappropriate. Use this skill for medical guideline development, regulatory standard-setting, or determining acceptable thresholds for action across any domain requiring formal consensus-based judgment.
git clone --depth 1 https://github.com/yogsoth-ai/de-anthropocentric-research-engine /tmp/appropriateness-bounding && cp -r /tmp/appropriateness-bounding/skills/appropriateness-bounding ~/.claude/skills/appropriateness-bounding
# Appropriateness Bounding
**Purpose:** Determine what is appropriate, acceptable, or indicated in a given context, using either the RAND/UCLA Appropriateness Method (rating, discussion, re-rating) or a Consensus Conference (citizen jury) format to establish boundaries of acceptability.
**When to use:**
- Medical guideline development (appropriate indications)
- Regulatory standard setting
- Establishing acceptable thresholds for action
- Any question of the form "is X appropriate when Y?"
## Budget
| Parameter | Constraint |
|-----------|-----------|
| Rounds | 2 (rate → discuss → re-rate) |
| Perspectives | ≥4 (ideally 7–15 for RAND/UCLA) |
| Rating scale | 1–9 (1 = inappropriate, 9 = appropriate) |
| Agreement threshold | Median ≥7 without disagreement |
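
The constraints in the table above could be checked before classification runs. The following is a minimal sketch, not part of the skill itself: `check_budget` and its constants are hypothetical names, and it assumes the ratings for one indication arrive as one list of integers per round.

```python
from typing import Sequence

# Values taken from the Budget table; the constant names are illustrative.
MIN_PERSPECTIVES = 4
REQUIRED_ROUNDS = 2
SCALE_MIN, SCALE_MAX = 1, 9


def check_budget(perspectives: Sequence[str],
                 rounds: Sequence[Sequence[int]]) -> list[str]:
    """Return human-readable budget violations for one indication (empty if none)."""
    problems = []
    if len(perspectives) < MIN_PERSPECTIVES:
        problems.append(
            f"need at least {MIN_PERSPECTIVES} perspectives, got {len(perspectives)}"
        )
    if len(rounds) != REQUIRED_ROUNDS:
        problems.append(f"need exactly {REQUIRED_ROUNDS} rounds, got {len(rounds)}")
    for number, ratings in enumerate(rounds, start=1):
        # Assumption: every perspective submits exactly one rating per round.
        if len(ratings) != len(perspectives):
            problems.append(f"round {number}: one rating per perspective expected")
        out_of_scale = [r for r in ratings if not SCALE_MIN <= r <= SCALE_MAX]
        if out_of_scale:
            problems.append(f"round {number}: ratings outside 1-9: {out_of_scale}")
    return problems
```

An empty return value means the panel fits the budget; any other result lists what to fix before proceeding.
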
## State Ledger
| Key | Type | Description |
|-----|------|-------------|
| indications | array | List of scenarios to rate |
| perspectives | array | Panel member perspectives |
| round_1_ratings | array | Initial ratings per indication |
| discussion_notes | string | Key points from discussion |
| round_2_ratings | array | Post-discussion ratings |
| classifications | object | Appropriate/uncertain/inappropriate per item |
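
One way to hold the ledger in memory is a small dataclass. This is a sketch under assumptions: the table gives only top-level types, so the inner shapes (one list of ratings per indication, a mapping from indication to label) and the `next_step` helper are illustrative choices rather than something the skill specifies.

```python
from dataclasses import dataclass, field


@dataclass
class StateLedger:
    """In-memory mirror of the State Ledger table (inner shapes are assumed)."""
    indications: list[str] = field(default_factory=list)
    perspectives: list[str] = field(default_factory=list)
    # Assumed layout: one row per indication, one rating per perspective.
    round_1_ratings: list[list[int]] = field(default_factory=list)
    discussion_notes: str = ""
    round_2_ratings: list[list[int]] = field(default_factory=list)
    # Assumed layout: indication -> "appropriate" | "uncertain" | "inappropriate".
    classifications: dict[str, str] = field(default_factory=dict)

    def next_step(self) -> str:
        """Name the first stage of the rate, discuss, re-rate cycle still missing data."""
        if not self.round_1_ratings:
            return "collect_round_1"
        if not self.discussion_notes:
            return "discuss"
        if not self.round_2_ratings:
            return "collect_round_2"
        if not self.classifications:
            return "classify"
        return "done"
```

Deriving the next step from which fields are filled keeps the ledger as the single source of truth, so no separate progress flag can drift out of sync with the data.
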
## Available Tactics
- **iterative-convergence-round** — Two-round rate-discuss-rerate cycle
- **threshold-calibration** — Determine where appropriateness boundaries fall
## Available SOPs
- judgment-collection
- feedback-distribution
- consensus-measurement
- round-decision
- threshold-sweep
- consensus-classification
- consensus-synthesis
## Execution Guidance
1. Define indications/scenarios clearly (clinical scenarios, use cases)
2. Collect Round 1 ratings (1–9 scale) with brief rationale
3. Distribute feedback showing distribution of ratings
4. Facilitate structured discussion of disagreements
5. Collect Round 2 ratings
6. Classify each indication by its Round 2 median: appropriate (median 7–9), uncertain (median 4–6), inappropriate (median 1–3)
7. Flag items with disagreement (the panel's ratings are split even though the median falls in a clear band)
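
Steps 6 and 7 can be sketched in a few lines. Two parts of this example are assumptions, because the text above does not pin them down: disagreement is defined here as at least a third of the panel rating 1–3 while at least a third rates 7–9, and half-point medians (possible with even-sized panels) fall into the lower band. The function names are hypothetical.

```python
from statistics import median


def has_disagreement(ratings: list[int]) -> bool:
    """Assumed rule: at least a third of raters in 1-3 AND at least a third in 7-9."""
    n = len(ratings)
    low = sum(1 for r in ratings if r <= 3)
    high = sum(1 for r in ratings if r >= 7)
    # Integer arithmetic avoids floating-point comparison against n / 3.
    return low * 3 >= n and high * 3 >= n


def classify(ratings: list[int]) -> dict:
    """Band one indication by its median and flag disagreement separately."""
    if not ratings:
        raise ValueError("at least one rating is required")
    m = median(ratings)
    if m >= 7:
        band = "appropriate"
    elif m >= 4:
        band = "uncertain"   # assumed tie rule: a median of 6.5 lands here
    else:
        band = "inappropriate"  # assumed tie rule: a median of 3.5 lands here
    return {"median": m, "classification": band,
            "disagreement": has_disagreement(ratings)}
```

The flag is returned alongside the band rather than overriding it, which matches the list above: classification comes from the median, and disagreement is reported as a separate signal.
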
## Output Format
```yaml
classifications:
  appropriate: [{indication, median, agreement_level}, ...]
  uncertain: [{indication, median, agreement_level}, ...]
  inappropriate: [{indication, median, agreement_level}, ...]
disagreement_items: [{indication, reason}, ...]
panel_size: <int>
method: RAND/UCLA | Consensus Conference
```
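
The output format could be assembled from already-classified items as follows. This is an illustrative sketch: `build_report` and the input keys `band` and `disagreement_reason` are hypothetical, and it assumes the three bands nest under `classifications` while `disagreement_items`, `panel_size`, and `method` sit at the top level.

```python
def build_report(items: list[dict], panel_size: int,
                 method: str = "RAND/UCLA") -> dict:
    """Group classified items into the structure described under Output Format."""
    bands = ("appropriate", "uncertain", "inappropriate")
    report = {
        "classifications": {band: [] for band in bands},
        "disagreement_items": [],
        "panel_size": panel_size,
        "method": method,
    }
    for item in items:
        # Keep only the three fields the output format lists per entry.
        entry = {key: item[key]
                 for key in ("indication", "median", "agreement_level")}
        report["classifications"][item["band"]].append(entry)
        reason = item.get("disagreement_reason")
        if reason:
            report["disagreement_items"].append(
                {"indication": item["indication"], "reason": reason}
            )
    return report
```

The resulting dictionary can be passed to any YAML or JSON serializer; how `agreement_level` is computed is left to the caller, since the format does not define it.
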