ahrq-picme-assessment
This Claude Code skill systematically evaluates research gaps with the AHRQ PiCMe framework across six dimensions: Population, Intervention, Comparator, Metrics, Evidence, and an overall verdict. Use it to assess the clarity, feasibility, and quality of identified research gaps before prioritizing them for further investigation. Each of the five P/I/C/M/E dimensions receives a mandatory 1-5 score, and the final strength rating (strong/moderate/weak) is based on the mean of those scores.
git clone --depth 1 https://github.com/yogsoth-ai/de-anthropocentric-research-engine /tmp/ahrq-picme-assessment && cp -r /tmp/ahrq-picme-assessment/skills/ahrq-picme-assessment ~/.claude/skills/ahrq-picme-assessment
SKILL.md
# AHRQ PiCMe Assessment
Systematically assess a research gap across 6 dimensions using the AHRQ PiCMe framework.
## HARD-GATE
<HARD-GATE>
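- The input must be a GapRecord with status: "complete"
- All 6 dimensions (P/I/C/M/E + overall verdict) must be completed; none may be skipped
- Every dimension must have its own score (1-5) and a written explanation
- overall_verdict must be one of "strong" | "moderate" | "weak"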
</HARD-GATE>
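The gate conditions above could be checked mechanically along the following lines. This is only a minimal sketch, not the skill's own implementation: the function name `check_hard_gate` is hypothetical, the GapRecord is assumed to be a plain dict with a `status` key, scores are assumed to be integers, and the assessment field names (`dimensions`, `score`, `rationale`, `overall_verdict`) are borrowed from the Output Format section of this document.

```python
# Hypothetical helper: names and dict shapes are assumptions for illustration.
DIMENSIONS = ("population", "intervention", "comparator", "metrics", "evidence")
VERDICTS = {"strong", "moderate", "weak"}


def check_hard_gate(gap_record: dict, assessment: dict) -> list[str]:
    """Return a list of HARD-GATE violations; an empty list means the gate passes."""
    problems = []

    # Gate 1: the input GapRecord must be complete.
    if gap_record.get("status") != "complete":
        problems.append("GapRecord status is not 'complete'")

    # Gates 2 and 3: every dimension is present, scored 1-5, and explained.
    dims = assessment.get("dimensions") or {}
    for name in DIMENSIONS:
        dim = dims.get(name)
        if not isinstance(dim, dict):
            problems.append(f"dimension '{name}' is missing")
            continue
        score = dim.get("score")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 5:
            problems.append(f"dimension '{name}' has no valid 1-5 score")
        text = dim.get("rationale")
        if not isinstance(text, str) or not text.strip():
            problems.append(f"dimension '{name}' has no written explanation")

    # Gate 4: the verdict must be one of the three allowed labels.
    if assessment.get("overall_verdict") not in VERDICTS:
        problems.append("overall_verdict is not strong/moderate/weak")

    return problems
```

Returning a list of violations rather than raising on the first one is a design choice made here so that a caller can report every gate failure at once.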
## Pipeline
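1. **Pre-check**: Verify that the input GapRecord is complete; confirm that the domain field is valid
2. **Population (P)**: Identify the target population/system/dataset the gap concerns; rate how clearly it is defined (1-5)
3. **Intervention (I)**: Identify the proposed intervention/method/solution; rate its operability (1-5)
4. **Comparator (C)**: Identify the comparison baseline (existing SOTA, no intervention, alternatives); rate the baseline's soundness (1-5)
5. **Metrics (M)**: Identify the evaluation metrics; rate their measurability and relevance (1-5)
6. **Evidence (E)**: Rate how strongly existing evidence supports the existence of the gap (1-5)
7. **Overall verdict**: Judge overall quality from the mean of the 5 dimension scores (strong ≥ 3.5 / moderate 2.5-3.4 / weak < 2.5); generate a draft research question
8. **Output**: Return a PiCMeAssessment object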
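The overall-verdict step of the pipeline reduces to a small calculation, sketched below. The function name `overall_verdict` is hypothetical. Rounding the mean to one decimal place is an assumption made here so that the stated bands (2.5-3.4 and ≥ 3.5) leave no gap; the source does not say how the mean is rounded.

```python
# Hypothetical helper illustrating the thresholds in the overall-verdict step.
def overall_verdict(scores: dict[str, int]) -> tuple[float, str]:
    """Map the five P/I/C/M/E scores to (mean_score, verdict)."""
    if len(scores) != 5:
        raise ValueError("expected exactly five dimension scores")

    # Assumption: the mean is rounded to one decimal before banding.
    mean_score = round(sum(scores.values()) / len(scores), 1)

    if mean_score >= 3.5:
        verdict = "strong"
    elif mean_score >= 2.5:
        verdict = "moderate"
    else:
        verdict = "weak"
    return mean_score, verdict


example = {
    "population": 4,
    "intervention": 3,
    "comparator": 3,
    "metrics": 4,
    "evidence": 4,
}
print(overall_verdict(example))  # (3.6, 'strong')
```

The example scores are the ones used in the Output Format sample, which likewise reports a mean of 3.6 and a "strong" verdict.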
## Output Format
```json
{
  "gap_id": "gap_001",
  "dimensions": {
    "population": { "score": 4, "description": "Description of the target population", "rationale": "..." },
    "intervention": { "score": 3, "description": "Description of the intervention/method", "rationale": "..." },
    "comparator": { "score": 3, "description": "Description of the comparison baseline", "rationale": "..." },
    "metrics": { "score": 4, "description": "Description of the evaluation metrics", "rationale": "..." },
    "evidence": { "score": 4, "description": "Description of the evidence strength", "rationale": "..." }
  },
  "mean_score": 3.6,
  "overall_verdict": "strong",
  "research_question_draft": "Draft research question (1 sentence)",
  "improvement_suggestions": ["Suggestion 1", "Suggestion 2"]
}
```