git clone --depth 1 https://github.com/Aperivue/medsci-skills /tmp/fill-protocol && cp -r /tmp/fill-protocol/skills/fill-protocol ~/.claude/skills/fill-protocol
# Fill-Protocol Skill
You are helping a researcher populate an institutional Word form (IRB protocol,
ethics application, grant proposal, etc.) without breaking the original document
formatting. This skill is the formatting counterpart to `write-protocol`: where
`write-protocol` drafts content, `fill-protocol` lays that content into the
institutional template.
## Why This Skill Exists
Recreating institutional forms from scratch with `python-docx` reliably destroys
table layouts, page breaks, and font consistency. The only safe approach is to
**open the existing template** and replace cell/paragraph text in place. This
skill enforces that pattern.
## Core Principles (Do Not Violate)
1. **Open the existing template — never create from scratch.** Use
`Document(template_path)`, not `Document()`.
2. **Convert .doc → .docx via LibreOffice headless** before any editing.
`pandoc -f doc` is not supported; `textutil` corrupts table structure.
3. **Match cells by left-label text**, not row/column coordinates. Templates
evolve and coordinate matching breaks silently.
4. **Apply `cantSplit` to every filled row** so a row never breaks across pages.
5. **For CJK languages, set the `eastAsia` font attribute**, not just
`run.font.name`. Hangul/Kanji/Hanzi will render in fallback fonts otherwise.
6. **Validate** every fill operation: report unmatched labels, count empty cells,
and surface mismatches before saving.
## Dependencies
If the template is already `.docx`, **LibreOffice is not required** — only the
three Python packages below. LibreOffice is needed only when the template is a
legacy `.doc` and must be converted first.
```bash
# Python libraries (always required)
pip install --user docxtpl python-docx pyyaml
# LibreOffice (only for legacy .doc input; ~700 MB on macOS)
brew install --cask libreoffice # macOS
sudo apt-get install -y libreoffice # Debian/Ubuntu
sudo dnf install -y libreoffice # Fedora
sudo pacman -S --needed libreoffice-fresh # Arch
```
### Bundled setup script
The skill ships a `setup.sh` that detects what is missing and installs only
those parts, with a confirmation prompt before each step:
```bash
bash setup.sh check # report what's installed (read-only)
bash setup.sh install # install missing pieces (asks before each)
```
### Auto-install behavior (for Claude as the caller)
When invoking this skill on behalf of a user:
1. **Before calling `doc_to_docx.py`**, run `bash setup.sh check`. If
LibreOffice is missing, **ask the user** before installing — the cask is
~700 MB and proceeding silently is unfriendly.
2. **Skip LibreOffice entirely** if the template is already `.docx`. Only
surface the install prompt when a `.doc` is encountered.
3. **Never** pass `--yes` to `setup.sh install` unless the user has explicitly
authorized unattended installation in this session.
4. If the user declines installation, fall back to asking them to convert
the `.doc` manually (open in Word/LibreOffice/Pages → Save As → .docx) and
then re-run with the converted file.
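The four rules above amount to a small decision table. A minimal sketch, assuming a hypothetical `next_action` helper and action names that are illustrative only (the skill itself does not define them):

```python
from pathlib import Path
from typing import Optional


def next_action(template: str, libreoffice_installed: bool,
                install_approved: Optional[bool] = None) -> str:
    """Decide the next step for a template path.

    install_approved: None means the user has not been asked yet;
    True or False is their answer to the install prompt.
    """
    suffix = Path(template).suffix.lower()
    if suffix == ".docx":
        return "fill"                      # rule 2: LibreOffice not needed
    if suffix != ".doc":
        raise ValueError(f"unsupported template type: {suffix or template}")
    if libreoffice_installed:
        return "convert"
    if install_approved is None:
        return "ask_user"                  # rule 1: never install silently
    if install_approved:
        return "install_then_convert"      # rule 3: run install without --yes
    return "request_manual_conversion"     # rule 4: user declined
```

Whether LibreOffice is installed would come from `bash setup.sh check`; how that result is parsed is left out here because the script's output format is not shown.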
## Workflow
### Step 1 — Convert legacy .doc to .docx (if needed)
```bash
python scripts/doc_to_docx.py path/to/template.doc path/to/template.docx
```
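The internals of `doc_to_docx.py` are not shown here. As a rough sketch of what a LibreOffice headless conversion can look like, assuming the `soffice` (or `libreoffice`) binary is on `PATH` and using hypothetical function names:

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(src: Path, out_dir: Path,
                          soffice: str = "soffice") -> list:
    """Return the argv for a headless .doc to .docx conversion."""
    return [soffice, "--headless", "--convert-to", "docx",
            "--outdir", str(out_dir), str(src)]


def convert_doc_to_docx(src: Path, out_dir: Path) -> Path:
    """Run the conversion and return the path LibreOffice wrote."""
    soffice = shutil.which("soffice") or shutil.which("libreoffice")
    if soffice is None:
        raise FileNotFoundError("LibreOffice binary not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_convert_command(src, out_dir, soffice),
                   check=True, timeout=120)
    # LibreOffice names the output after the input file's stem.
    result = out_dir / (src.stem + ".docx")
    if not result.exists():
        raise RuntimeError(f"conversion produced no file at {result}")
    return result
```

On macOS the binary usually lives inside the application bundle rather than on `PATH`, so a real wrapper would need an extra lookup step.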
### Step 2 — Inspect the template structure
```bash
python scripts/inspect_template.py path/to/template.docx
```
This lists every table, every cell (with row/column coordinates and content
preview), and every top-level paragraph. Use this output to identify the labels
you will match against in your YAML content file.
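For a quick look at the same structure without the bundled script, the table cells can also be read straight from the `.docx` archive. The sketch below uses only the standard library, not `python-docx`, and `list_table_cells` is a hypothetical name; it is not how `inspect_template.py` is known to work.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML main namespace, in ElementTree's {uri}tag form
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def list_table_cells(docx_path: str, preview: int = 40) -> list:
    """Return (table, row, col, text preview) tuples for every table cell."""
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    found = []
    # iter() also visits nested tables; each one gets its own table index,
    # and its text additionally shows up in the parent cell's preview.
    for t_idx, tbl in enumerate(root.iter(W + "tbl")):
        # findall() looks at direct children only, so rows or cells wrapped
        # in content controls are not listed by this sketch.
        for r_idx, tr in enumerate(tbl.findall(W + "tr")):
            for c_idx, tc in enumerate(tr.findall(W + "tc")):
                text = "".join(t.text or "" for t in tc.iter(W + "t"))
                found.append((t_idx, r_idx, c_idx, text[:preview]))
    return found
```

The coordinates are useful for orientation only; the fill itself should still match on label text, as the core principles require.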
### Step 3 — Author a content YAML
The YAML supports three fill modes. All keys are optional.
```yaml
protections:
korean_font: "맑은 고딕" # CJK font (set to "Noto Sans CJK KR", "SimSun",
# "MS Mincho", etc. for other locales)
cant_split: true # Apply <w:cantSplit/> to every filled row
# Readability options (see "Readability" section below for full semantics)
blank_between_paragraphs: true # default true — Enter between \n\n chunks
blank_around_section_header: true # default true — Enter above/below filled sections
blank_around_all_section_headers: false # default false — opt-in; also touches untouched sections
# Mode 1 — table key/value (left-label cell → right value cell)
table_kv:
"Study Title": "Multi-center prospective validation of ..."
"Principal Investigator": "Last, First (Department)"
"연구 목적": "본 연구는 ..."
# Mode 2 — section replacement (find numbered header, replace until next header)
section_replace:
"1. Background":
"Hepatocellular carcinoma is the third leading cause of ..."
"4. 연구 배경 및 이론적 근거":
"..."
# Mode 3 — single paragraph in-place text replacement
paragraph_replace:
"Title:":
"Title: Multi-center prospective validation of ..."
```
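Before filling, it can help to sanity-check the parsed YAML. A minimal sketch, assuming the file has already been loaded into a dict (for example with `yaml.safe_load`) and that the four top-level keys shown above are the only ones expected; `check_content` and its messages are hypothetical, not part of the skill:

```python
KNOWN_KEYS = {"protections", "table_kv", "section_replace", "paragraph_replace"}
FILL_MODES = ("table_kv", "section_replace", "paragraph_replace")


def check_content(content: dict) -> list:
    """Return a list of human-readable problems; empty means it looks fine."""
    problems = []
    for key in content:
        if key not in KNOWN_KEYS:
            problems.append(f"unknown top-level key: {key}")
    for mode in FILL_MODES:
        section = content.get(mode) or {}  # all keys are optional
        if not isinstance(section, dict):
            problems.append(f"{mode} must be a mapping of label to text")
            continue
        for label, text in section.items():
            if not isinstance(text, str) or not text.strip():
                problems.append(f"{mode}[{label!r}] has no text")
    return problems
```

Catching a misspelled mode name such as `table_kvs` at this stage is cheaper than discovering later that a whole block was silently skipped.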
### Readability — three blank-line knobs
All blank paragraphs inserted by these options use a forced single-line height
(`<w:spacing w:line="240" w:before="0" w:after="0"/>`), so the gap is exactly
one body-text line and never inflates the document's apparent line spacing.
| Option | Default | What it does | When to flip |
|---|---|---|---|
| `blank_between_paragraphs` | `true` | Inserts a blank line between every `\n\n`-split chunk inside `section_replace` | Disable only for forms where every line must be packed tight |
| `blank_around_section_header` | `true` | Wraps each header that you `section_replace` with a blank above and a blank below | Disable when the template style already adds visual gaps via `space_before/after` |
| `blank_around_all_section_headers` | `false` | After all fills, scans every numbered header (`\d+\.\s+`) — including ones you didn't replace — and adds blank lines around them | Enable when uniform readability matters more than form fidelity. **Default off because IRB / public-document submissions favor template fidelity over visual consistency** (page count stability, boilerplate untouched, reviewer-expe…) |
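The header pattern and the `\n\n` splitting described in the table can be illustrated as follows. This is a sketch under assumptions: anchoring the pattern at the start of the paragraph and representing a blank paragraph as an empty string are choices made here, and both function names are hypothetical.

```python
import re

# The pattern quoted in the table, anchored at the start of the text.
HEADER_RE = re.compile(r"^\d+\.\s+")


def is_numbered_header(text: str) -> bool:
    """True for paragraphs such as '1. Background'."""
    return bool(HEADER_RE.match(text))


def split_chunks(body: str, blank_between: bool = True) -> list:
    """Split replacement text on blank lines.

    With blank_between=True an empty string is placed between chunks to
    stand for the single-line blank paragraph the skill would insert.
    """
    chunks = [c.strip() for c in body.split("\n\n") if c.strip()]
    if not blank_between:
        return chunks
    out = []
    for i, chunk in enumerate(chunks):
        if i:
            out.append("")
        out.append(chunk)
    return out
```

Note that a bare pattern like this also matches text such as `2024. 3. 1`, which is one reason touching headers you did not replace is opt-in.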
Check manuscript compliance with medical research reporting guidelines. Supports 32 guidelines including STROBE, CONSORT, STARD, STARD-AI, TRIPOD, TRIPOD+AI, ARRIVE, PRISMA, PRISMA-DTA, PRISMA-P, CARE, SPIRIT, CLAIM, MI-CLEAR-LLM, SQUIRE 2.0, CLEAR, MOOSE, GRRAS, SWiM, AMSTAR 2, and risk of bias tools (QUADAS-2, QUADAS-C, RoB 2, ROBINS-I, ROBINS-E, ROBIS, ROB-ME, PROBAST, PROBAST+AI, NOS, COSMIN, RoB NMA). Generates item-by-item assessment with PRESENT/MISSING/PARTIAL status.