git clone --depth 1 https://github.com/Aperivue/medsci-skills /tmp/design-study && cp -r /tmp/design-study/skills/design-study ~/.claude/skills/design-study

SKILL.md
# Design-Study Skill

## Purpose

This skill pressure-tests whether a study is answerable, interpretable, and defensible before large amounts of drafting or analysis work accumulate.

Use it when:

- a study question is known but the analysis plan is still fluid
- the user wants a methods sanity check
- a manuscript feels vulnerable to reviewer criticism
- a peer review requires explicit methodological diagnosis

---

## Communication Rules

- Communicate with the user in their preferred language.
- Use English for statistical, radiologic, and reporting-guideline terminology.
- Be direct about validity risks, but always propose the smallest feasible fix first.

---

## Core Review Questions

Always inspect these dimensions:

1. What is the exact research question?
2. What is the analysis unit: patient, lesion, exam, study, phase, report?
3. What is the index date or decision point?
4. How are inclusion and exclusion criteria applied?
5. Is there any information leakage?
6. What is the reference standard or endpoint definition?
7. What comparator is clinically meaningful?
8. What validation strategy is used?
9. What uncertainty reporting is required?
10. Which reporting guideline best fits?
11. Are exposure/outcome/covariate **definitions literature-grounded**, or invented ad-hoc from the data dictionary? If ad-hoc, defer to `/define-variables` before drafting Methods.

---

## Standard Output

```text
## Study Design Review

Question: ...
Study type: ...
Analysis unit: ...
Index date / prediction timepoint: ...

### Strengths
- ...

### Major validity risks
1. ...
2. ...

### Minimal fixes
- ...

### Reporting fit
- Recommended guideline: ...

### Decision
- Ready for analysis / Needs redesign / Drafting can proceed with limitations
```

---

## Workflow

### Phase 1: Reconstruct the study

Extract from protocol, draft, slides, tables, or notes:

- clinical problem
- intended use case
- population
- inputs
- outputs
- outcome definition
- timing of variable availability

**Gate:** Present the reconstructed study summary (question, analysis unit, intended use) to the user. Confirm before proceeding; if the reconstruction is wrong, the entire validity review will be misdirected.

### Phase 2: Check structural validity

#### A. Analysis unit

Look for mismatches such as:

- patient-level claim from lesion-level analysis
- exam-level split with patient overlap
- phase-level samples treated as independent

#### B. Leakage

Look for:

- postoperative features used for preoperative prediction
- normalization or thresholding performed before data split
- repeated exams across train/test
- reader annotations derived from outcome information
- **input-text contamination for NLP/LLM extraction tasks**: if the model input includes report sections such as clinical history, indication, impression, prior diagnosis, or referral text, confirm that those fields do not literally name or strongly imply the target label. If the target is already present in the supplied text, the task is information retrieval under label leakage, not phenotype inference; redesign the input mask, report a sensitivity analysis excluding leaky fields, or reframe the claim.
- **construct dependence** (a predictor that is a definitional component of the outcome). Two cases: (i) *mathematical definition*: an input that computes the outcome (when the outcome is HOMA-IR = f(fasting insulin, fasting glucose), those two inputs are not independent predictors); (ii) *near-tautological composite*: a ratio or score built from the outcome's defining components, which shows an inflated, near-circular association.
  Test: "could this predictor be derived, in whole or part, from the outcome's definition or the same measurement?" If yes, exclude it, or retain it only as a labeled calibration probe rather than a reported discovery.

#### F. Time origin & survivorship (incident / transition models)

For any time-to-event or incident/transition design, check before drafting:

- **Time origin per model.** Each incident model starts its at-risk clock at the correct origin. Watch for **immortal-time bias** (a span in which the event cannot occur, misattributed to one group) and **left-truncation / delayed entry** (subjects entering the risk set after the origin).
- **Mediator-ascertainment-window survivorship.** A "progressor" / transition label that is conditional on *surviving to* a later ascertainment (a second scan, a follow-up visit) is survivorship-biased; plan a landmark time or an explicit intermediate-state (multistate / illness-death) model.
- **Primary-analysis-set selection.** If the primary will not be the full cohort (e.g., complete-case while a large fraction is missing), pre-specify the selection justification and a MAR rationale; do not let the complete-case model become primary because it is the significant one (an outcome-dependent choice).
- A design that cannot yet answer these should say so honestly. Note, however, that at review time a Methods/Limitations admission that the issue was *"not formally assessed"* is escalated to a MAJOR by the survival probe (S1), not waved through as a limitation.

#### C. Reference standard

Check:

- who established ground truth
- when it was established
- whether blinding was possible
- whether only a subset had gold standard verification
- **Construct ↔ nominal-definition match.** Does the exposure/finding *construct* stay inside its stated definition, or does it quietly exceed it?
  An "incidentaloma" defined as an *indeterminate* finding must not include frank malignancy reads; a label that overshoots its definition inflates the apparent cohort and breaks the κ. For each construct, restate the nominal definition and confirm every included case satisfies it.
- **Per-flag reference-standard concordance.** When the index finding is flagged against a reference standard, report the concordance *per flag category* (not just overall). A construct where a large fraction of flags do not mat
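
The per-flag concordance check in the reference-standard list can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of the skill itself: the function names `per_flag_concordance` and `overall_concordance`, the `(flag_category, reference_positive)` record layout, the category labels, and the toy counts are all hypothetical, and "concordance" is taken here to mean the simple proportion of flags confirmed by the reference standard.

```python
from collections import defaultdict


def per_flag_concordance(cases):
    """Summarise concordance separately for each flag category.

    `cases` is an iterable of (flag_category, reference_positive) pairs,
    where reference_positive is True when the reference standard confirms
    the flagged finding. This record layout is an assumption for the sketch.

    Returns {flag_category: (n_flagged, n_concordant, proportion)}.
    """
    counts = defaultdict(lambda: [0, 0])  # category -> [n_flagged, n_concordant]
    for category, reference_positive in cases:
        counts[category][0] += 1
        if reference_positive:
            counts[category][1] += 1
    return {cat: (n, k, k / n) for cat, (n, k) in counts.items()}


def overall_concordance(cases):
    """Proportion of all flags confirmed by the reference standard."""
    cases = list(cases)
    if not cases:
        raise ValueError("no flagged cases supplied")
    confirmed = sum(1 for _, reference_positive in cases if reference_positive)
    return confirmed / len(cases)


# Toy, made-up data for illustration only:
# 10 flags in category_A (8 confirmed) and 5 flags in category_B (1 confirmed).
toy_cases = (
    [("category_A", True)] * 8
    + [("category_A", False)] * 2
    + [("category_B", True)] * 1
    + [("category_B", False)] * 4
)

print(f"overall: {overall_concordance(toy_cases):.2f}")
for category, (n, k, p) in sorted(per_flag_concordance(toy_cases).items()):
    print(f"{category}: {k}/{n} concordant ({p:.2f})")
```

On this toy input the overall proportion is 0.60, while the two categories sit at 0.80 and 0.20. A gap of that kind is what a single overall figure hides, which is why the check asks for the breakdown by flag category.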
Medical AI paper optimization for AI search engines (Perplexity, ChatGPT web, Elicit, Consensus, SciSpace) and RAG-based literature tools. Applies when drafting or reviewing titles, abstracts, structured summary boxes (Key Points / Research in Context / Plain-Language Summary), manuscripts for high-impact medical AI journals (Lancet Digital Health, Radiology, Radiology-AI, npj Digital Medicine, Nature Medicine), preprints (medRxiv/arXiv), GitHub README + CITATION.cff + Zenodo archives, and Hugging Face model/dataset cards. Integrates TRIPOD+AI, CLAIM 2024, STARD-AI, TRIPOD-LLM, DECIDE-AI reporting requirements with generative engine optimization (GEO) principles. Produces a visible pass/fail checklist.
Statistical analysis for medical research papers. Generates reproducible Python/R code with publication-ready tables and figures. Supports diagnostic accuracy, inter-rater agreement, meta-analysis, survival analysis, survey data, group comparisons, regression, propensity score, and repeated measures.
PubMed author profile analysis. Author name → PubMed fetch → study type classification → visualization → strategy report.
Generate N analysis scripts from a single methodology template × multiple exposure/outcome combinations. The "80-person team" pattern — same validated method, swap variables only. Produces batch R/Python code + summary matrix.
Check manuscript compliance with medical research reporting guidelines. Supports 32 guidelines including STROBE, CONSORT, STARD, STARD-AI, TRIPOD, TRIPOD+AI, ARRIVE, PRISMA, PRISMA-DTA, PRISMA-P, CARE, SPIRIT, CLAIM, MI-CLEAR-LLM, SQUIRE 2.0, CLEAR, MOOSE, GRRAS, SWiM, AMSTAR 2, and risk of bias tools (QUADAS-2, QUADAS-C, RoB 2, ROBINS-I, ROBINS-E, ROBIS, ROB-ME, PROBAST, PROBAST+AI, NOS, COSMIN, RoB NMA). Generates item-by-item assessment with PRESENT/MISSING/PARTIAL status.