ClaudeWave
Skill · 146 repo stars · updated yesterday

analyze-stats

Statistical analysis for medical research papers. Generates reproducible Python/R code with publication-ready tables and figures. Supports diagnostic accuracy, inter-rater agreement, meta-analysis, survival analysis, survey data, group comparisons, regression, propensity score, and repeated measures.

Install in Claude Code
git clone --depth 1 https://github.com/Aperivue/medsci-skills /tmp/analyze-stats && cp -r /tmp/analyze-stats/skills/analyze-stats ~/.claude/skills/analyze-stats
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# Statistical Analysis Skill

You are assisting a medical researcher with statistical analyses for medical research papers.
Generate reproducible code (Python preferred, R when necessary) that produces publication-ready
tables and figures following journal standards for medical imaging research.

## Data Privacy Check

Before reading any data file, check whether it might contain Protected Health Information (PHI):

1. If `*_deidentified.*` files exist in the working directory, use those preferentially.
2. If only raw CSV/Excel files exist (no `*_deidentified.*` counterpart), warn the user (ask in the user's preferred language):
   > "Does this data contain patient identifiers (names, national ID / RRN, contact details, etc.)?
   > If so, please de-identify it first with the `/deidentify` skill."
3. If the user confirms the data is already de-identified or contains no PHI, proceed.
4. **NEVER** display raw PHI values (names, phone numbers, RRN) in your output. If you
   encounter them while reading data, warn the user and suggest running `/deidentify`.
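
The file-selection part of this check (steps 1 and 2) can be sketched as below. This is a minimal illustration, not part of the skill itself: the function name `pick_data_files`, the `DATA_SUFFIXES` set, and the assumption that a counterpart is named `<name>_deidentified.<ext>` are all hypothetical. The sketch only inspects file names; it never opens the data.

```python
from pathlib import Path

# Assumed set of tabular formats to consider (hypothetical, adjust as needed).
DATA_SUFFIXES = {".csv", ".tsv", ".xlsx", ".xls"}


def pick_data_files(workdir):
    """Return (deidentified files, raw files lacking a de-identified counterpart)."""
    files = [
        p for p in Path(workdir).iterdir()
        if p.is_file() and p.suffix.lower() in DATA_SUFFIXES
    ]
    deidentified = sorted(p for p in files if p.stem.endswith("_deidentified"))
    # Base names already covered by a de-identified copy, e.g. "cohort".
    covered = {p.stem[: -len("_deidentified")] for p in deidentified}
    raw_unpaired = sorted(
        p for p in files
        if not p.stem.endswith("_deidentified") and p.stem not in covered
    )
    return deidentified, raw_unpaired
```

Files in the first list would be used preferentially; any file in the second list would trigger the question to the user before it is read.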

## Reference Files

- **Templates**: `${CLAUDE_SKILL_DIR}/references/templates/` -- reusable analysis scripts
- **Analysis guides**: `${CLAUDE_SKILL_DIR}/references/analysis_guides/` -- on-demand methodology references
- **Table standards**: `${CLAUDE_SKILL_DIR}/references/table-standards/` -- journal-specific table formatting
  - `table-standards.md` -- universal rules, AMA rules, footnote system, mistakes checklist
  - `journal-profiles/` -- YAML profiles per journal (radiology, jama, nejm, lancet, eur_rad, ajr)
  - `table-types/` -- templates per table type (Table 1, diagnostic accuracy, regression, meta-analysis, model comparison)
  - `tool-comparison.md` -- R/Python tool comparison and recommended pipelines
- **Figure style**: `${CLAUDE_SKILL_DIR}/references/style/figure_style.mplstyle`
- **Project data**: See CLAUDE.md for data locations under `2_Data/`

Read relevant templates before generating analysis code. For complex analysis types
(regression, propensity score, repeated measures), also load the corresponding guide
from `analysis_guides/` to ensure correct methodology and reporting.

## Workflow

### Phase 1: Data Assessment

1. **Read the data file** (CSV, Excel, TSV, or other tabular format).
2. **Report to the user**:
   - Shape (rows x columns)
   - Column names and inferred types (continuous, categorical, ordinal, binary, datetime)
   - Missing values per column (count and percentage)
   - First 5 rows preview
   - Unique value counts for categorical columns
3. **Identify the analysis unit**: patient, exam, lesion, image, rater, study, etc.
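
A minimal pandas sketch of the report in step 2 might look like the following. The function name `assess`, the `max_categories` threshold, and the demo columns are hypothetical. The sketch reports pandas dtypes only; mapping them to measurement levels (continuous, ordinal, binary) still requires judgment about the data.

```python
import io

import pandas as pd


def assess(df, max_categories=20):
    """Collect the Phase 1 summary items for a tabular dataset."""
    n_rows = len(df)
    missing = {}
    for col in df.columns:
        n_missing = int(df[col].isna().sum())
        pct = round(100 * n_missing / n_rows, 1) if n_rows else 0.0
        missing[col] = (n_missing, pct)

    unique_counts = {}
    for col in df.columns:
        # Heuristic (assumption): non-numeric or low-cardinality columns
        # are treated as categorical and get a value count.
        is_numeric = pd.api.types.is_numeric_dtype(df[col])
        if not is_numeric or df[col].nunique() <= max_categories:
            unique_counts[col] = df[col].value_counts(dropna=False).to_dict()

    return {
        "shape": df.shape,
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing": missing,
        "preview": df.head(5),
        "unique_counts": unique_counts,
    }


# Made-up demo data, for illustration only.
demo = pd.read_csv(io.StringIO(
    "patient_id,age,sex,lesion_size_mm\n"
    "1,63,F,12.5\n"
    "2,71,M,\n"
    "3,58,F,8.0\n"
))
summary = assess(demo)
print(summary["shape"])                      # (3, 4)
print(summary["missing"]["lesion_size_mm"])  # (1, 33.3)
```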

### Phase 2: Analysis Plan

Based on the data structure and research question, propose an analysis plan:

1. **Auto-detect analysis type** from the table below, or accept user specification.
2. **List specific tests** to be performed.
3. **Identify primary and secondary endpoints**.
4. **State assumptions** that will be checked (normality, homogeneity, independence).
5. **Note any data cleaning** needed (recoding, outlier handling, missing data strategy).
6. **Anchor the estimand to the research question.** If interaction/synergy/effect-modification is the question, the primary estimand is the **interaction parameter itself** (a likelihood-ratio test of the interaction term, or the interaction OR/HR on a single consistent scale) — not a main-effect OR whose CI is then read as "no synergy." If the claim is equivalence or non-inferiority, declare the margin up front (a TOST procedure, or the CI compared against a pre-stated MCID); a non-significant difference is not equivalence without a margin.
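
To make the equivalence point in item 6 concrete, here is a rough sketch of a TOST procedure for two independent means. It assumes large samples and uses a normal approximation with unpooled variances so that it runs on the standard library alone; a real analysis with small samples would use a t-based procedure. The function name `tost_two_sample` and the `margin` argument are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def tost_two_sample(a, b, margin, alpha=0.05):
    """Large-sample TOST: is mean(a) - mean(b) within +/- margin?

    Normal approximation with unpooled variances (an assumption of this
    sketch); the margin must be declared before looking at the data.
    """
    diff = mean(a) - mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = NormalDist()
    p_lower = 1 - z.cdf((diff + margin) / se)  # H0: diff <= -margin
    p_upper = z.cdf((diff - margin) / se)      # H0: diff >= +margin
    p_tost = max(p_lower, p_upper)
    q = z.inv_cdf(1 - alpha)
    ci = (diff - q * se, diff + q * se)        # (1 - 2*alpha) confidence interval
    return {"diff": diff, "ci": ci, "p_tost": p_tost, "equivalent": p_tost < alpha}
```

Equivalence is declared only when both one-sided tests reject, which is the same as the (1 - 2·alpha) confidence interval lying entirely inside the pre-stated margin. A non-significant ordinary test never enters the decision.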

Present the plan and **wait for user approval** before executing.

| Type | When to use | Python packages | R packages | Primary output |
|------|-------------|-----------------|------------|----------------|
| Table 1 (Demographics) | Baseline characteristics | pandas, scipy | tableone | Demographics table |
| Diagnostic Accuracy | Sensitivity/specificity/AUC | sklearn, scipy | pROC | ROC curve, performance table |
| Inter-rater Agreement | Multiple raters rating same items | krippendorff, pingouin | irr, psych | ICC/Kappa table |
| Meta-analysis | Pooling effect sizes across studies | -- | meta, metafor | Forest + funnel plots |
| DTA Meta-analysis | Pooling diagnostic accuracy across studies | -- | meta, metafor, mada | SROC + paired forest plots |
| Survey/Likert | Ordinal rating scales | pingouin, scipy | psych | Descriptive + reliability |
| Survival | Time-to-event outcomes | lifelines | survival | KM curves, Cox table |
| Group Comparison | Comparing 2+ groups | scipy, pingouin | -- | Test results + effect sizes |
| Correlation | Association between variables | scipy, pingouin | -- | Scatter + correlation matrix |
| Logistic Regression | Binary outcome + predictors | statsmodels, sklearn | -- | OR table, C-statistic, forest plot |
| Linear Regression | Continuous outcome + predictors | statsmodels | -- | Coefficient table, R², diagnostic plots |
| Propensity Score | Observational treatment comparison | sklearn, statsmodels | MatchIt, WeightIt, cobalt | Balance table, Love plot, weighted analysis |
| Survey-Weighted | Complex survey data (KNHANES, NHANES, KCHS) | statsmodels | survey, tableone, gWQS | Weighted Table 1, wOR table, subgroup results |
| Repeated Measures | Longitudinal / multi-timepoint data | pingouin, statsmodels | lme4, nlme, geepack | Spaghetti plot, LMM/GEE/RM ANOVA results |

For **Logistic Regression**, **Linear Regression**, **Propensity Score**, **Survey-Weighted**, and **Repeated Measures**:
load the corresponding guide from `${CLAUDE_SKILL_DIR}/references/analysis_guides/` before generating code.
For **Survey-Weighted** analysis, also load `survey_weighted.md`. For NHIS claims-based studies, load `nhis_icd10_mapping.md`.
For test selection guidance, load `${CLAUDE_SKILL_DIR}/references/analysis_guides/test_selection.md`.
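
As an illustration of the Diagnostic Accuracy row, the sketch below computes sensitivity and specificity from 2x2 counts with confidence intervals. The Wilson score interval is a choice made for this example, not something the skill prescribes, and the function names are hypothetical. It uses only the standard library; the packages listed in the table would be used in an actual analysis.

```python
from math import sqrt
from statistics import NormalDist


def wilson_ci(successes, n, level=0.95):
    """Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def sens_spec(tp, fn, tn, fp):
    """Sensitivity and specificity, each as (estimate, Wilson interval)."""
    return {
        "sensitivity": (tp / (tp + fn), wilson_ci(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_ci(tn, tn + fp)),
    }


# Made-up counts, for illustration only.
result = sens_spec(tp=45, fn=5, tn=80, fp=20)
```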

### Phase 3: Execute

Generate and run a P
skills
academic-aio

Medical AI paper optimization for AI search engines (Perplexity, ChatGPT web, Elicit, Consensus, SciSpace) and RAG-based literature tools. Applies when drafting or reviewing titles, abstracts, structured summary boxes (Key Points / Research in Context / Plain-Language Summary), manuscripts for high-impact medical AI journals (Lancet Digital Health, Radiology, Radiology-AI, npj Digital Medicine, Nature Medicine), preprints (medRxiv/arXiv), GitHub README + CITATION.cff + Zenodo archives, and Hugging Face model/dataset cards. Integrates TRIPOD+AI, CLAIM 2024, STARD-AI, TRIPOD-LLM, DECIDE-AI reporting requirements with generative engine optimization (GEO) principles. Produces a visible pass/fail checklist.

add-journal

author-strategy

PubMed author profile analysis. Author name → PubMed fetch → study type classification → visualization → strategy report.

batch-cohort

Generate N analysis scripts from a single methodology template × multiple exposure/outcome combinations. The "80-person team" pattern — same validated method, swap variables only. Produces batch R/Python code + summary matrix.

calc-sample-size

check-reporting

Check manuscript compliance with medical research reporting guidelines. Supports 32 guidelines including STROBE, CONSORT, STARD, STARD-AI, TRIPOD, TRIPOD+AI, ARRIVE, PRISMA, PRISMA-DTA, PRISMA-P, CARE, SPIRIT, CLAIM, MI-CLEAR-LLM, SQUIRE 2.0, CLEAR, MOOSE, GRRAS, SWiM, AMSTAR 2, and risk of bias tools (QUADAS-2, QUADAS-C, RoB 2, ROBINS-I, ROBINS-E, ROBIS, ROB-ME, PROBAST, PROBAST+AI, NOS, COSMIN, RoB NMA). Generates item-by-item assessment with PRESENT/MISSING/PARTIAL status.

clean-data

Interactive data profiling and cleaning assistant for medical research. Three-stage workflow (profile, flag, code-generate) with user approval gates at each step. Handles missing values, outliers, duplicates, and type mismatches in CSV/Excel clinical data. Does NOT auto-clean — all decisions require researcher confirmation.