batch-cohort

The batch-cohort skill generates multiple analysis scripts from a single validated methodology template by systematically swapping exposure and outcome variables across researcher-specified combinations. Use this when you have a proven analytical approach and need to test it consistently across many variable pairs, such as exploring how depression, obesity, and smoking each predict diabetes, hypertension, and cardiovascular disease using identical statistical methods.

Repository: medsci-skills

Install in Claude Code

git clone --depth 1 https://github.com/Aperivue/medsci-skills /tmp/batch-cohort && cp -r /tmp/batch-cohort/skills/batch-cohort ~/.claude/skills/batch-cohort

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Batch Cohort Analysis Skill

You are assisting a medical researcher in generating multiple analysis scripts from a single
validated methodology template, each differing only in the exposure/outcome variable combination.
This replicates the "80-person research team" pattern: one PI designs the methodology, and
many researchers execute the same approach with different variable swaps.

## When to Use

- Researcher has a **validated analysis template** (e.g., from /replicate-study or /cross-national)
- Wants to explore **multiple exposure → outcome combinations** on the same database
- Goal: systematic variable-swap code generation + batch execution + result matrix

## Inputs

1. **Database path(s)**: CSV/SAS data files (KNHANES, NHANES, NHIS, or any cleaned cohort)
2. **Methodology template**: One of:
   - Path to a validated R/Python analysis script (from /replicate-study or /cross-national)
   - A paper type template name: `nhis_cohort`, `cross_national`, `survey_weighted`
   - A source paper to extract methodology from (falls back to /replicate-study Phase 1)
3. **Combination spec**: A list of exposure/outcome pairs, provided as:
   - Inline list: `exposures: [depression, obesity, smoking]; outcomes: [diabetes, hypertension, CVD]`
   - CSV file with columns: `exposure`, `outcome`, (optional) `subgroup_vars`
   - `"all"` keyword: generates every exposure × outcome pair from the supplied exposure and outcome lists
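
The three spec forms above can be reduced to one list of pairs. The sketch below is illustrative only: `expand_all` and `read_pairs_csv` are hypothetical names, skipping self-pairs is an added assumption, and the semicolon separator for `subgroup_vars` is not specified by the skill.

```python
import csv
import io
from itertools import product


def expand_all(exposures, outcomes):
    """Return every (exposure, outcome) pair for the "all" keyword.

    Assumption: a variable is never paired with itself.
    """
    return [(e, o) for e, o in product(exposures, outcomes) if e != o]


def read_pairs_csv(handle):
    """Read explicit pairs from a CSV with `exposure` and `outcome` columns.

    Assumption: the optional `subgroup_vars` column holds a
    semicolon-separated list of variable names.
    """
    rows = []
    for row in csv.DictReader(handle):
        raw = row.get("subgroup_vars") or ""
        subgroups = [s.strip() for s in raw.split(";") if s.strip()]
        rows.append((row["exposure"].strip(), row["outcome"].strip(), subgroups))
    return rows


pairs = expand_all(["depression", "obesity", "smoking"],
                   ["diabetes", "hypertension", "CVD"])
print(len(pairs))  # 9

spec = io.StringIO("exposure,outcome,subgroup_vars\nobesity,diabetes,sex;age_group\n")
print(read_pairs_csv(spec))
```

Normalising every spec form to the same tuple shape keeps the later phases independent of how the researcher supplied the combinations.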

### Optional Inputs

- **Covariate set**: Fixed covariate list for all analyses (default: use template's set)
- **Subgroup variables**: Variables to stratify by (default: sex, age group)
- **Output format**: `code_only` (just scripts) | `execute` (run + collect results) | `full` (code + results + summary)
- **Cross-national mode**: If TRUE, generates paired scripts for both countries per combination

## Workflow

### Phase 1: Template Validation

1. Read the methodology template (R script or paper type reference).
2. Identify the **slot variables** — parts that change per combination:
   - `EXPOSURE_VAR`: raw variable name in the database
   - `EXPOSURE_LABEL`: human-readable label for tables/figures
   - `EXPOSURE_CODING`: how to derive binary/categorical exposure
   - `OUTCOME_VAR`: raw variable name
   - `OUTCOME_LABEL`: human-readable label
   - `OUTCOME_CODING`: how to derive binary outcome
3. Verify the template runs successfully on at least one combination before batch generation.
4. Output: template summary with identified slots → user approval.
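
One way to check a template for the six slots is sketched below. It assumes slots are marked with `{{SLOT_NAME}}` placeholders, which the skill does not prescribe; `find_slots` and `missing_slots` are hypothetical helper names.

```python
import re

# The six slot variables named in Phase 1.
REQUIRED_SLOTS = {
    "EXPOSURE_VAR", "EXPOSURE_LABEL", "EXPOSURE_CODING",
    "OUTCOME_VAR", "OUTCOME_LABEL", "OUTCOME_CODING",
}

# Assumed placeholder syntax: {{SLOT_NAME}}
SLOT_PATTERN = re.compile(r"\{\{([A-Z_]+)\}\}")


def find_slots(template_text):
    """Return the set of slot names that appear in the template."""
    return set(SLOT_PATTERN.findall(template_text))


def missing_slots(template_text):
    """Return the required slots the template never references, sorted."""
    return sorted(REQUIRED_SLOTS - find_slots(template_text))


template = "fit <- glm({{OUTCOME_VAR}} ~ {{EXPOSURE_VAR}} + age, data = d, family = binomial)"
print(missing_slots(template))
# ['EXPOSURE_CODING', 'EXPOSURE_LABEL', 'OUTCOME_CODING', 'OUTCOME_LABEL']
```

A non-empty result is a reasonable trigger for asking the user to confirm the slot list before any batch generation starts.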

### Phase 2: Variable Specification

For each exposure and outcome in the combination spec:

1. **Look up** the variable in the database:
   - KNHANES: check variable name exists in the CSV header
   - NHANES: check which table contains the variable (use codebook.csv if available)
   - NHIS: check claims code or variable name
2. **Define coding**:
   - Binary: threshold or category mapping (e.g., `HE_glu >= 126 → diabetes = 1`)
   - Categorical: level definitions (e.g., `smoking: current/former/never`)
3. **Check covariate overlap**: If the exposure IS one of the standard covariates, remove it from the adjustment set for that analysis (no self-adjustment).
4. Output: **combination matrix** with all variable specifications.

```
| # | Exposure | Exposure Coding | Outcome | Outcome Coding | Covariates (adjusted) | Notes |
|---|----------|-----------------|---------|----------------|----------------------|-------|
| 1 | Depression (PHQ≥10) | BP_PHQ sum ≥10 | Diabetes | HE_glu≥126 or HbA1c≥6.5 or DE1_dg=1 | age,sex,edu,income,smoking,alcohol,obesity,CVD | — |
| 2 | Obesity (BMI≥25) | HE_obe ≥4 | Diabetes | same | age,sex,edu,income,smoking,alcohol,depression,CVD | obesity removed from covariates |
| ... | | | | | | |
```
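
The covariate-overlap rule in step 3 can be sketched as follows. The `STANDARD_COVARIATES` list is inferred from the two example rows above and is an assumption, as is the case-insensitive name match; `adjusted_covariates` is a hypothetical helper.

```python
# Assumed standard adjustment set, inferred from the example matrix rows.
STANDARD_COVARIATES = ["age", "sex", "edu", "income", "smoking",
                       "alcohol", "obesity", "depression", "CVD"]


def adjusted_covariates(exposure, covariates=STANDARD_COVARIATES):
    """Drop the exposure from the adjustment set (no self-adjustment).

    Returns the remaining covariates and a note for the matrix's Notes column.
    """
    kept = [c for c in covariates if c.lower() != exposure.lower()]
    note = f"{exposure} removed from covariates" if len(kept) < len(covariates) else ""
    return kept, note


covs, note = adjusted_covariates("obesity")
print(",".join(covs))  # age,sex,edu,income,smoking,alcohol,depression,CVD
print(note)            # obesity removed from covariates
```

Matching on the variable name only works when the exposure and the covariate share one name; an exposure derived from a differently named raw variable would need an explicit mapping.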

### Phase 3: Batch Code Generation

For each combination in the matrix:

1. **Clone** the template script.
2. **Replace** slot variables with the combination-specific values.
3. **Adjust covariates**: Remove exposure variable from covariate list if present.
4. **Set output paths**: Each combination gets its own results subdirectory.
5. **Generate a master runner script** (`run_all.R` or `run_all.sh`) that:
   - Executes all N scripts sequentially (or in parallel via `future`/`parallel`)
   - Captures errors per script without stopping the batch
   - Logs execution time per analysis
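
Steps 1 to 4 amount to rendering the template once per combination. The sketch below assumes `{{SLOT_NAME}}` placeholders and a `RESULTS_DIR` slot for the per-combination output path; both are illustrative conventions rather than part of the skill, and `render` and `generate_scripts` are hypothetical names.

```python
import tempfile
from pathlib import Path


def render(template_text, slots):
    """Replace each {{NAME}} placeholder with its combination-specific value."""
    out = template_text
    for name, value in slots.items():
        out = out.replace("{{" + name + "}}", value)
    return out


def generate_scripts(template_text, combos, batch_dir):
    """Write one numbered script per combination; return the paths in order."""
    scripts_dir = Path(batch_dir) / "scripts"
    scripts_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, combo in enumerate(combos, start=1):
        stem = f"{i:02d}_{combo['exposure']}_{combo['outcome']}"
        # Each combination gets its own results subdirectory.
        slots = dict(combo["slots"], RESULTS_DIR=f"results/{stem}")
        path = scripts_dir / f"{stem}.R"
        path.write_text(render(template_text, slots), encoding="utf-8")
        paths.append(path)
    return paths


template = (
    'dir.create("{{RESULTS_DIR}}", recursive = TRUE)\n'
    "fit <- glm({{OUTCOME_VAR}} ~ {{EXPOSURE_VAR}}, data = d, family = binomial)\n"
)
combos = [
    {"exposure": "depression", "outcome": "diabetes",
     "slots": {"EXPOSURE_VAR": "depression", "OUTCOME_VAR": "diabetes"}},
    {"exposure": "obesity", "outcome": "diabetes",
     "slots": {"EXPOSURE_VAR": "obesity", "OUTCOME_VAR": "diabetes"}},
]

with tempfile.TemporaryDirectory() as tmp:
    paths = generate_scripts(template, combos, tmp)
    names = [p.name for p in paths]
    first = paths[0].read_text(encoding="utf-8")

print(names)  # ['01_depression_diabetes.R', '02_obesity_diabetes.R']
```

Plain string replacement keeps the generated scripts readable and diffable against the frozen template, at the cost of no validation that every placeholder was filled.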

### Phase 4: Batch Execution (if `execute` or `full` mode)

1. Run the master script.
2. Collect results from each combination's output directory.
3. Handle failures gracefully:
   - Log which combinations failed and why
   - Common failures: convergence issues, too few events, empty subgroups
   - Suggest fixes for failed combinations
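
A runner that logs failures without stopping the batch might look like the sketch below. `run_batch` is a hypothetical helper, not the skill's `run_all.R`; it assumes `Rscript` is on the PATH for real runs, and the demo substitutes the current Python interpreter so it executes without R installed.

```python
import subprocess
import sys
import tempfile
import time
from pathlib import Path


def run_batch(scripts, command=("Rscript",), timeout=None):
    """Run each script in turn, recording status, duration and the stderr tail.

    A failing script is logged and the batch continues.
    """
    log = []
    for script in scripts:
        start = time.perf_counter()
        try:
            proc = subprocess.run([*command, str(script)], capture_output=True,
                                  text=True, timeout=timeout)
            status = "ok" if proc.returncode == 0 else "failed"
            detail = proc.stderr.strip()[-500:]
        except (OSError, subprocess.TimeoutExpired) as exc:
            status, detail = "failed", repr(exc)
        log.append({"script": Path(script).name, "status": status,
                    "seconds": round(time.perf_counter() - start, 2),
                    "detail": detail})
    return log


# Demo with Python stand-ins for the generated R scripts; a real batch
# would keep the default command=("Rscript",).
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "01_ok.py"
    bad = Path(tmp) / "02_fails.py"
    good.write_text("print('done')\n", encoding="utf-8")
    bad.write_text("raise SystemExit('too few events')\n", encoding="utf-8")
    log = run_batch([good, bad], command=(sys.executable,))

for entry in log:
    print(entry["script"], entry["status"], entry["detail"])
```

Keeping the stderr tail in the log gives enough context to tell convergence issues, sparse events and empty subgroups apart when suggesting fixes.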

### Phase 5: Summary Matrix

Aggregate all results into a single summary:

**Main Results Matrix** (`summary_matrix.csv`):

| Exposure | Outcome | N | Events | Model 1 OR (95% CI) | Model 2 OR (95% CI) | Model 3 OR (95% CI) | p-value | Significant |
|----------|---------|---|--------|---------------------|---------------------|---------------------|---------|-------------|
| Depression | Diabetes | 5,811 | 487 | 2.14 (1.52–3.01) | 1.89 (1.33–2.69) | 1.36 (0.91–2.05) | 0.137 | No |
| Obesity | Diabetes | 5,811 | 487 | 3.45 (2.71–4.39) | 3.38 (2.65–4.32) | 3.12 (2.42–4.02) | <0.001 | Yes |
| ... | | | | | | | | |

**Subgroup Summary** (`subgroup_matrix.csv`): Same format, stratified by subgroup variables.

**Heatmap** (optional): Visual matrix of effect sizes × significance, exposure on Y-axis, outcome on X-axis.
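
The cell formatting used in the matrix can be sketched as below. It assumes each model reports a log-odds coefficient with a standard error, a Wald 95% interval, and a significance cutoff of p < 0.05; none of these is fixed by the skill. The numbers in the demo are made up for illustration, and the single `OR (95% CI)` column stands in for the three model columns.

```python
import csv
import io
import math


def format_or(beta, se, z=1.96):
    """Format a log-odds coefficient as 'OR (lower–upper)' using a Wald interval."""
    lower, upper = math.exp(beta - z * se), math.exp(beta + z * se)
    return f"{math.exp(beta):.2f} ({lower:.2f}\u2013{upper:.2f})"


def format_p(p):
    """Report small p-values as '<0.001', otherwise to three decimals."""
    return "<0.001" if p < 0.001 else f"{p:.3f}"


def summary_row(exposure, outcome, n, events, beta, se, p, alpha=0.05):
    """Build one row of the summary matrix (alpha is an assumed cutoff)."""
    return {"Exposure": exposure, "Outcome": outcome, "N": n, "Events": events,
            "OR (95% CI)": format_or(beta, se), "p-value": format_p(p),
            "Significant": "Yes" if p < alpha else "No"}


# Illustrative inputs only, not results from any dataset.
rows = [summary_row("exposure_a", "outcome_x", 1000, 120, math.log(2), 0.1, 0.0004),
        summary_row("exposure_b", "outcome_x", 1000, 120, 0.0, 0.2, 0.137)]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

With many combinations tested at once, a fixed per-test cutoff inflates the number of chance findings, so the `Significant` column is best read as a screening flag.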

## Output Files

```
{working_dir}/batch_{timestamp}/
├── README.md                    — Batch run summary (N combinations, template used, date)
├── combination_matrix.csv       — All exposure/outcome specs with coding
├── template/
│   └── base_template.R          — The validated template (frozen copy)
├── scripts/
│   ├── 01_depression_diabetes.R
│   ├── 02_obesity_diabetes.R
│   ├── ...
│   └── run_all.R                — Master execution script
├── results/
│   ├── 01_depression_diabetes/
│   │   ├── table1.csv
│   │   ├── main_r
```
