Skip to main content
ClaudeWave
Skill1.4k estrellas del repoactualizado today

tooluniverse-meta-analysis

Pool quantitative results from multiple studies into a single combined estimate while assessing consistency across studies. Use this skill after systematically extracting effect sizes from two or more published studies, experimental replicates, or cohort analyses when you need a pooled estimate with confidence intervals and heterogeneity metrics (I², Q, τ²). The skill converts diverse reported formats (odds ratios with confidence intervals, means and standard deviations, correlations, regression coefficients) into the standardized log-transformed effect size and standard error required for meta-analysis, then applies fixed or random-effects models and generates forest plots.

Instalar en Claude Code
Copiar
git clone --depth 1 https://github.com/mims-harvard/ToolUniverse /tmp/tooluniverse-meta-analysis && cp -r /tmp/tooluniverse-meta-analysis/plugin/skills/tooluniverse-meta-analysis ~/.claude/skills/tooluniverse-meta-analysis
Después abre una sesión nueva de Claude Code; el skill carga automáticamente.

SKILL.md

# Meta-Analysis & Evidence Synthesis

Pool quantitative results from **multiple studies** into one estimate, and judge how consistent the studies are. This is the statistical half of a systematic review (the literature-collection half is `tooluniverse-literature-deep-research`).

## When to use this

- You have effect sizes from ≥2 studies/cohorts and want a single pooled estimate + CI.
- Synthesizing a systematic review, replicated experiments, multi-cohort GWAS, or multi-dataset associations.
- Deciding whether studies agree (low heterogeneity) or conflict (high heterogeneity).

Do NOT use it to *find* the studies — use `tooluniverse-literature-deep-research` / the literature tools for that, then bring the extracted numbers here. Before trusting any input study, consider checking it with `Crossref_check_retraction`.

## The workflow

```
1. Extract (effect, SE) per study  ← THE ERROR-PRONE STEP
2. Pick fixed vs random effects
3. Pool: MetaAnalysis_run
4. Read heterogeneity (I², Q, τ²)
5. Forest plot + interpret
```

## Step 1 — Convert each study to (effect_size, se)  ← do this carefully

The pooling step needs an **effect size on an additive scale** and its **standard error**. Ratio measures (OR/RR/HR) must be **log-transformed** first. Most reported numbers give you a CI, not an SE — derive the SE from the CI.

| What the paper reports | effect_size | se |
|---|---|---|
| OR / RR / HR with 95% CI `[L, U]` | `ln(point)` | `(ln(U) − ln(L)) / (2 × 1.96)` |
| OR / RR / HR with a p-value (no CI) | `ln(point)` | `|ln(point)| / z_from_p` (two-sided `z`) |
| GWAS / regression `β` with SE | `β` (as reported) | the reported SE |
| Two groups, means + SDs + n₁,n₂ | Hedges' g (see script) | SE of g (see script) |
| Single proportion `p`, n | `logit(p)=ln(p/(1−p))` | `sqrt(1/(np) + 1/(n(1−p)))` |
| Pearson correlation `r`, n | Fisher `z = atanh(r)` | `1 / sqrt(n − 3)` |

**Critical rules**
- **Log-transform ratio measures.** Pooling raw ORs is wrong; pool `ln(OR)` and exponentiate the pooled result back. The script does this for you.
- **One direction.** Make sure every study's effect points the same way (e.g. "exposure increases risk"); flip sign / invert the ratio for studies coded the opposite way.
- **Same effect measure.** Don't mix OR with HR with mean-difference in one pool.
- `2 × 1.96` assumes a 95% CI; use `2 × 1.645` for 90%, `2 × 2.576` for 99%.

The helper script does these conversions — prefer it over hand math:

```bash
python skills/tooluniverse-meta-analysis/scripts/meta_analysis.py --input studies.csv
# studies.csv columns (use the set that matches your data):
#   name, or, ci_low, ci_high            (ratio + CI)
#   name, beta, se                       (already on log/linear scale)
#   name, mean1, sd1, n1, mean2, sd2, n2 (two-group means -> Hedges' g)
#   name, r, n                           (correlation -> Fisher z)
```

## Step 2 — Fixed vs random effects

| Use **fixed-effects** when | Use **random-effects** when |
|---|---|
| Studies estimate the *same* true effect (e.g. exact replications, one trial split by site) | Studies differ in population/design/dose (the usual real-world case) |
| I² is low (<25%) | I² is moderate–high, or studies are clinically heterogeneous |

When unsure, **report random-effects** (DerSimonian–Laird) as primary — it is the conservative default and widens the CI to reflect between-study variance.

## Step 3 — Pool with MetaAnalysis_run

```bash
tu run MetaAnalysis_run '{"method":"random","studies":[
  {"name":"Smith 2019","effect_size":0.41,"se":0.12},
  {"name":"Lee 2021","effect_size":0.67,"se":0.18},
  {"name":"Garcia 2023","effect_size":0.33,"se":0.10}]}'
```

Returns `pooled_effect`, `pooled_se`, `pooled_ci_lower/upper`, `pooled_z`, `pooled_p_value`, a `heterogeneity` block (`Q`, `Q_df`, `Q_p_value`, `I_squared`, `tau_squared`), and `per_study` weights + CIs.

> **Scale foot-gun — read this.** `MetaAnalysis_run` pools *whatever scale you hand it* and has no idea your inputs were ratios. For an OR/RR/HR you MUST pass the **log-transformed** `effect_size` + `se` from Step 1 (e.g. `ln(1.42)=0.351`, not `1.42`) — feeding raw ratios silently produces a wrong pooled value with no error. And the values it returns — including its prose `interpretation` string — are on that **same log scale**. So: **ignore the tool's `interpretation` field for ratios, and `exp()` the `pooled_effect` and CI bounds back to the OR/RR/HR scale yourself before reporting.** The helper script avoids all of this — it takes raw ORs, tracks the scale, and prints results already back-transformed.

## Step 4 — Interpret heterogeneity (decides the story)

| I² | Heterogeneity | What it means |
|---|---|---|
| 0–25% | Low | Studies largely agree; fixed-effects is defensible |
| 25–50% | Moderate | Prefer random-effects; note the variability |
| 50–75% | Substantial | Random-effects; investigate sources (subgroup / meta-regression) |
| >75% | Considerable | Pooling may be inappropriate — explain *why* studies differ instead |

- `Q_p_value < 0.10` → statistically significant heterogeneity (Q is low-powered, so 0.10 not 0.05).
- `tau_squared` is the between-study variance on the effect scale; `> 0` is what random-effects adds over fixed.
- **Borderline I² (≈25–50%) with a non-significant Q (`Q_p_value ≫ 0.10`), especially with few studies:** fixed and random-effects converge — report **random-effects as primary** and note that fixed-effects agrees. Don't agonize over the model choice when both give essentially the same pooled estimate.

## Step 5 — Forest plot + report

The script prints a text forest plot (per-study effect, CI, weight%, and the pooled diamond). Report, in order:
1. Pooled estimate + 95% CI + p (on the **interpretable** scale — exponentiate ratios back).
2. Number of studies and total N.
3. Heterogeneity: I² + Q p-value + the model you chose and why.
4. Direction/consistency: do all studies point the same way?

> Example: "Across 3 cohorts (N=4,210), the pooled OR
setup-tooluniverseSkill

Install and configure ToolUniverse for any use case — MCP server (chat-based), CLI (command line with 9 subcommands), or Python SDK (Coding API with 3 calling patterns). Covers uv/uvx setup, MCP configuration for 12+ AI clients (Cursor, Claude Desktop, Windsurf, VS Code, Codex, Gemini CLI, Trae, Cline, etc.), full CLI reference (tu list/grep/find/info/run/test/status/build/serve), Coding API quickstart, agentic tools, code executor, API key walkthrough, skill installation, and upgrading. Use when user asks how to set up ToolUniverse, which access mode to use (MCP vs CLI vs SDK), configuring MCP servers, using the CLI, troubleshooting installation, upgrading, or mentions installing ToolUniverse or setting up scientific tools. Also triggers for "how do I use ToolUniverse", "what's the best way to access tools", "command line", "tu command", "coding API", "tu build".

tooluniverse-acmg-variant-classificationSkill

Systematic ACMG/AMP germline variant classification with all 28 criteria (PVS1, PS1-4, PM1-6, PP1-5, BA1, BS1-4, BP1-7) for clinical significance. Produces 5-tier verdict (Pathogenic / Likely Pathogenic / VUS / Likely Benign / Benign) with cited evidence per criterion. Use for variant interpretation, VUS resolution, and pathogenicity assessment. Combines ClinVar, gnomAD, computational predictors, and gene-mechanism context.

tooluniverse-admet-predictionSkill

Comprehensive ADMET (Absorption, Distribution, Metabolism, Excretion, Toxicity) profiling for drug candidates. Integrates ADMET-AI predictions, SwissADME drug-likeness, PubChemTox experimental toxicity, ChEMBL clinical data, Lipinski rule-of-five, and CYP interaction data. Use for drug-likeness assessment, BBB penetration, bioavailability, hepatotoxicity prediction, ADME/PK profiling, or screening compound libraries before lab testing.

tooluniverse-adverse-event-detectionSkill

Detect and analyze adverse drug event signals using FDA FAERS reports, drug labels, and disproportionality statistics (PRR, ROR, IC). Generates quantitative safety signal scores (0-100) with evidence grading. Use for post-market surveillance, pharmacovigilance, drug safety assessment, regulatory submissions, and detecting rare AE signals not visible in clinical trials.

tooluniverse-adverse-outcome-pathwaySkill

Map environmental and industrial chemicals to adverse outcome pathways (AOPs) — molecular initiating event to organ-level toxicity. Uses AOPWiki, GHS classification, IARC carcinogen status, and LD50 data. Use for environmental/industrial chemical risk assessment, regulatory-grade hazard characterization, and AOP stressor mapping. Distinct from drug-safety analysis (use tooluniverse-pharmacovigilance for drugs).

tooluniverse-aging-senescenceSkill

Aging biology, cellular senescence, and longevity research. Covers senescence markers (p16/CDKN2A, SASP, SA-beta-gal), aging hallmarks, senolytic drug discovery (dasatinib+quercetin, fisetin, navitoclax), epigenetic clocks, telomere biology, and longevity GWAS. Use for senescence-pathway analysis, age-related disease genetics, senolytic-target discovery, and centenarian-genetics queries. Distinguishes correlative vs causal evidence (knockout, intervention).

tooluniverse-antibody-engineeringSkill

Therapeutic antibody engineering and optimization, lead-to-clinical-candidate. Covers sequence humanization (germline alignment, framework retention), affinity maturation, developability (aggregation, stability, PTMs), structure modeling (AlphaFold/PDB CDR analysis), immunogenicity prediction, and manufacturing feasibility. Use for biologic-drug optimization, mAb design review, biosimilar engineering, and clinical-precedent comparison.

tooluniverse-binder-discoverySkill

Discover novel small-molecule binders for protein targets using structure-based and ligand-based screening. Covers druggability assessment, known-ligand mining (ChEMBL, BindingDB), similarity expansion, ADMET filtering, and synthesis feasibility. Use for hit identification, virtual screening, target-to-compounds workflows, and lead-finding before commit-to-medchem.