tooluniverse-mendelian-randomization
This Claude Code skill performs Mendelian randomization (MR) analysis to infer whether an exposure causally affects a disease or outcome using genetic variants as instrumental variables, drawing from the IEU OpenGWAS and EpiGraphDB MR-EvE databases. Use it when users ask whether an association is causal versus correlational, whether a biomarker or risk factor truly drives disease, or when they mention genetic causal inference, instrumental variables, or reverse causation, even if they don't explicitly say "Mendelian randomization."
git clone --depth 1 https://github.com/mims-harvard/ToolUniverse /tmp/tooluniverse-mendelian-randomization && cp -r /tmp/tooluniverse-mendelian-randomization/plugin/skills/tooluniverse-mendelian-randomization ~/.claude/skills/tooluniverse-mendelian-randomization
# Mendelian Randomization (Causal Inference from Genetic Instruments)
**MR estimates the CAUSAL effect of an exposure on an outcome using genetic variants as instrumental variables.** Because alleles are randomized at conception, MR is largely robust to the confounding and reverse causation that bias observational associations. It is *not* a free lunch: the causal claim rests on three assumptions, and violating them (especially horizontal pleiotropy) silently biases the estimate.
**LOOK UP, DON'T GUESS:** never assert a causal MR estimate from memory. Genetic-instrument results are updated as new GWAS are published — always retrieve current evidence with `EpiGraphDB_get_mendelian_randomization`. Do not invent beta/p-values.
**Correlation ≠ causation, and genetic correlation ≠ causation.** A high genetic correlation (`rg`) means two traits share heritability — it does NOT establish a causal direction. Only MR (with valid instruments) speaks to causality. Report them as different kinds of evidence.
## The three instrumental-variable assumptions
| Assumption | Statement | How it fails | Check |
|---|---|---|---|
| **Relevance** | Instrument is robustly associated with the exposure | Weak instruments (low F-stat) → bias toward the null in two-sample MR with non-overlapping samples, or toward the confounded observational estimate when samples overlap | MOE score; instruments selected at GWAS significance |
| **Independence** | Instrument shares no common cause with the outcome | Population stratification, assortative mating | Ancestry-matched GWAS; report population |
| **Exclusion restriction** | Instrument affects the outcome ONLY through the exposure | **Horizontal pleiotropy** — the variant influences the outcome via another path | MR-Egger intercept ≈ 0; agreement across methods |
If you cannot speak to these, your causal claim is provisional. Say so.
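As a rough illustration of the relevance check, the per-variant F-statistic can be approximated as `(beta / se)^2` for the SNP-exposure association, with `F < 10` as the conventional weak-instrument rule of thumb. The sketch below assumes you already hold per-SNP exposure effects; the field names (`rsid`, `beta`, `se`) and the values are hypothetical, not output from any tool in this skill.

```python
from statistics import mean


def instrument_f_stats(instruments):
    """Approximate per-variant F-statistic as (beta / se)^2 for the SNP-exposure effect."""
    return [(snp["beta"] / snp["se"]) ** 2 for snp in instruments]


# Hypothetical SNP-exposure summary statistics (illustrative values only).
instruments = [
    {"rsid": "rs_example_1", "beta": 0.10, "se": 0.01},
    {"rsid": "rs_example_2", "beta": 0.05, "se": 0.02},
]

f_stats = instrument_f_stats(instruments)

# Flag variants below the conventional F < 10 rule of thumb.
weak = [snp["rsid"] for snp, f in zip(instruments, f_stats) if f < 10]

print("per-SNP F:", f_stats)
print("mean F:", mean(f_stats))
print("weak instruments:", weak)
```

A low mean F does not invalidate a result by itself, but it is a reason to label the causal claim provisional.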
## When to use
- "Does **[exposure]** causally affect **[outcome/disease]**?" — the core MR question.
- Triangulating an observational/epidemiological association ("BMI correlates with depression — is it causal?").
- Reverse-causation checks (bidirectional MR: does the outcome cause the exposure instead?).
- Prioritising drug targets / risk factors with genetic causal support.
- Distinguishing a causal driver from a shared-etiology bystander (MR vs genetic correlation).
This skill wraps the **IEU OpenGWAS / EpiGraphDB MR-EvE** ("MR Everything-vs-Everything") resource: a large matrix of pre-computed two-sample MR results between GWAS traits. It does **not** run a bespoke two-sample MR from raw summary statistics with your own instrument set — see *Limitations*.
## Anchor tools
| Tool | Purpose |
|---|---|
| `EpiGraphDB_search_opengwas` | Resolve a free-text trait to exact OpenGWAS study IDs + labels (DO THIS FIRST) |
| `EpiGraphDB_get_mendelian_randomization` | Pre-computed MR estimate(s) for an exposure→outcome trait pair (curated pairs; start here) |
| `OpenGWAS_get_mr_instruments` | Custom two-sample MR: fetch the exposure's clumped instruments + their harmonized outcome effects for *any* GWAS pair (needs a free `OPENGWAS_JWT`). Use when the pair isn't in MR-EvE |
| `EpiGraphDB_get_genetic_correlations` | `rg` between a trait and others (shared etiology, NOT causation). **Sparse** — see Step 4 caveat |
| `EpiGraphDB_get_drugs_for_trait` | Drugs targeting genes associated with a risk-factor trait (causal-target follow-up) |
| `gwas_search_associations` | GWAS Catalog associations, to inspect the instruments behind a trait |
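When a pair is not in MR-EvE and you fall back to `OpenGWAS_get_mr_instruments`, the harmonized per-SNP effects still have to be combined into one estimate. The following is a minimal sketch of a fixed-effect inverse-variance-weighted (IVW) estimate, assuming each instrument is a dict with hypothetical keys `beta_exposure`, `beta_outcome`, and `se_outcome`; the tool's real field names may differ, so map them before use.

```python
import math


def ivw_estimate(snps):
    """Fixed-effect IVW estimate from harmonized per-SNP effects.

    beta_IVW = sum(bx * by / se_y^2) / sum(bx^2 / se_y^2)
    se_IVW   = sqrt(1 / sum(bx^2 / se_y^2))
    """
    num = sum(s["beta_exposure"] * s["beta_outcome"] / s["se_outcome"] ** 2 for s in snps)
    den = sum(s["beta_exposure"] ** 2 / s["se_outcome"] ** 2 for s in snps)
    return num / den, math.sqrt(1.0 / den)


# Hypothetical harmonized instruments (illustrative values only).
snps = [
    {"beta_exposure": 0.1, "beta_outcome": 0.05, "se_outcome": 0.01},
    {"beta_exposure": 0.2, "beta_outcome": 0.10, "se_outcome": 0.02},
]

beta, se = ivw_estimate(snps)
print(f"IVW beta = {beta:.3f}, se = {se:.4f}")
```

This sketch omits pleiotropy-robust methods (MR-Egger, weighted median) and heterogeneity tests; treat it as a sanity check rather than a full analysis.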
## Workflow
### Step 1 — Resolve trait labels (avoid silent misses)
EpiGraphDB matches GWAS trait labels **exactly and case-sensitively**. Always resolve free text first:
```
EpiGraphDB_search_opengwas {"query": "coronary heart disease"}
# → returns ids like 'ieu-a-7' and the exact label 'Coronary heart disease'
```
Use the returned exact label (or a sentence-case form) in the MR call. The MR tool now retries sentence-case variants and returns a `metadata.note` when it falls back or finds nothing — **read that note**; an empty `mr_results` with a note means "labels didn't match", NOT "no causal effect".
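The empty-result rule above can be made explicit in code. This is a sketch only: it assumes the tool response is a dict with a top-level `mr_results` list and a `metadata` dict carrying an optional `note`, which is inferred from the field names mentioned here rather than from a documented schema. The returned status strings are hypothetical labels.

```python
def interpret_mr_response(response):
    """Classify an MR tool response without mistaking a label miss for a null effect."""
    results = response.get("mr_results") or []
    note = (response.get("metadata") or {}).get("note")

    if results:
        # A note alongside results suggests the tool fell back to a label variant.
        return "results_with_fallback" if note else "results"
    if note:
        # Empty results plus a note: the labels did not match. NOT evidence of no effect.
        return "label_mismatch"
    # Empty and no note: inconclusive; re-resolve labels before concluding anything.
    return "empty_no_note"


# Hypothetical responses (illustrative only).
miss = {"mr_results": [], "metadata": {"note": "no match for exposure label"}}
hit = {"mr_results": [{"beta": 0.4, "se": 0.05, "pval": 1e-12}], "metadata": {}}

print(interpret_mr_response(miss))
print(interpret_mr_response(hit))
```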
### Step 2 — Run MR (exposure → outcome)
```
EpiGraphDB_get_mendelian_randomization {
"exposure_trait": "LDL cholesterol",
"outcome_trait": "Coronary heart disease",
"pval_threshold": 1e-5
}
```
Each row carries `beta` (causal effect estimate), `se`, `pval`, `method`, `moescore`, and the exposure/outcome IDs.
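To report a row, it usually helps to turn `beta` and `se` into a 95% confidence interval and, for binary outcomes, an odds ratio. The sketch below assumes a normal approximation and that `beta` is on the log-odds scale; that scale is an assumption you should confirm against the outcome GWAS before quoting an odds ratio. The row values are hypothetical.

```python
import math


def summarize_row(row, z=1.96):
    """Return a 95% CI for beta plus the odds ratio and its CI (valid only on the log-odds scale)."""
    beta, se = row["beta"], row["se"]
    lo, hi = beta - z * se, beta + z * se
    return {
        "beta_ci": (lo, hi),
        "odds_ratio": math.exp(beta),
        "or_ci": (math.exp(lo), math.exp(hi)),
    }


# Hypothetical MR row (illustrative values only, not a retrieved estimate).
row = {"beta": 0.5, "se": 0.1, "pval": 5.7e-7, "method": "IVW"}
summary = summarize_row(row)

print("beta 95% CI:", summary["beta_ci"])
print("odds ratio:", round(summary["odds_ratio"], 3))
```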
### Step 3 — Interpret (see tables below)
Direction, magnitude, instrument quality, and method agreement.
### Step 4 — Triangulate
1. **Bidirectional MR (primary triangulation)** — swap exposure and outcome to test reverse causation. A causal X→Y with no Y→X strengthens the claim; bidirectional signals suggest shared genetics or feedback. This is the reliable leg — lean on it.
2. **Multiple methods** — prefer pairs where IVW and a pleiotropy-robust method (MR-Egger, weighted median) agree in sign and significance.
3. **Genetic correlation (secondary, often empty)** — `EpiGraphDB_get_genetic_correlations` on the exposure. ⚠️ The `/genetic-cor` graph is **sparse**: it stores only strong edges (|rg| > 0.8), matches **exact, case-sensitive** labels distinct from OpenGWAS search labels, and **ignores** the `pval_threshold` argument. Common traits (e.g. 'Body mass index') return empty — that is a graph gap, **not** "no shared genetics." Read `metadata.note`; if empty, do NOT conclude absence — fall back to bidirectional MR. When it does return, high `rg` + significant MR = causal; high `rg` + null MR = shared etiology without a detectable causal path.
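The first two triangulation legs can be sketched as simple checks over the rows returned for the forward and reverse queries. This assumes each row has `method`, `beta`, and `pval` as described in Step 2; the significance cutoff `alpha=0.05`, the method name strings, and the returned labels are illustrative assumptions, not part of the tool.

```python
def methods_agree(rows, alpha=0.05):
    """True when two or more methods are present, share a sign, and all pass alpha."""
    by_method = {r["method"]: r for r in rows}
    if len(by_method) < 2:
        return False
    signs = {r["beta"] > 0 for r in by_method.values()}
    return len(signs) == 1 and all(r["pval"] < alpha for r in by_method.values())


def direction_call(forward_rows, reverse_rows, alpha=0.05):
    """Compare exposure->outcome rows against the swapped outcome->exposure rows."""
    fwd = any(r["pval"] < alpha for r in forward_rows)
    rev = any(r["pval"] < alpha for r in reverse_rows)
    if fwd and rev:
        return "bidirectional"  # suggests shared genetics or feedback
    if fwd:
        return "forward_only"   # strengthens the exposure -> outcome claim
    if rev:
        return "reverse_only"   # points to reverse causation
    return "neither"


# Hypothetical rows (illustrative values only).
forward = [
    {"method": "IVW", "beta": 0.40, "pval": 1e-8},
    {"method": "Weighted median", "beta": 0.35, "pval": 1e-4},
]
reverse = [{"method": "IVW", "beta": 0.01, "pval": 0.6}]

print(methods_agree(forward), direction_call(forward, reverse))
```

An empty reverse query should first be checked against the label-matching caveat in Step 1 before it is read as "no reverse effect".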
### Step 5 — Actionable follow-up (optional)
`EpiGraphDB_get_drugs_for_trait` surfaces drugs whose target genes drive a causal risk factor — a genetics-anchored repurposing hypothesis.
## Interpretation tables
### Causal effect (`beta`)
| Observation | Meaning |
|---|---|
| `beta > 0`, `pval` significant | Higher exposure causally **increases** the outcome (on the GWAS scale — often log-odds for a binary outcome) |
| `beta < 0`, `p