
bio-causal-genomics-mendelian-randomization

This Claude Code skill performs Mendelian randomization analysis to estimate causal effects between exposures and outcomes using genetic variants as instrumental variables. It implements multiple methods including inverse-variance weighting, MR-Egger regression, weighted median, and MR-PRESSO to produce robust causal inference from GWAS summary statistics. Use this skill when investigating whether a specific exposure causally affects an outcome by leveraging genetic instruments that satisfy instrumental variable assumptions.

Repository: OpenClaw-Medical-Skills

Install in Claude Code

git clone --depth 1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills /tmp/bio-causal-genomics-mendelian-randomization && cp -r /tmp/bio-causal-genomics-mendelian-randomization/skills/bio-causal-genomics-mendelian-randomization ~/.claude/skills/bio-causal-genomics-mendelian-randomization

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

## Version Compatibility

Reference examples tested with: TwoSampleMR 0.5+, MendelianRandomization 0.9+

Before using code patterns, verify that installed versions match. If versions differ:
- R: `packageVersion("<pkg>")` then `?function_name` to verify parameters

If code fails with an R error such as `could not find function` or `unused argument`,
introspect the installed package (for example with `args(function_name)`) and adapt the
example to match the actual API rather than retrying.
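
The version check above can be scripted. The following is a minimal base R sketch; the helper name `meets_min_version` is hypothetical, and the minimum versions are simply the ones from the "tested with" line, not requirements declared by either package.

```r
# Hypothetical helper: TRUE if `pkg` is installed at `min_version` or newer,
# FALSE if it is older, NA if it is not installed.
meets_min_version <- function(pkg, min_version) {
  # system.file() returns "" for a package that is not installed
  if (!nzchar(system.file(package = pkg))) {
    return(NA)
  }
  utils::packageVersion(pkg) >= package_version(min_version)
}

# Minimum versions taken from the "tested with" line
tested_with <- c(TwoSampleMR = "0.5.0", MendelianRandomization = "0.9.0")

for (pkg in names(tested_with)) {
  ok <- meets_min_version(pkg, tested_with[[pkg]])
  status <- if (is.na(ok)) "not installed" else if (ok) "OK" else "older than tested"
  cat(sprintf("%-24s %s\n", pkg, status))
}
```

A status of "older than tested" is a cue to check the function signatures with `?function_name` before running the examples.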

# Mendelian Randomization

**"Test whether my exposure causally affects this outcome using GWAS data"** → Use genetic variants as instrumental variables to estimate causal effects from GWAS summary statistics, applying IVW, MR-Egger, and weighted median methods for robust inference.
- R: `TwoSampleMR::mr()` for multi-method causal estimation
- R: `MendelianRandomization::mr_ivw()` for individual methods
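
As a rough illustration of the single-method route, the sketch below builds an input object with `mr_input()` and passes it to `mr_ivw()`. The summary statistics are made-up toy values, and the call pattern is an assumption based on the tested package version, so confirm it with `?mr_input` and `?mr_ivw` on your installation.

```r
# Hypothetical, already harmonised summary statistics for four variants
bx   <- c(0.10, 0.08, 0.12, 0.05)      # SNP-exposure effects
bxse <- c(0.010, 0.010, 0.012, 0.008)  # their standard errors
by   <- c(0.050, 0.036, 0.066, 0.020)  # SNP-outcome effects
byse <- c(0.010, 0.012, 0.015, 0.010)  # their standard errors

if (requireNamespace("MendelianRandomization", quietly = TRUE)) {
  # Bundle the four vectors into the package's input object
  mr_obj <- MendelianRandomization::mr_input(bx = bx, bxse = bxse, by = by, byse = byse)

  # Run a single method and print its summary
  ivw_fit <- MendelianRandomization::mr_ivw(mr_obj)
  print(ivw_fit)
} else {
  message("MendelianRandomization is not installed; skipping the example")
}
```

The same `mr_obj` can be handed to other single-method functions such as `mr_egger()`.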

## Core Concepts

Mendelian randomization (MR) uses genetic variants as instrumental variables (IVs) to estimate
causal effects of exposures on outcomes. Valid instruments must satisfy three assumptions:

1. **Relevance** - The variant is associated with the exposure (F-statistic > 10)
2. **Independence** - The variant is not associated with confounders
3. **Exclusion restriction** - The variant affects the outcome only through the exposure
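
To make the estimand concrete, here is a minimal base R sketch of the per-variant Wald ratio and the fixed-effect IVW estimate that combines them. The numbers are hypothetical toy values, and the standard error uses a first-order approximation that ignores uncertainty in the exposure effects.

```r
# Hypothetical, already harmonised summary statistics for four variants
bx   <- c(0.10, 0.08, 0.12, 0.05)      # SNP-exposure effects
by   <- c(0.050, 0.036, 0.066, 0.020)  # SNP-outcome effects
se_y <- c(0.010, 0.012, 0.015, 0.010)  # SE of SNP-outcome effects

# Per-variant Wald ratio: outcome effect per unit of exposure effect
wald    <- by / bx
wald_se <- se_y / abs(bx)  # first-order SE, ignores uncertainty in bx

# Fixed-effect IVW: inverse-variance weighted mean of the Wald ratios
w      <- 1 / wald_se^2
ivw_b  <- sum(w * wald) / sum(w)
ivw_se <- sqrt(1 / sum(w))

# The same point estimate from a weighted regression through the origin
fit <- lm(by ~ 0 + bx, weights = 1 / se_y^2)

cat("IVW estimate:", ivw_b, " SE:", ivw_se, "\n")
cat("Regression slope:", unname(coef(fit)["bx"]), "\n")
```

The two printed estimates agree because weighting the ratios by `bx^2 / se_y^2` is algebraically the same as regressing `by` on `bx` with weights `1 / se_y^2` and no intercept.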

## TwoSampleMR Workflow

**Goal:** Estimate the causal effect of an exposure on an outcome using GWAS summary statistics and genetic instruments.

**Approach:** Extract instruments for the exposure, extract matching outcome data, harmonize allele directions, and run multiple MR methods (IVW, Egger, weighted median, weighted mode).

**"Test if an exposure causally affects an outcome"** -> Use genetic variants as instrumental variables to estimate causal effects from GWAS data.
- R: `TwoSampleMR` (extract_instruments + harmonise_data + mr)
- R: `MendelianRandomization` (mr_input + mr_ivw/mr_egger)

```r
library(TwoSampleMR)

# --- Step 1: Extract instruments for the exposure ---
# Option A: from OpenGWAS (requires authentication -- see below)
# clump = TRUE already applies LD clumping to the returned instruments
exposure_dat <- extract_instruments(outcomes = 'ieu-a-2', p1 = 5e-08, clump = TRUE)

# Option B: from local GWAS summary statistics (use instead of Option A)
exposure_dat <- read_exposure_data(
  filename = 'exposure_gwas.txt',
  sep = '\t',
  snp_col = 'SNP', beta_col = 'BETA', se_col = 'SE',
  effect_allele_col = 'A1', other_allele_col = 'A2',
  pval_col = 'P', eaf_col = 'EAF'
)

# A local file contains every variant, so keep genome-wide significant ones first
exposure_dat <- exposure_dat[exposure_dat$pval.exposure < 5e-08, ]

# Clump instruments to remove LD (r2 < 0.001, 10 Mb window)
# r2 < 0.001: Standard threshold to ensure instrument independence
# 10000 kb window: Wide enough to capture long-range LD
exposure_dat <- clump_data(exposure_dat, clump_r2 = 0.001, clump_kb = 10000)

# --- Step 2: Extract outcome data ---
# Option A: from OpenGWAS
outcome_dat <- extract_outcome_data(snps = exposure_dat$SNP, outcomes = 'ieu-a-7')

# Option B: from local summary statistics (use instead of Option A)
outcome_dat <- read_outcome_data(
  filename = 'outcome_gwas.txt',
  sep = '\t',
  snp_col = 'SNP', beta_col = 'BETA', se_col = 'SE',
  effect_allele_col = 'A1', other_allele_col = 'A2',
  pval_col = 'P', eaf_col = 'EAF'
)

# --- Step 3: Harmonize ---
# Ensures effect alleles are aligned between exposure and outcome
dat <- harmonise_data(exposure_dat, outcome_dat, action = 2)

# action = 1: Assume all alleles are on the forward strand (no flipping)
# action = 2: Try to infer the forward strand, using allele frequencies for
#             palindromic SNPs (default)
# action = 3: Correct strand for non-palindromic SNPs and drop all palindromic SNPs

# --- Step 4: Perform MR ---
# Run the package's default set of methods
results <- mr(dat)

# Or name the methods explicitly
results <- mr(dat, method_list = c(
  'mr_ivw',              # Inverse variance weighted (primary)
  'mr_egger_regression', # MR-Egger (intercept tests for directional pleiotropy)
  'mr_weighted_median',  # Weighted median (consistent if >= 50% of weight is from valid IVs)
  'mr_weighted_mode'     # Weighted mode (consistent if the largest group of IVs is valid)
))
```
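
The core of the harmonisation step can be sketched in base R: when the outcome GWAS reports the opposite effect allele, the outcome effect changes sign. This is a simplified illustration on hypothetical toy data; it ignores strand flips and palindromic SNPs, both of which `harmonise_data()` handles.

```r
# Hypothetical toy data: rs2 is coded on the opposite allele in the outcome
# GWAS, and rs3 has alleles that cannot be reconciled
exposure <- data.frame(
  SNP = c("rs1", "rs2", "rs3"),
  effect_allele = c("A", "C", "G"),
  other_allele  = c("G", "T", "T"),
  beta = c(0.10, 0.08, 0.12),
  stringsAsFactors = FALSE
)
outcome <- data.frame(
  SNP = c("rs1", "rs2", "rs3"),
  effect_allele = c("A", "T", "G"),
  other_allele  = c("G", "C", "A"),
  beta = c(0.05, -0.04, 0.06),
  stringsAsFactors = FALSE
)

m <- merge(exposure, outcome, by = "SNP", suffixes = c(".exposure", ".outcome"))

same    <- m$effect_allele.exposure == m$effect_allele.outcome &
           m$other_allele.exposure  == m$other_allele.outcome
swapped <- m$effect_allele.exposure == m$other_allele.outcome &
           m$other_allele.exposure  == m$effect_allele.outcome

# Flip the outcome effect so it refers to the exposure's effect allele
m$beta.outcome[swapped]          <- -m$beta.outcome[swapped]
m$effect_allele.outcome[swapped] <- m$effect_allele.exposure[swapped]
m$other_allele.outcome[swapped]  <- m$other_allele.exposure[swapped]

# Keep only variants whose alleles could be reconciled
harmonised <- m[same | swapped, ]
print(harmonised[, c("SNP", "beta.exposure", "beta.outcome")])
```

Skipping this alignment would leave rs2 with the wrong sign and bias every downstream estimate, which is why harmonisation comes before any MR method is run.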

## OpenGWAS Authentication

OpenGWAS (ieugwasr) requires authentication. The auth system has changed multiple
times. Refer to the ieugwasr README for current instructions:
https://github.com/MRCIEU/ieugwasr

For reproducibility, prefer downloading GWAS summary statistics directly and using
`read_exposure_data()` / `read_outcome_data()` with local files.

## Interpreting Results

**Goal:** Evaluate MR evidence through method comparison, heterogeneity testing, and sensitivity analyses.

**Approach:** Compare estimates across methods for consistency, test for heterogeneity (Cochran's Q), pleiotropy (Egger intercept), and single-SNP influence (leave-one-out).

```r
# Method comparison table
results

# Key columns: method, nsnp, b (causal estimate), se, pval
# IVW is the primary method; others are sensitivity analyses
# Consistent direction/magnitude across methods strengthens evidence

# --- Heterogeneity (Cochran's Q) ---
het <- mr_heterogeneity(dat)
# Significant Q-statistic suggests pleiotropy or invalid instruments
# Q p-value < 0.05: Evidence of heterogeneity

# --- Pleiotropy (Egger intercept) ---
pleiotropy <- mr_pleiotropy_test(dat)
# Significant intercept (p < 0.05): Evidence of directional pleiotropy
# Non-significant intercept: No evidence (but low power with few SNPs)

# --- Leave-one-out ---
loo <- mr_leaveoneout(dat)
# Check if any single SNP drives the result
# Causal estimate should remain stable when each SNP is removed

# --- Single SNP analysis ---
single <- mr_singlesnp(dat)
```
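
For intuition about what the heterogeneity and pleiotropy tests measure, the sketch below computes Cochran's Q around the IVW slope and the MR-Egger intercept by hand in base R. The summary statistics are hypothetical, and the packaged functions may treat standard errors and p-values differently, so this is an illustration rather than a replacement for them.

```r
# Hypothetical harmonised summary statistics for six variants
bx   <- c(0.10, 0.08, 0.12, 0.05, 0.09, 0.11)        # SNP-exposure effects
by   <- c(0.050, 0.036, 0.066, 0.020, 0.060, 0.048)  # SNP-outcome effects
se_y <- c(0.010, 0.012, 0.015, 0.010, 0.011, 0.013)  # SE of SNP-outcome effects

# Orient every variant so the exposure effect is positive (needed for Egger)
flip <- bx < 0
bx[flip] <- -bx[flip]
by[flip] <- -by[flip]

w <- 1 / se_y^2

# IVW: weighted regression through the origin
ivw_fit <- lm(by ~ 0 + bx, weights = w)
ivw_b   <- unname(coef(ivw_fit)["bx"])

# Cochran's Q: weighted squared deviations from the IVW line
q_stat <- sum(w * (by - ivw_b * bx)^2)
q_df   <- length(bx) - 1
q_pval <- pchisq(q_stat, df = q_df, lower.tail = FALSE)

# MR-Egger: the same regression with an intercept; the intercept estimates
# the average directional pleiotropic effect
egger_fit       <- lm(by ~ bx, weights = w)
egger_intercept <- unname(coef(egger_fit)["(Intercept)"])

cat("Q:", q_stat, " df:", q_df, " p:", q_pval, "\n")
cat("Egger intercept:", egger_intercept, "\n")
```

A large Q relative to its degrees of freedom means the variants disagree about the causal effect more than sampling error would explain, while an Egger intercept far from zero suggests that disagreement has a consistent direction.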

## Instrument Strength

**Goal:** Assess whether genetic instruments are strong enough for valid MR inference.

**Approach:** Compute per-instrument F-statistics from exposure effect sizes and standard errors, removing weak instruments (F < 10).

```r
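# Approximate F-statistic for each instrument
# F = (beta_exposure / se_exposure)^2
# F > 10: Sufficient instrument strength (conventional threshold)
# F < 10: Weak instrument bias (toward the null in two-sample MR with
#         non-overlapping samples; toward the confounded observational
#         estimate when the samples overlap)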
dat$f_statistic <- (dat$beta.exposure / dat$se.exposure)^2

# Mean F-statistic across all instruments
mean_f <- mean(dat$f_statistic)
cat('Mean F-statistic:', mean_f, '\n')
cat('Instruments with F < 10:', sum(dat$f_statistic < 10), '\n')

# Remove weak instruments
dat <- dat[dat$f_statistic >= 10, ]
```
