bio-causal-genomics-colocalization-analysis

This skill performs Bayesian colocalization analysis to test whether a GWAS signal and an eQTL signal share the same causal variant at a genomic locus. It uses the coloc.abf function to compute posterior probabilities across five hypotheses (no association, association with trait 1 only, association with trait 2 only, distinct causal variants, and shared causal variant), distinguishing true causal overlap from coincidental overlap driven by linkage disequilibrium. Use when investigating whether a disease-associated variant identified in GWAS is the same variant that affects gene expression in eQTL studies.

Repository: OpenClaw-Medical-Skills

Install in Claude Code

git clone --depth 1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills /tmp/bio-causal-genomics-colocalization-analysis && cp -r /tmp/bio-causal-genomics-colocalization-analysis/skills/bio-causal-genomics-colocalization-analysis ~/.claude/skills/bio-causal-genomics-colocalization-analysis

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

## Version Compatibility

Reference examples tested with: ggplot2 3.5+

Before using code patterns, verify installed versions match. If versions differ:
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters

If code fails with an error such as `could not find function`, `unused argument`,
or `there is no package called`, introspect the installed package and adapt the
example to match the actual API rather than retrying.
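
The version check described above can be sketched in base R. The helper name `report_versions` is hypothetical (it is not part of any package); it relies only on `requireNamespace()` and `packageVersion()` and returns `NA` for packages that are not installed.

```r
# Hypothetical helper: report installed versions of the packages used in
# this skill, returning NA for any that are missing.
report_versions <- function(pkgs) {
  vapply(pkgs, function(p) {
    if (requireNamespace(p, quietly = TRUE)) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))
}

versions <- report_versions(c("coloc", "susieR", "hyprcoloc", "ggplot2"))
print(versions)
```

Any `NA` entry points to a package that needs installing before the corresponding example can run.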

# Colocalization Analysis

**"Test whether my GWAS signal and eQTL share the same causal variant"** → Compute Bayesian posterior probabilities for five colocalization hypotheses (no association, trait-1-only, trait-2-only, distinct causal variants, shared causal variant) to distinguish true causal overlap from LD-driven coincidence.
- R: `coloc::coloc.abf()` for approximate Bayes factor colocalization

## Overview

Colocalization tests whether two association signals at the same locus are driven by the
same causal variant. This distinguishes shared causality from coincidental overlap due to LD.

Five hypotheses tested by coloc:
- H0: No association with either trait
- H1: Association with trait 1 only
- H2: Association with trait 2 only
- H3: Both associated, different causal variants
- H4: Both associated, shared causal variant
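
To make the five hypotheses concrete, the sketch below combines per-SNP log approximate Bayes factors into posterior probabilities for H0 to H4. It follows the single-causal-variant formulation as I understand it from the coloc method; the function names (`logsum`, `logdiff`, `combine_labf`) and the toy inputs are illustrative assumptions, not the package's API, so check the installed coloc source before relying on the details.

```r
# Log-sum-exp helpers for numerical stability
logsum <- function(x) {
  m <- max(x)
  m + log(sum(exp(x - m)))
}
logdiff <- function(a, b) {  # log(exp(a) - exp(b)), requires a > b
  a + log1p(-exp(b - a))
}

# Combine per-SNP log ABFs (l1: trait 1, l2: trait 2) into posteriors.
# p1, p2, p12 are the per-SNP priors described in this document.
combine_labf <- function(l1, l2, p1 = 1e-4, p2 = 1e-4, p12 = 1e-5) {
  lsum <- l1 + l2
  lh <- c(
    H0 = 0,
    H1 = log(p1) + logsum(l1),
    H2 = log(p2) + logsum(l2),
    # All SNP pairs (i, j) minus the pairs with i == j
    H3 = log(p1) + log(p2) + logdiff(logsum(l1) + logsum(l2), logsum(lsum)),
    H4 = log(p12) + logsum(lsum)
  )
  exp(lh - logsum(lh))
}

# Toy data (made up): three SNPs, trait 1 peaks at SNP 2
l1 <- c(0, 12, 0)
pp_same <- combine_labf(l1, c(0, 10, 0))  # trait 2 peaks at the same SNP
pp_diff <- combine_labf(l1, c(10, 0, 0))  # trait 2 peaks at a different SNP
print(round(pp_same, 4))
print(round(pp_diff, 4))
```

When both traits peak at the same SNP, most posterior mass lands on H4; when the peaks sit on different SNPs, H3 takes the largest share.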

## coloc.abf Analysis

**Goal:** Test whether two traits share a causal variant at a GWAS locus using Bayesian colocalization.

**Approach:** Format summary statistics for each trait as named lists, run coloc.abf to compute posterior probabilities for five hypotheses (H0-H4), and interpret PP.H4 as evidence for a shared causal variant.

```r
library(coloc)

# --- Input format: named list with GWAS summary stats ---
# Required fields: beta, varbeta, snp, position, type, N
# type = 'quant' (continuous) or 'cc' (case-control)

gwas_data <- list(
  beta = gwas_df$BETA,
  varbeta = gwas_df$SE^2,
  snp = gwas_df$SNP,
  position = gwas_df$POS,
  type = 'cc',           # Case-control study
  s = 0.3,               # Proportion of cases (required for cc)
  N = 50000              # Total sample size
)

eqtl_data <- list(
  beta = eqtl_df$BETA,
  varbeta = eqtl_df$SE^2,
  snp = eqtl_df$SNP,
  position = eqtl_df$POS,
  type = 'quant',        # Quantitative trait (expression)
  N = 500,               # eQTL sample size
  sdY = 1                # SD of trait (1 if already normalized)
)

# --- Run colocalization ---
result <- coloc.abf(dataset1 = gwas_data, dataset2 = eqtl_data)

# Posterior probabilities
# PP.H4 > 0.8: Strong evidence for colocalization (shared variant)
# PP.H3 > 0.8: Distinct causal variants at the locus
# PP.H4 between 0.5-0.8: Suggestive but inconclusive
print(result$summary)
```
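
The thresholds in the comments above can be turned into a small labelling helper. This is a minimal sketch that assumes `result$summary` is a named numeric vector containing `PP.H3.abf` and `PP.H4.abf`; the function name `interpret_coloc`, the fallback label, and the mock values are hypothetical.

```r
# Hypothetical helper: label a coloc summary vector using the thresholds
# quoted in this document (0.8 strong, 0.5 to 0.8 suggestive).
interpret_coloc <- function(summary_vec, strong = 0.8, suggestive = 0.5) {
  pp_h3 <- summary_vec[["PP.H3.abf"]]
  pp_h4 <- summary_vec[["PP.H4.abf"]]
  if (pp_h4 > strong) {
    "colocalized (shared causal variant)"
  } else if (pp_h3 > strong) {
    "distinct causal variants"
  } else if (pp_h4 >= suggestive) {
    "suggestive, inconclusive"
  } else {
    "no clear evidence"  # label added for this sketch, not from the source
  }
}

# Mock summary vector (made-up numbers, same names as a coloc summary)
mock_summary <- c(nsnps = 100, PP.H0.abf = 0.01, PP.H1.abf = 0.02,
                  PP.H2.abf = 0.02, PP.H3.abf = 0.05, PP.H4.abf = 0.90)
print(interpret_coloc(mock_summary))
```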

## Prior Sensitivity

```r
# Default priors: p1 = 1e-4, p2 = 1e-4, p12 = 1e-5
# p1: Prior probability a SNP is associated with trait 1
# p2: Prior probability a SNP is associated with trait 2
# p12: Prior probability a SNP is associated with both traits
#
# Ratio p12/p1 represents prior belief in colocalization
# Default: p12/p1 = 0.1 (10% of trait 1 SNPs also affect trait 2)

result_sensitive <- coloc.abf(
  dataset1 = gwas_data,
  dataset2 = eqtl_data,
  p1 = 1e-4,
  p2 = 1e-4,
  p12 = 5e-6    # More conservative prior for shared association
)

# Sensitivity analysis across prior values
sensitivity(result, 'H4 > 0.8')
```
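
To see why the choice of `p12` matters, the posteriors can be re-weighted by hand. The sketch below assumes that only the H4 term depends on `p12` (as in the single-causal-variant formulation), so changing the prior rescales H4 by `p12_new / p12_old` before renormalizing. The function name `reweight_p12` and the example posteriors are made up for illustration; `sensitivity()` remains the supported route.

```r
# Hypothetical helper: re-weight existing posteriors for a new p12,
# assuming p1 and p2 are unchanged and only H4 depends on p12.
reweight_p12 <- function(pp, p12_new, p12_old = 1e-5) {
  w <- pp
  w[["PP.H4.abf"]] <- w[["PP.H4.abf"]] * (p12_new / p12_old)
  w / sum(w)
}

# Made-up posteriors obtained under the default p12 = 1e-5
pp <- c(PP.H0.abf = 0.01, PP.H1.abf = 0.04, PP.H2.abf = 0.05,
        PP.H3.abf = 0.10, PP.H4.abf = 0.80)

# PP.H4 across a grid of p12 values
p12_grid <- c(1e-6, 5e-6, 1e-5, 5e-5)
pp_h4 <- sapply(p12_grid, function(p) reweight_p12(pp, p)[["PP.H4.abf"]])
print(data.frame(p12 = p12_grid, PP.H4 = round(pp_h4, 3)))
```

A conclusion that flips between colocalized and not colocalized across plausible `p12` values should be reported as prior-sensitive.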

## Using P-values (No Beta/SE)

```r
# When only p-values are available, supply MAF and N so that coloc can
# approximate the variance of each effect estimate
gwas_pval <- list(
  pvalues = gwas_df$P,
  MAF = gwas_df$MAF,
  snp = gwas_df$SNP,
  position = gwas_df$POS,
  type = 'cc',
  s = 0.3,
  N = 50000
)

result <- coloc.abf(dataset1 = gwas_pval, dataset2 = eqtl_data)
```
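
The p-value route works because a Wakefield-style approximate Bayes factor needs only a z-score and an effect variance, and the variance can be approximated from MAF and sample size. The sketch below shows that calculation as I understand it; the function name `labf_from_p` is hypothetical, and the prior standard deviations (0.15 for quantitative traits, 0.2 for case-control) are assumptions about coloc's defaults that should be checked against the installed version.

```r
# Hypothetical helper: log approximate Bayes factor from a p-value.
# Variance approximations and prior SDs are assumptions (see lead-in).
labf_from_p <- function(p, maf, N, type = c("quant", "cc"), s = NULL) {
  type <- match.arg(type)
  if (type == "quant") {
    sd_prior <- 0.15
    V <- 1 / (2 * N * maf * (1 - maf))
  } else {
    stopifnot(!is.null(s))  # case proportion is needed for case-control
    sd_prior <- 0.2
    V <- 1 / (2 * N * maf * (1 - maf) * s * (1 - s))
  }
  z <- qnorm(0.5 * p, lower.tail = FALSE)  # two-sided p-value to |z|
  r <- sd_prior^2 / (sd_prior^2 + V)       # shrinkage factor
  0.5 * (log(1 - r) + r * z^2)
}

# Made-up inputs: a strong and a null signal in a case-control study
labf <- labf_from_p(p = c(5e-8, 1), maf = 0.3, N = 50000, type = "cc", s = 0.3)
print(labf)
```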

## SuSiE-Coloc (Multiple Causal Variants)

**Goal:** Test colocalization at loci with multiple independent causal signals.

**Approach:** Run SuSiE fine-mapping on each dataset to identify credible sets, then test colocalization between all pairs of credible sets using coloc.susie.

```r
library(coloc)
library(susieR)

# coloc.abf assumes a single causal variant per locus
# SuSiE-coloc handles multiple causal variants

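# LD matrix required (signed correlation matrix from a reference panel)
# Row and column names must match the SNP ids; the assignment below assumes
# the file is ordered like gwas_df and that eqtl_df covers the same SNPs
ld_matrix <- as.matrix(read.table('ld_matrix.txt'))
dimnames(ld_matrix) <- list(gwas_df$SNP, gwas_df$SNP)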

# Run SuSiE on each dataset
susie_gwas <- runsusie(
  list(beta = gwas_df$BETA, varbeta = gwas_df$SE^2,
       snp = gwas_df$SNP, position = gwas_df$POS,
       type = 'cc', s = 0.3, N = 50000, LD = ld_matrix),
  L = 10       # Max number of causal variants to search for
)

susie_eqtl <- runsusie(
  list(beta = eqtl_df$BETA, varbeta = eqtl_df$SE^2,
       snp = eqtl_df$SNP, position = eqtl_df$POS,
       type = 'quant', N = 500, sdY = 1, LD = ld_matrix),
  L = 10
)

# Coloc using SuSiE credible sets
result_susie <- coloc.susie(susie_gwas, susie_eqtl)
print(result_susie$summary)
# Each row tests colocalization between a pair of credible sets
# hit1, hit2: Top SNP of the credible set from dataset 1 and 2
# idx1, idx2: Credible set indices from dataset 1 and 2
```
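
Because the SuSiE step depends on the LD matrix lining up with the summary statistics, a quick sanity check before fine-mapping can save debugging time. The helper `check_ld` below is a hypothetical base-R sketch, not a coloc or susieR function; it returns a character vector of problems, empty when none are found.

```r
# Hypothetical helper: basic checks on an LD correlation matrix
check_ld <- function(ld, snps, tol = 1e-8) {
  if (!is.matrix(ld) || nrow(ld) != ncol(ld)) {
    return("LD is not a square matrix")
  }
  problems <- character(0)
  if (is.null(rownames(ld)) || !identical(rownames(ld), colnames(ld))) {
    problems <- c(problems, "row and column names are missing or differ")
  }
  if (!all(snps %in% rownames(ld))) {
    problems <- c(problems, "some SNPs are absent from the LD matrix")
  }
  if (max(abs(ld - t(ld))) > tol) {
    problems <- c(problems, "matrix is not symmetric")
  }
  if (any(abs(diag(ld) - 1) > tol)) {
    problems <- c(problems, "diagonal is not 1")
  }
  problems
}

# Toy 2 x 2 example (made-up values)
snps <- c("rs1", "rs2")
ld_ok <- matrix(c(1, 0.5, 0.5, 1), 2, dimnames = list(snps, snps))
ld_unnamed <- matrix(c(1, 0.5, 0.5, 1), 2)
print(check_ld(ld_ok, snps))
print(check_ld(ld_unnamed, snps))
```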

## HyPrColoc (Multi-Trait)

**Goal:** Test colocalization across three or more traits simultaneously to identify shared causal variant clusters.

**Approach:** Provide beta and SE matrices (SNPs x traits) to hyprcoloc, which clusters traits sharing a causal variant using a branch-and-bound algorithm.

```r
# install.packages('remotes')
# remotes::install_github('jrs95/hyprcoloc')
library(hyprcoloc)

# Test colocalization across multiple traits simultaneously
# Input: matrices of betas and SEs (rows = SNPs, columns = traits)
betas <- cbind(gwas_df$BETA, eqtl1_df$BETA, eqtl2_df$BETA)
ses <- cbind(gwas_df$SE, eqtl1_df$SE, eqtl2_df$SE)
colnames(betas) <- colnames(ses) <- c('GWAS', 'eQTL_gene1', 'eQTL_gene2')
rownames(betas) <- rownames(ses) <- gwas_df$SNP

result_hypr <- hyprcoloc(
  effect.est = betas,
  effect.se = ses,
  trait.names = colnames(betas),
  snp.id = rownames(betas)
)

# Output: clusters of traits sharing a causal variant
print(result_hypr$results)
```
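
The `cbind()` calls above only produce valid matrices if every data frame lists the same SNPs in the same order. A minimal base-R sketch for aligning them first is shown below; the helper name `align_traits` is hypothetical, it assumes columns named `SNP`, `BETA`, and `SE` as used in this document, and it does not harmonize effect alleles, which still has to be done separately.

```r
# Hypothetical helper: restrict each data frame to the shared SNPs and
# build SNP x trait matrices of effect sizes and standard errors.
align_traits <- function(dfs) {
  common <- Reduce(intersect, lapply(dfs, function(d) d$SNP))
  aligned <- lapply(dfs, function(d) d[match(common, d$SNP), ])
  betas <- do.call(cbind, lapply(aligned, function(d) d$BETA))
  ses <- do.call(cbind, lapply(aligned, function(d) d$SE))
  rownames(betas) <- rownames(ses) <- common
  list(betas = betas, ses = ses)
}

# Toy data (made up): the eQTL table is shorter and in a different order
gwas_toy <- data.frame(SNP = c("rs1", "rs2", "rs3"),
                       BETA = c(0.1, 0.2, 0.3), SE = c(0.01, 0.02, 0.03),
                       stringsAsFactors = FALSE)
eqtl_toy <- data.frame(SNP = c("rs3", "rs1"),
                       BETA = c(1, 2), SE = c(0.5, 0.6),
                       stringsAsFactors = FALSE)
mats <- align_traits(list(GWAS = gwas_toy, eQTL = eqtl_toy))
print(mats$betas)
```

The resulting `mats$betas` and `mats$ses` have matching row and column names and can be passed as `effect.est` and `effect.se`.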

## Input Preparation

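```r
# --- Extract a locus (1 Mb window around lead SNP) ---
# Source is truncated mid-expression; the filter is a reconstruction that
# assumes columns named CHR and POS
extract_locus <- function(sumstats, lead_snp_pos, chr, window = 500000) {
  locus <- sumstats[sumstats$CHR == chr &
                      abs(sumstats$POS - lead_snp_pos) <= window, ]
  locus
}
```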
