Skill2.9k repo starsupdated 7d ago

bio-causal-genomics-fine-mapping

This Claude Code skill provides fine-mapping functions to narrow genome-wide association study signals to candidate causal variants using SuSiE and FINEMAP algorithms. It calculates posterior inclusion probabilities and constructs credible sets that delineate which variants likely drive the association signal. Use this tool when you need to prioritize variants within a GWAS locus for functional validation or when building credible sets that contain causal variants with specified confidence levels.

View source Repository: OpenClaw-Medical-Skills

Install in Claude Code

Copy

git clone --depth 1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills /tmp/bio-causal-genomics-fine-mapping && cp -r /tmp/bio-causal-genomics-fine-mapping/skills/bio-causal-genomics-fine-mapping ~/.claude/skills/bio-causal-genomics-fine-mapping

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

## Version Compatibility

Reference examples tested with: ggplot2 3.5+

Before using code patterns, verify installed versions match. If versions differ:
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters
- CLI: `<tool> --version` then `<tool> --help` to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.

# Fine-Mapping

**"Narrow my GWAS locus to the likely causal variant"** → Compute posterior inclusion probabilities (PIPs) for each variant and construct credible sets containing the causal variant at a specified confidence level, accounting for LD and multiple causal signals.
- R: `susieR::susie_rss()` for SuSiE fine-mapping from summary statistics
- CLI: `finemap --sss` for shotgun stochastic search

## Overview

Fine-mapping narrows GWAS association signals to identify likely causal variants. Key outputs:

- **PIP** (Posterior Inclusion Probability) - Probability each variant is causal (0-1)
- **Credible set** - Minimal set of variants containing the causal variant at a given confidence level (e.g., 95%)
- **L** - Number of independent causal signals at the locus

## SuSiE (Sum of Single Effects)

**Goal:** Fine-map a GWAS locus to identify likely causal variants and credible sets from individual-level data.

**Approach:** Fit SuSiE's sum-of-single-effects model on the genotype matrix, then extract 95% credible sets (each containing the causal variant) and per-variant posterior inclusion probabilities.

```r
library(susieR)

# --- From individual-level data ---
# X: genotype matrix (n x p), standardized
# Y: phenotype vector (n x 1)
# L: max number of causal variants (10 is a reasonable default)
fit <- susie(X, Y, L = 10)

# Extract credible sets
# 95% credible sets: each set contains the causal variant with >= 95% probability
# coverage: minimum posterior mass for the credible set (default 0.95)
# min_abs_corr: minimum purity (correlation among variants in set; > 0.5 is good)
cs <- fit$sets$cs
cat('Number of credible sets:', length(cs), '\n')

# Credible set purity (minimum absolute correlation within set)
# Purity > 0.5: Well-resolved signal
# Purity < 0.5: Signal may be confounded by LD
purity <- fit$sets$purity
print(purity)

# PIPs for all variants
pip <- fit$pip
top_variants <- order(-pip)[1:10]
cat('\nTop 10 variants by PIP:\n')
for (i in top_variants) {
  cat(sprintf('  Variant %d: PIP = %.4f\n', i, pip[i]))
}
```

## SuSiE with Summary Statistics (susie_rss)

**Goal:** Fine-map a GWAS locus using summary statistics and an LD reference matrix (no individual-level data needed).

**Approach:** Compute Z-scores from beta/SE, provide a matched-ancestry LD correlation matrix, and run susie_rss to identify credible sets and PIPs.

```r
library(susieR)

# --- Most common usage: GWAS summary statistics + LD matrix ---
# z: Z-scores (beta / se) for each variant
# R: LD correlation matrix (from matched ancestry reference)
# n: Sample size
# L: Max causal variants

z_scores <- gwas_df$BETA / gwas_df$SE
ld_matrix <- as.matrix(read.table('ld_matrix.ld'))

# Ensure SNP order matches between z-scores and LD matrix
stopifnot(nrow(ld_matrix) == length(z_scores))

fit <- susie_rss(z = z_scores, R = ld_matrix, n = 50000, L = 10)

# Credible sets
cs <- fit$sets$cs
for (i in seq_along(cs)) {
  cat(sprintf('Credible set %d: %d variants, purity = %.3f\n',
              i, length(cs[[i]]), fit$sets$purity[i, 1]))
  cat('  Variants:', paste(gwas_df$SNP[cs[[i]]], collapse = ', '), '\n')
}

# PIPs
gwas_df$PIP <- fit$pip
top_pip <- gwas_df[order(-gwas_df$PIP), ][1:20, c('SNP', 'PIP', 'P')]
print(top_pip)
```

## Choosing L (Number of Causal Variants)

```r
# L = max number of causal signals SuSiE will search for
# Too low: Misses real signals
# Too high: Increases computation but rarely hurts results (SuSiE prunes excess)
#
# Guidelines:
# L = 1: Single causal variant expected
# L = 5: Most GWAS loci
# L = 10: Default, works well for most cases
# L = 20: Very complex loci (e.g., HLA region)

# Compare fits with different L
fit_l5 <- susie_rss(z = z_scores, R = ld_matrix, n = 50000, L = 5)
fit_l10 <- susie_rss(z = z_scores, R = ld_matrix, n = 50000, L = 10)

cat('L=5 credible sets:', length(fit_l5$sets$cs), '\n')
cat('L=10 credible sets:', length(fit_l10$sets$cs), '\n')
```

## LD Reference Panel

```bash
# Generate LD matrix from 1000 Genomes with plink
# Must match ancestry of GWAS sample

# Extract region
plink --bfile 1000G_EUR \
  --chr 6 --from-bp 30000000 --to-bp 31000000 \
  --make-bed --out locus_ref

# Compute correlation matrix
plink --bfile locus_ref \
  --r square --out ld_matrix

# Filter to GWAS SNPs only
plink --bfile locus_ref \
  --extract gwas_snps.txt \
  --r square --out ld_matrix_filtered
```

```r
# Read LD matrix into R
ld <- as.matrix(read.table('ld_matrix.ld'))

# Ensure positive semi-definite (numerical issues can violate this)
# Add small ridge to diagonal if needed
eigenvalues <- eigen(ld, only.values = TRUE)$values
if (any(eigenvalues < 0)) {
  ld <- ld + diag(abs(min(eigenvalues)) + 1e-6, nrow(ld))
}
```

## FINEMAP

**Goal:** Fine-map a locus using an alternative shotgun stochastic search algorithm.

**Approach:** Prepare .z (summary stats), .ld (LD matrix), and .master (config) files, then run FINEMAP to compute per-variant PIPs and causal configurations.

```bash
# FINEMAP: alternative fine-mapping tool using shotgun stochastic search
# Download from http://www.christianbenner.com/

# Required input files:
# 1. .z file: SNP, chromosome, position, allele1, allele2, MAF, beta, se
# 2. .ld file: LD matrix (space-separated, no header)
# 3. .master file: configuration

# Create master file
cat > master.txt << 'EOF'
z;ld;snp;config;cred;log;n_samples
locus.z;locus.ld;locus.snp;locus.config;locus.cred;locus.log;50000
EOF

# Run FINEMAP
finemap --sss --in-files master.txt --n-causal-snps 5
```

```r
# Parse FINEMAP output
finemap_snp <

More from this repository

aav-vector-design-agentSkill

adaptyvSkill

Cloud laboratory platform for automated protein testing and validation. Use when designing proteins and needing experimental validation including binding assays, expression testing, thermostability measurements, enzyme activity assays, or protein sequence optimization. Also use for submitting experiments via API, tracking experiment status, downloading results, optimizing protein sequences for better expression using computational tools (NetSolP, SoluProt, SolubleMPNN, ESM), or managing protein design workflows with wet-lab validation.

adhd-daily-plannerSkill

Time-blind friendly planning, executive function support, and daily structure for ADHD brains. Specializes in realistic time estimation, dopamine-aware task design, and building systems that

aeonSkill

This skill should be used for time series machine learning tasks including classification, regression, clustering, forecasting, anomaly detection, segmentation, and similarity search. Use when working with temporal data, sequential patterns, or time-indexed observations requiring specialized algorithms beyond standard ML approaches. Particularly suited for univariate and multivariate time series analysis with scikit-learn compatible APIs.

agent-browserSkill

Browse the web for any task — research topics, read articles, interact with web apps, fill forms, take screenshots, extract data, and test web pages. Use whenever a browser would be useful, not just when the user explicitly asks.

agentd-drug-discoverySkill

ai-analyzerSkill

AI驱动的综合健康分析系统，整合多维度健康数据、识别异常模式、预测健康风险、提供个性化建议。支持智能问答和AI健康报告生成。

alphafold-databaseSkill

Access AlphaFold's 200M+ AI-predicted protein structures. Retrieve structures by UniProt ID, download PDB/mmCIF files, analyze confidence metrics (pLDDT, PAE), for drug discovery and structural biology.