
tooluniverse-gwas-finemapping

This skill identifies and prioritizes causal variants at GWAS loci using Bayesian statistical fine-mapping methods (SuSiE, FINEMAP) and locus-to-gene predictions. Use it when you need to distinguish causal variants from those merely in linkage disequilibrium, convert GWAS lead SNPs to probable target genes, or evaluate functional consequences and eQTL evidence at association signals.

Install in Claude Code
git clone --depth 1 https://github.com/mims-harvard/ToolUniverse /tmp/tooluniverse-gwas-finemapping && cp -r /tmp/tooluniverse-gwas-finemapping/plugin/skills/tooluniverse-gwas-finemapping ~/.claude/skills/tooluniverse-gwas-finemapping
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

## COMPUTE, DON'T DESCRIBE
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.

# GWAS Fine-Mapping & Causal Variant Prioritization

Identify and prioritize causal variants at GWAS loci using statistical fine-mapping and locus-to-gene predictions.

## Overview

Genome-wide association studies (GWAS) identify genomic regions associated with traits, but linkage disequilibrium (LD) makes it difficult to pinpoint the causal variant. **Fine-mapping** uses Bayesian statistical methods to compute the posterior probability that each variant is causal, given the GWAS summary statistics.

**REASONING STRATEGY — Start Here**:
Fine-mapping asks: which variant at this locus is CAUSAL? Work through this chain:
1. **LD structure first** — variants in high LD (r² > 0.8) cannot be statistically distinguished from each other. Look up the LD block via Open Targets or the GWAS Catalog before assuming any single variant is the cause.
2. **Functional annotation breaks LD ties** — if two variants have similar posterior probabilities but one is coding (missense, stop-gain) or sits in an active regulatory element (promoter, enhancer), that variant is biologically prioritized. Functional evidence is the tiebreaker.
3. **eQTL colocalization is the key bridge** — a variant that is also a significant eQTL for a nearby gene in the relevant tissue (e.g., a pancreatic islet eQTL for a T2D locus) has a mechanistic story. Look up eQTL evidence via Open Targets L2G scores; don't assume the nearest gene is the effector gene.

This skill provides tools to:
- **Prioritize causal variants** using fine-mapping posterior probabilities
- **Link variants to genes** using locus-to-gene (L2G) predictions
- **Annotate variants** with functional consequences
- **Suggest validation strategies** based on fine-mapping results

## Key Concepts

### Credible Sets
A **credible set** is the smallest set of variants whose posterior probabilities sum to at least a chosen coverage level (typically 95% or 99%), so it contains the causal variant with that probability under the model's assumptions. Each variant in the set has a **posterior probability** of being causal, computed using methods like:
- **SuSiE** (Sum of Single Effects)
- **FINEMAP** (Bayesian fine-mapping)
- **PAINTOR** (Probabilistic Annotation INtegraTOR)
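
As a minimal sketch of how a credible set is assembled from posterior probabilities, the snippet below ranks variants by posterior probability and accumulates them until the requested coverage is reached. It assumes a single causal signal at the locus; the variant IDs and probabilities are made up for illustration, and tools such as SuSiE build one credible set per signal with additional purity checks that are not shown here.

```python
def credible_set(posteriors, coverage=0.95):
    """Return the smallest top-ranked set of variants whose posterior mass reaches `coverage`."""
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen, total = [], 0.0
    for variant, pp in ranked:
        chosen.append(variant)
        total += pp
        if total >= coverage:
            break
    return chosen, total


# Hypothetical posterior probabilities for five variants at one locus.
example_pp = {
    "var_A": 0.62,
    "var_B": 0.21,
    "var_C": 0.10,
    "var_D": 0.04,
    "var_E": 0.03,
}

cs95, mass = credible_set(example_pp, coverage=0.95)
print(cs95, round(mass, 2))
```

With these illustrative numbers, four variants are needed to reach 95% coverage, even though the top variant alone carries most of the posterior mass.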

### Posterior Probability
The probability that a specific variant is causal, given the GWAS data and LD structure. Higher posterior probability = more likely to be causal.
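
One simple way to see where such probabilities come from is Wakefield's approximate Bayes factor (ABF), which needs only each variant's effect estimate and standard error. The sketch below assumes exactly one causal variant at the locus and equal prior probability for every variant, so it is a simplification of what SuSiE or FINEMAP do (they use the LD matrix and allow several causal variants). The prior standard deviation of 0.15 and the summary statistics are illustrative assumptions, not values from this skill.

```python
import math


def log_abf(beta, se, prior_sd=0.15):
    """Log of Wakefield's approximate Bayes factor for association vs. no association."""
    v = se ** 2            # sampling variance of the effect estimate
    w = prior_sd ** 2      # prior variance of the true effect (assumed)
    r = w / (v + w)
    z = beta / se
    return 0.5 * math.log(1.0 - r) + 0.5 * z * z * r


def posterior_probabilities(stats, prior_sd=0.15):
    """Posterior probability per variant, assuming a single causal variant and flat priors."""
    log_bfs = {vid: log_abf(beta, se, prior_sd) for vid, (beta, se) in stats.items()}
    shift = max(log_bfs.values())  # subtract the maximum for numerical stability
    weights = {vid: math.exp(lbf - shift) for vid, lbf in log_bfs.items()}
    total = sum(weights.values())
    return {vid: wgt / total for vid, wgt in weights.items()}


# Hypothetical summary statistics: variant -> (beta, standard error).
example_stats = {
    "var_A": (0.12, 0.02),
    "var_B": (0.11, 0.02),
    "var_C": (0.03, 0.02),
}

pps = posterior_probabilities(example_stats)
for vid, pp in sorted(pps.items(), key=lambda kv: kv[1], reverse=True):
    print(vid, round(pp, 3))
```

Because the result is normalized across the variants supplied, the probabilities are only meaningful relative to the window of variants included in the calculation.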

### Locus-to-Gene (L2G) Predictions
L2G scores integrate multiple data types to predict which gene is affected by a variant:
- Distance to gene (closer = higher score)
- eQTL evidence (expression changes)
- Chromatin interactions (Hi-C, promoter capture)
- Functional annotations (coding variants, regulatory regions)

L2G scores range from 0 to 1, with higher scores indicating stronger gene-variant links.
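
The following sketch illustrates why the L2G score, rather than distance alone, should drive gene assignment. It works on an already-retrieved list of candidate genes; the field names (`gene`, `l2g`, `distance_bp`) and the values are hypothetical and do not reflect the actual response schema of any Open Targets tool.

```python
def top_l2g_gene(candidates):
    """Compare the highest-scoring L2G gene with the physically nearest gene."""
    best = max(candidates, key=lambda g: g["l2g"])
    nearest = min(candidates, key=lambda g: g["distance_bp"])
    return {
        "top_gene": best["gene"],
        "top_score": best["l2g"],
        "nearest_gene": nearest["gene"],
        "top_is_nearest": best["gene"] == nearest["gene"],
    }


# Hypothetical candidates around one credible set (field names are assumptions).
example_genes = [
    {"gene": "GENE_A", "l2g": 0.12, "distance_bp": 4_000},
    {"gene": "GENE_B", "l2g": 0.81, "distance_bp": 150_000},
    {"gene": "GENE_C", "l2g": 0.05, "distance_bp": 320_000},
]

summary = top_l2g_gene(example_genes)
print(summary)
```

In this made-up case the nearest gene is not the top L2G gene, which is the situation where defaulting to the nearest gene would mislead.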

## Fine-Mapping Reasoning Framework (CRITICAL)

**LOOK UP, DON'T GUESS** -- never assume a lead SNP is the causal variant. Always check LD structure, credible sets, and functional annotations via the tools below.

### Step 1: Lead SNP vs Causal Variant

The lead SNP (the variant with the most significant p-value) is often NOT the causal variant. It may simply be the genotyped or imputed variant that best tags the underlying signal. The causal variant may be:
- In near-perfect LD (r² > 0.95) with the lead SNP but, unlike the lead SNP, with a functional consequence
- A non-coding regulatory variant not on the array
- One of several independent signals at the locus (conditional analysis reveals multiple)

**Action**: Always call `OpenTargets_get_variant_credible_sets` for the lead SNP. If the posterior probability is < 0.5, the lead SNP is likely NOT causal -- examine other variants in the credible set.
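
Once the credible set has been retrieved, the check described above can be sketched as follows. The function operates on a plain list of records; the keys `variant_id` and `posterior_probability` are hypothetical placeholders, so map them to whatever fields `OpenTargets_get_variant_credible_sets` actually returns. The 0.5 cutoff is the heuristic from the text.

```python
def triage_lead_snp(lead_id, credible_set, threshold=0.5):
    """Report the lead SNP's posterior probability and the other variants worth examining."""
    pp_by_variant = {v["variant_id"]: v["posterior_probability"] for v in credible_set}
    # A lead SNP absent from the credible set is treated as having zero posterior mass.
    lead_pp = pp_by_variant.get(lead_id, 0.0)
    others = sorted(
        ((vid, pp) for vid, pp in pp_by_variant.items() if vid != lead_id),
        key=lambda item: item[1],
        reverse=True,
    )
    return {
        "lead_pp": lead_pp,
        "lead_likely_causal": lead_pp >= threshold,
        "other_variants": others,
    }


# Hypothetical credible set (IDs, values, and key names are assumptions).
example_set = [
    {"variant_id": "lead_snp", "posterior_probability": 0.18},
    {"variant_id": "var_X", "posterior_probability": 0.55},
    {"variant_id": "var_Y", "posterior_probability": 0.22},
]

report = triage_lead_snp("lead_snp", example_set)
print(report)
```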

### Step 2: LD Structure Interpretation

LD blocks define the resolution limit of fine-mapping:
- **Tight LD block (few variants, r² > 0.9)**: The credible set will be small; functional annotation is the tiebreaker
- **Broad LD block (many variants)**: Credible set is large; statistical fine-mapping alone is insufficient -- need functional data (eQTL, chromatin, CRISPR)
- **Population matters**: LD patterns differ between European, African, and East Asian populations. African-ancestry populations generally have shorter LD blocks and therefore better fine-mapping resolution. Check which population the GWAS was conducted in.
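
To make the r² thresholds concrete, the sketch below computes r² as the squared Pearson correlation between allele dosages (0, 1, or 2 copies per individual) and lists the variants that exceed a threshold with the lead SNP. The dosage vectors are invented for illustration; in practice LD would come from a reference panel matched to the GWAS population, and a vectorized routine such as `numpy.corrcoef` would be preferable for large matrices.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length dosage vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Monomorphic variants (zero variance) would divide by zero; filter them beforehand.
    return (sxy * sxy) / (sxx * syy)


def variants_in_ld(dosages, lead_id, threshold=0.8):
    """Return {variant: r2} for variants whose r2 with the lead SNP exceeds `threshold`."""
    lead = dosages[lead_id]
    result = {}
    for vid, dosage in dosages.items():
        if vid == lead_id:
            continue
        r2 = pearson_r2(lead, dosage)
        if r2 > threshold:
            result[vid] = r2
    return result


# Hypothetical dosages for eight individuals at three variants.
example_dosages = {
    "lead_snp": [0, 1, 2, 0, 1, 2, 0, 1],
    "var_B": [0, 1, 2, 0, 1, 2, 0, 2],   # differs from the lead in one individual
    "var_C": [2, 0, 1, 1, 0, 2, 1, 0],   # essentially uncorrelated with the lead
}

ld_partners = variants_in_ld(example_dosages, "lead_snp", threshold=0.8)
print({vid: round(r2, 3) for vid, r2 in ld_partners.items()})
```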

### Step 3: Credible Set Analysis

When interpreting a credible set:
1. **Size matters**: A 95% credible set with 1-3 variants = high resolution. With 50+ variants = low resolution, need more data.
2. **Posterior probability distribution**: If one variant has PP > 0.5, it is the strong favorite. If PP is spread evenly across many variants, no single causal variant can be identified statistically.
3. **Multiple credible sets at one locus**: Indicates multiple independent causal signals (allelic heterogeneity). Each set represents a distinct signal and should be interpreted separately; the signals may or may not act through the same gene.
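
The size and distribution rules above can be summarized in a few lines. This is a sketch of the stated heuristics only: the cutoffs of 3 and 50 variants and the 0.5 posterior threshold come from the text, while the "intermediate" label for sets in between is an assumption added here for completeness.

```python
def summarize_credible_set(posterior_probs):
    """Classify a credible set by its size and by how concentrated its posterior mass is."""
    size = len(posterior_probs)
    max_pp = max(posterior_probs)
    if size <= 3:
        resolution = "high"
    elif size >= 50:
        resolution = "low"
    else:
        resolution = "intermediate"  # label assumed; the text only defines the two extremes
    return {
        "size": size,
        "max_pp": max_pp,
        "resolution": resolution,
        "strong_favorite": max_pp > 0.5,
    }


# Hypothetical credible sets: one sharply resolved, one diffuse.
sharp = summarize_credible_set([0.70, 0.20, 0.06])
diffuse = summarize_credible_set([0.02] * 50)
print(sharp)
print(diffuse)
```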

### Step 4: Colocalization Reasoning

Colocalization asks: do two association signals (e.g., GWAS + eQTL) share the SAME causal variant?
- **High L2G score (> 0.7) + eQTL in relevant tissue**: Strong evidence the variant affects disease THROUGH gene expression changes
- **High GWAS signal but no eQTL**: Variant may act through protein-coding change, splicing, or a tissue/cell-type not yet profiled
- **eQTL for distant gene (not nearest)**: The effector gene is NOT the nearest gene. **LOOK UP** the L2G score -- do not default to nearest gene
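
The question of whether two signals share the same causal variant can be made quantitative. The sketch below follows the general form of the approximate-Bayes-factor colocalization calculation popularized by the coloc R package; it is not the method behind Open Targets L2G scores, and it is offered only to illustrate the reasoning. It takes per-variant log Bayes factors for the two traits over the same ordered set of variants and returns posterior probabilities for five hypotheses: H0 (no association), H1 (GWAS only), H2 (eQTL only), H3 (both, different causal variants), and H4 (both, shared causal variant). The prior values and the input numbers are assumptions.

```python
import math


def _logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def _logdiffexp(a, b):
    """log(exp(a) - exp(b)); requires a > b."""
    return a + math.log1p(-math.exp(b - a))


def coloc_posteriors(log_abf_gwas, log_abf_eqtl, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [H0, H1, H2, H3, H4], assuming one causal variant per trait."""
    if len(log_abf_gwas) != len(log_abf_eqtl) or len(log_abf_gwas) < 2:
        raise ValueError("need the same variants (at least two) for both traits")
    l1 = _logsumexp(log_abf_gwas)
    l2 = _logsumexp(log_abf_eqtl)
    l12 = _logsumexp([a + b for a, b in zip(log_abf_gwas, log_abf_eqtl)])
    log_h = [
        0.0,                                                        # H0
        math.log(p1) + l1,                                          # H1
        math.log(p2) + l2,                                          # H2
        math.log(p1) + math.log(p2) + _logdiffexp(l1 + l2, l12),    # H3
        math.log(p12) + l12,                                        # H4
    ]
    norm = _logsumexp(log_h)
    return [math.exp(h - norm) for h in log_h]


# Hypothetical log Bayes factors over the same three variants.
shared = coloc_posteriors([10.0, 1.0, 0.0], [9.0, 0.0, 1.0])     # peaks at the same variant
distinct = coloc_posteriors([10.0, 0.0, 0.0], [0.0, 0.0, 10.0])  # peaks at different variants
print("shared peak   H4 =", round(shared[4], 3))
print("distinct peak H3 =", round(distinct[3], 3), "H4 =", round(distinct[4], 3))
```

When both traits peak at the same variant, H4 dominates; when they peak at different variants, the mass shifts toward H3, which is the case where an overlapping eQTL does not support a shared mechanism.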

### Step 5: Prioritization Tiebreakers

When multiple variants have similar posterior probabilities:
1. Coding variant (missense, stop-gain) > regulatory > intronic > intergenic
2. Overlaps an active chromatin mark (H3K27ac, H3K4me1) in disease-relevant tissue
3. Disrupts transcription factor binding motif
4. Conserved across species (PhyloP, GERP)
5. eQTL in disease-relevant tissue with consistent direction of effect
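
One way to apply these tiebreakers in code is a sort key that compares variants criterion by criterion. The sketch below assumes the list is a strict priority order (criterion 1 decides first, criterion 2 only breaks ties on 1, and so on), which is an interpretation rather than something the text states. All field names and variants are hypothetical.

```python
# Lower rank = higher priority, following the order coding > regulatory > intronic > intergenic.
CONSEQUENCE_RANK = {"coding": 0, "regulatory": 1, "intronic": 2, "intergenic": 3}


def tiebreak_key(variant):
    """Sort key: False sorts before True, so each flag is negated to put positives first."""
    return (
        CONSEQUENCE_RANK.get(variant["consequence"], len(CONSEQUENCE_RANK)),
        not variant.get("active_chromatin", False),
        not variant.get("disrupts_tf_motif", False),
        not variant.get("conserved", False),
        not variant.get("eqtl_relevant_tissue", False),
    )


# Hypothetical variants with similar posterior probabilities (field names are assumptions).
example_variants = [
    {"id": "var_1", "consequence": "intronic", "active_chromatin": True, "conserved": True},
    {"id": "var_2", "consequence": "regulatory", "active_chromatin": True,
     "eqtl_relevant_tissue": True},
    {"id": "var_3", "consequence": "regulatory", "active_chromatin": True,
     "disrupts_tf_motif": True},
]

ranked = sorted(example_variants, key=tiebreak_key)
print([v["id"] for v in ranked])
```

A weighted score would be a reasonable alternative if the criteria should be allowed to compensate for one another.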
