tooluniverse-gwas-snp-interpretation

This skill interprets individual GWAS variants by aggregating functional and association evidence from multiple sources: the GWAS Catalog, LD structure, eQTL evidence, regulatory annotations, ClinVar, and population frequencies. Use it to determine the mechanistic role of a SNP, trace SNP-to-gene relationships, and distinguish lead variants from causal variants using linkage disequilibrium analysis and fine-mapping evidence.

Repository: ToolUniverse

Install in Claude Code

git clone --depth 1 https://github.com/mims-harvard/ToolUniverse /tmp/tooluniverse-gwas-snp-interpretation && cp -r /tmp/tooluniverse-gwas-snp-interpretation/plugin/skills/tooluniverse-gwas-snp-interpretation ~/.claude/skills/tooluniverse-gwas-snp-interpretation

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# GWAS SNP Interpretation Skill

**SNP interpretation**: a GWAS hit is a REGION, not a single causal variant. The lead SNP may not be causal — it may be in LD with the causal variant. Always check LD structure and functional annotation before concluding a specific SNP is mechanistically responsible. Use `LDlink_get_proxies(variant="rs...", population="EUR")` to retrieve the high-R² LD proxies (needs a free LDLINK_TOKEN) — a proxy in a coding/regulatory region is a better mechanistic candidate than the lead SNP itself. Fine-mapping (SuSiE, FINEMAP credible sets) narrows the causal set but rarely identifies a single variant with certainty. L2G scores integrate eQTL, chromatin interaction, and distance data to predict the causal gene — a lead SNP mapping to gene A may actually regulate gene B 500 kb away via a distal enhancer.
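
The proxy triage described above can be sketched as follows. This is a minimal illustration, not the skill's implementation: the record fields (`rsid`, `r2`, `consequence`), the consequence set, and the `min_r2` default are all assumptions, and the real `LDlink_get_proxies` response may be shaped differently.

```python
# Hypothetical proxy records; field names are assumed, not taken from LDlink.
FUNCTIONAL_CONSEQUENCES = {
    "missense_variant",
    "stop_gained",
    "splice_donor_variant",
    "regulatory_region_variant",
}


def prioritize_proxies(proxies, min_r2=0.8):
    """Keep proxies in high LD with the lead SNP, functional candidates first."""
    high_ld = [p for p in proxies if p["r2"] >= min_r2]
    # Sort key: functional consequences first (False sorts before True),
    # then descending R² within each group.
    return sorted(
        high_ld,
        key=lambda p: (p["consequence"] not in FUNCTIONAL_CONSEQUENCES, -p["r2"]),
    )
```

A proxy that passes the R² filter and carries a coding or regulatory consequence rises to the top, which mirrors the advice to look beyond the lead SNP.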

**LOOK UP DON'T GUESS**: never assume a SNP's functional consequence, mapped gene, or population frequency — always call `gwas_get_snp_by_id` and `OpenTargets_get_variant_info` to retrieve current annotations.

## Overview

Interpret genetic variants (SNPs) from GWAS by aggregating evidence from multiple sources to provide clinical and biological context.

**Use Cases:**
- "Interpret rs7903146" (TCF7L2 diabetes variant)
- "What diseases is rs429358 associated with?" (APOE Alzheimer's variant)
- "Clinical significance of rs1801133" (MTHFR variant)
- "Is rs12913832 in any fine-mapped loci?" (Eye color variant)

## What It Does

The skill provides a comprehensive interpretation of SNPs by:

1. **SNP Annotation**: Retrieves basic variant information including genomic coordinates, alleles, functional consequence, and mapped genes
2. **Association Discovery**: Finds all GWAS trait/disease associations with statistical significance
3. **Fine-Mapping Evidence**: Identifies credible sets the variant belongs to (fine-mapped causal loci)
4. **Gene Mapping**: Uses Locus-to-Gene (L2G) predictions to identify likely causal genes
5. **Clinical Summary**: Aggregates evidence into actionable clinical significance

## Workflow

```
User Input: rs7903146
    ↓
[1] SNP Lookup
    → Get location, consequence, MAF
    → gwas_get_snp_by_id
    ↓
[2] Association Search
    → Find all trait/disease associations
    → gwas_get_associations_for_snp
    ↓
[3] Fine-Mapping (Optional)
    → Get credible set membership
    → OpenTargets_get_variant_credible_sets
    ↓
[4] Gene Predictions
    → Extract L2G scores for causal genes
    → (embedded in credible sets)
    ↓
[5] Clinical Summary
    → Aggregate evidence
    → Identify key traits and genes
    ↓
Output: Comprehensive Interpretation Report
```
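
Steps 1 to 3 of the workflow above might be orchestrated roughly like this. `run_tool` is a hypothetical adapter (tool name plus arguments in, result out) that you would wire to however you invoke ToolUniverse tools; the argument names `rs_id` and `variant_id` are assumptions, not confirmed tool signatures.

```python
from typing import Any, Callable, Dict, Optional


def interpret_snp(
    rs_id: str,
    run_tool: Callable[[str, Dict[str, Any]], Any],
    variant_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect raw results for workflow steps 1-3.

    run_tool is a hypothetical adapter: (tool_name, arguments) -> result.
    """
    report: Dict[str, Any] = {"rs_id": rs_id}
    # [1] SNP lookup
    report["snp_info"] = run_tool("gwas_get_snp_by_id", {"rs_id": rs_id})
    # [2] Association search
    report["associations"] = run_tool(
        "gwas_get_associations_for_snp", {"rs_id": rs_id}
    )
    # [3] Fine-mapping is optional and needs an Open Targets variant ID
    if variant_id:
        report["credible_sets"] = run_tool(
            "OpenTargets_get_variant_credible_sets", {"variant_id": variant_id}
        )
    else:
        report["credible_sets"] = []
    return report
```

Injecting the tool runner keeps the sketch testable with a stub and avoids committing to any particular client.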

## Data Sources

### GWAS Catalog (EMBL-EBI)
- **SNP annotations**: Functional consequences, mapped genes, population frequencies
- **Associations**: P-values, effect sizes, study metadata
- **Coverage**: 670,000+ associations curated from published GWAS (counts grow with each Catalog release)

### Open Targets Genetics
- **Fine-mapping**: Statistical credible sets from SuSiE, FINEMAP methods
- **L2G predictions**: Machine learning-based gene prioritization
- **Colocalization**: QTL evidence for causal genes
- **Coverage**: UK Biobank, FinnGen, and other large cohorts

## Input Parameters

### Required
- `rs_id` (str): dbSNP rs identifier
  - Format: "rs" + number (e.g., "rs7903146")
  - Must be a valid rsID present in the GWAS Catalog
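
A format check for `rs_id` could look like the sketch below. The helper name `normalize_rs_id` is hypothetical, and it only validates the "rs" + number shape; confirming that the rsID exists in the GWAS Catalog still requires a lookup.

```python
import re

# "rs" followed by a positive integer with no leading zero
RS_ID_PATTERN = re.compile(r"rs[1-9]\d*")


def normalize_rs_id(raw: str) -> str:
    """Trim, lowercase, and validate the shape of an rsID."""
    candidate = raw.strip().lower()
    if not RS_ID_PATTERN.fullmatch(candidate):
        raise ValueError(f"Not a valid rsID: {raw!r}")
    return candidate
```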

### Optional
- `include_credible_sets` (bool, default=True): Query fine-mapping data
  - True: Complete interpretation (slower, ~10-30s)
  - False: Fast associations only (~2-5s)
- `p_threshold` (float, default=5e-8): Genome-wide significance threshold
- `max_associations` (int, default=100): Maximum associations to retrieve
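
One plausible reading of how `p_threshold` and `max_associations` interact is sketched here. Whether the skill uses a strict comparison, and whether it sorts before truncating, are assumptions; the `p_value` key follows the output format documented in this file.

```python
def filter_associations(associations, p_threshold=5e-8, max_associations=100):
    """Keep associations below the threshold, strongest first, capped in count."""
    significant = [a for a in associations if a["p_value"] < p_threshold]
    # Sorting before truncation ensures the cap drops the weakest hits.
    significant.sort(key=lambda a: a["p_value"])
    return significant[:max_associations]
```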

## Output Format

Returns `SNPInterpretationReport` containing:

### 1. SNP Basic Info
```python
{
    'rs_id': 'rs7903146',
    'chromosome': '10',
    'position': 112998590,
    'ref_allele': 'C',
    'alt_allele': 'T',
    'consequence': 'intron_variant',
    'mapped_genes': ['TCF7L2'],
    'maf': 0.293
}
```

### 2. Trait Associations
```python
[
    {
        'trait': 'Type 2 diabetes',
        'p_value': 1.2e-128,
        'beta': '0.28 unit increase',
        'study_id': 'GCST010555',
        'pubmed_id': '33536258',
        'effect_allele': 'T'
    },
    ...
]
```
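
Because `beta` arrives as text such as `'0.28 unit increase'`, downstream code may want a signed number. The parser below is a sketch that assumes the "<number> unit increase|decrease" pattern seen in the example; other formats (odds ratios, scientific notation, missing values) return `None`.

```python
import re

# Assumed pattern: "<number> unit increase" or "<number> unit decrease"
BETA_PATTERN = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s+unit\s+(increase|decrease)\s*$")


def parse_beta(beta_text):
    """Return a signed float for a beta string, or None if unrecognized."""
    match = BETA_PATTERN.match(beta_text)
    if match is None:
        return None
    magnitude = float(match.group(1))
    return magnitude if match.group(2) == "increase" else -magnitude
```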

### 3. Credible Sets (Fine-Mapping)
```python
[
    {
        'study_id': 'GCST90476118',
        'trait': 'Renal failure',
        'finemapping_method': 'SuSiE-inf',
        'p_value': 3.5e-42,
        'predicted_genes': [
            {'gene': 'TCF7L2', 'score': 0.863}
        ],
        'region': '10:112950000-113050000'
    },
    ...
]
```
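
To summarize gene predictions across many credible sets, one option is to keep each gene's best L2G score. Taking the maximum is an assumption made for illustration; the skill may aggregate differently.

```python
def best_l2g_scores(credible_sets):
    """Map each predicted gene to its highest L2G score, best gene first."""
    best = {}
    for credible_set in credible_sets:
        # Credible sets without predictions are skipped silently.
        for prediction in credible_set.get("predicted_genes", []):
            gene, score = prediction["gene"], prediction["score"]
            if score > best.get(gene, float("-inf")):
                best[gene] = score
    return dict(sorted(best.items(), key=lambda item: item[1], reverse=True))
```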

### 4. Clinical Significance
```
Genome-wide significant associations with 100 traits/diseases:
  - Type 2 diabetes
  - Diabetic retinopathy
  - HbA1c levels
  ...

Identified in 20 fine-mapped loci.
Predicted causal genes: TCF7L2
```
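
A summary in the layout shown above could be assembled along these lines. The function name, its parameters, and the `max_listed` cutoff are hypothetical; only the output layout comes from the example.

```python
def build_clinical_summary(traits, n_credible_sets, causal_genes, max_listed=3):
    """Render a plain-text summary in the layout of the example above."""
    lines = [
        f"Genome-wide significant associations with {len(traits)} traits/diseases:"
    ]
    lines += [f"  - {trait}" for trait in traits[:max_listed]]
    if len(traits) > max_listed:
        lines.append("  ...")  # signal that the trait list was truncated
    lines.append("")
    lines.append(f"Identified in {n_credible_sets} fine-mapped loci.")
    lines.append("Predicted causal genes: " + ", ".join(causal_genes))
    return "\n".join(lines)
```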

## Example Usage

See `QUICK_START.md` for platform-specific examples.

## Tools Used

### GWAS Catalog Tools
1. `gwas_get_snp_by_id`: Get SNP annotation
2. `gwas_get_associations_for_snp`: Get all trait associations

### Open Targets Tools
3. `OpenTargets_get_variant_info`: Get variant details with population frequencies
4. `OpenTargets_get_variant_credible_sets`: Get fine-mapping credible sets with L2G

## Interpretation Guide

### P-value Significance Levels
- **p < 5e-8**: Genome-wide significant (strong evidence)
- **p < 5e-6**: Suggestive (moderate evidence)
- **p < 0.05**: Nominal (weak evidence)
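
The tiers above translate directly into a small classifier. The "not significant" label for p ≥ 0.05 is an addition for completeness, not part of the guide.

```python
def classify_p_value(p):
    """Map a p-value to the evidence tiers listed above."""
    if p < 5e-8:
        return "genome-wide significant"
    if p < 5e-6:
        return "suggestive"
    if p < 0.05:
        return "nominal"
    return "not significant"  # label assumed; the guide stops at p < 0.05
```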

### L2G Score Interpretation
- **> 0.5**: High confidence causal gene
- **0.1-0.5**: Moderate confidence
- **< 0.1**: Low confidence
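
The same thresholds can be applied in code. The guide leaves the boundaries implicit, so treating exactly 0.5 and exactly 0.1 as "moderate" is an assumption based on reading "0.1-0.5" as inclusive.

```python
def classify_l2g(score):
    """Map an L2G score to the confidence bands listed above."""
    if score > 0.5:
        return "high"
    if score >= 0.1:  # inclusive bounds assumed for the 0.1-0.5 band
        return "moderate"
    return "low"
```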

### Clinical Actionability
1. **High**: Multiple genome-wide significant associations + in credible sets + high L2G scores
2. **Moderate**: Genome-wide significant associations but limited fine-mapping
3. **Low**: Suggestive associations or limited replication
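
Combining the three criteria into a single label might look like this sketch. Reading "multiple" as at least two genome-wide significant associations, and treating a missing L2G score as not high, are assumptions.

```python
def clinical_actionability(p_values, n_credible_sets, best_l2g=None):
    """Assign High / Moderate / Low using the criteria listed above."""
    n_significant = sum(1 for p in p_values if p < 5e-8)
    has_high_l2g = best_l2g is not None and best_l2g > 0.5
    # "Multiple" is interpreted as two or more associations (assumption).
    if n_significant >= 2 and n_credible_sets > 0 and has_high_l2g:
        return "High"
    if n_significant >= 1:
        return "Moderate"
    return "Low"
```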

## Limitations

1. **Variant ID Conversion**: Open Targets identifies variants as `chr_pos_ref_alt` (e.g., `10_112998590_C_T` for rs7903146), so an rsID must first be resolved to its coordinates and alleles
2. **Population Specificity**
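
For the variant ID conversion noted in the first limitation, the `chr_pos_ref_alt` string can be built from the fields in the SNP Basic Info output. The helper name is hypothetical, and the sketch assumes the position is already on the genome build Open Targets expects and that the ref/alt alleles are correctly oriented.

```python
def to_open_targets_variant_id(chromosome, position, ref_allele, alt_allele):
    """Format coordinates and alleles as chr_pos_ref_alt."""
    chrom = str(chromosome)
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]  # accept "chr10" as well as "10"
    return f"{chrom}_{int(position)}_{ref_allele.upper()}_{alt_allele.upper()}"
```

With the rs7903146 values from the example output, this gives `10_112998590_C_T`.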
