ClaudeWave
Skill · 199 repo stars · updated 16d ago

gwas-database

NHGRI-EBI GWAS Catalog REST API for SNP-trait associations from published GWAS. Query studies, associations, variants, traits, genes, summary stats. Build PRS candidates, analyze pleiotropy, fetch stats for Manhattan plots. No auth.

Install in Claude Code
git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/gwas-database && cp -r /tmp/gwas-database/skills/genomics-bioinformatics/databases/gwas-database ~/.claude/skills/gwas-database
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# GWAS Catalog Database — SNP-Trait Association Queries

## Overview

The NHGRI-EBI GWAS Catalog is a curated collection of published genome-wide association studies, mapping SNP-trait associations with genomic context. The REST API provides programmatic access to studies, associations, variants, traits, genes, and summary statistics. Responses are HAL+JSON: records sit under `_embedded`, and `_links` carries pagination and related-resource URLs.

## When to Use

- Finding genetic variants associated with a disease or trait (e.g., "which SNPs are linked to type 2 diabetes?")
- Retrieving genome-wide significant associations for a specific variant (rs ID)
- Exploring the genetic architecture of complex traits (number of loci, effect sizes)
- Checking variant pleiotropy (how many traits a single SNP affects)
- Downloading summary statistics for meta-analysis or polygenic risk score construction
- Identifying published GWAS studies by disease, gene, or PubMed ID
- Cross-referencing EFO trait ontology terms with GWAS evidence
- Building candidate gene lists from GWAS association regions
- For **drug target validation from GWAS hits**, use `opentargets-database` instead
- For **variant functional annotation** (consequence prediction, regulatory impact), use Ensembl VEP via `gget`

## Prerequisites

```bash
pip install requests matplotlib numpy
```

**API access**:
- **No authentication** required -- fully open access
- **Rate limits**: no official limit, but add `time.sleep(0.2)` between requests to be courteous
- **Base URL**: `https://www.ebi.ac.uk/gwas/rest/api`
- **Response format**: HAL+JSON with `_embedded` data and `_links` for pagination
- **Pagination**: default 20 results per page; max 500 via `size` parameter
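
The pagination notes above can be sketched as a small generator that walks pages by following the `next` link. This is a minimal sketch, not part of the skill: `iter_embedded` and `http_get_json` are hypothetical names, and it assumes that a response with further pages exposes a concrete (non-templated) URL at `_links.next.href`.

```python
import time
from typing import Any, Callable, Dict, Iterator, Optional


def http_get_json(url: str, params: Optional[dict] = None) -> Dict[str, Any]:
    """Default fetcher: GET a URL and decode the JSON body."""
    import requests  # imported lazily so the paging logic can be exercised offline

    resp = requests.get(url, params=params or {}, timeout=30)
    resp.raise_for_status()
    return resp.json()


def iter_embedded(
    url: str,
    key: str,
    params: Optional[dict] = None,
    fetch: Callable[[str, Optional[dict]], Dict[str, Any]] = http_get_json,
    delay: float = 0.2,
    max_pages: Optional[int] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield every record under `_embedded[key]`, page after page.

    Assumption: when more pages exist, the response carries
    `_links.next.href`; iteration stops when that link is absent.
    """
    pages = 0
    next_url: Optional[str] = url
    while next_url:
        doc = fetch(next_url, params)
        yield from doc.get("_embedded", {}).get(key, [])
        pages += 1
        if max_pages is not None and pages >= max_pages:
            break
        next_url = (doc.get("_links", {}).get("next") or {}).get("href")
        params = None  # the next href is assumed to carry its own query string
        if next_url and delay:
            time.sleep(delay)  # courtesy pause between requests
```

Passing `fetch` as a parameter keeps the loop independent of the HTTP client, so it can be checked against canned pages before pointing it at the live API. Setting `max_pages` is a cheap guard against pulling an unexpectedly large result set.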

## Quick Start

```python
import requests
import time

BASE = "https://www.ebi.ac.uk/gwas/rest/api"

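def gwas_get(endpoint, params=None):
    """GET one GWAS Catalog REST endpoint and return the parsed JSON.

    Sleeps 0.2 s after each call as a courtesy rate limit. Returns a single
    page only; pagination is left to the caller.
    """
    url = f"{BASE}/{endpoint}"
    # A timeout stops a stalled connection from hanging the script.
    resp = requests.get(url, params=params or {}, timeout=30)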
    resp.raise_for_status()
    time.sleep(0.2)
    return resp.json()

# Find studies for a trait keyword. Study records have no top-level `title`
# — the publication title lives at `publicationInfo.title`; the trait label
# lives at `diseaseTrait.trait`.
data = gwas_get("studies/search/findByDiseaseTrait", {"diseaseTrait": "diabetes"})
studies = data["_embedded"]["studies"]
print(f"Found {len(studies)} studies for 'diabetes'")
for s in studies[:3]:
    title = (s.get("publicationInfo") or {}).get("title", "N/A")
    trait = (s.get("diseaseTrait") or {}).get("trait", "N/A")
    print(f"  {s['accessionId']} | {trait[:40]:<40} | {title[:60]}")
```

## Core API

### Module 1: Study Search

Search GWAS studies by disease trait keyword or PubMed ID.

```python
# Search studies by disease trait
data = gwas_get("studies/search/findByDiseaseTrait", {"diseaseTrait": "breast cancer"})
studies = data["_embedded"]["studies"]
for s in studies[:5]:
    pi = s.get("publicationInfo") or {}
    print(f"  {s['accessionId']} | PMID:{pi.get('pubmedId','N/A')} | {pi.get('title','')[:60]}")

time.sleep(0.2)

# Search by PubMed ID. NOTE: the older `findByPubmedId` 404s on /studies/;
# the working endpoint is `findByPublicationIdPubmedId`.
data = gwas_get("studies/search/findByPublicationIdPubmedId", {"pubmedId": "25673413"})
studies = data["_embedded"]["studies"]
print(f"Studies from PMID 25673413: {len(studies)}")
for s in studies:
    trait = (s.get("diseaseTrait") or {}).get("trait", "N/A")
    print(f"  {s['accessionId']}: {trait}")
```
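
Both searches above return study records with the same nested shape, so it can help to flatten them once before tabulating. A minimal sketch, using only the fields already seen in this section (`accessionId`, `publicationInfo.pubmedId`, `publicationInfo.title`, `diseaseTrait.trait`); `flatten_study` is a hypothetical helper name and the demo record holds placeholder values, not real catalog data.

```python
from typing import Any, Dict


def flatten_study(study: Dict[str, Any]) -> Dict[str, Any]:
    """Pull the nested study fields used in this section into one flat dict.

    Missing or null sub-objects become None values instead of raising.
    """
    pub = study.get("publicationInfo") or {}
    trait = study.get("diseaseTrait") or {}
    return {
        "accession": study.get("accessionId"),
        "pubmed_id": pub.get("pubmedId"),
        "title": pub.get("title"),
        "trait": trait.get("trait"),
    }


if __name__ == "__main__":
    # Placeholder record for illustration only (not a real catalog entry).
    demo = {
        "accessionId": "GCST_PLACEHOLDER",
        "publicationInfo": {"pubmedId": "0", "title": "Example title"},
        "diseaseTrait": {"trait": "example trait"},
    }
    print(flatten_study(demo))
```

The resulting dicts can be handed directly to `csv.DictWriter` or a DataFrame constructor.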

### Module 2: Association Queries

Retrieve SNP-trait associations filtered by trait (EFO term), variant, or p-value.

```python
# Associations by EFO trait. The old path `efoTraits/{shortForm}/associations`
# also works *if* you have the current shortForm — but trait shortForms have
# been re-mapped to MONDO (e.g. EFO_0000249 → MONDO_0004975). The most reliable
# path is `associations/search/findByEfoTrait?efoTrait=<canonical trait name>`.
data = gwas_get("associations/search/findByEfoTrait",
                {"efoTrait": "type 2 diabetes mellitus", "size": 50})
assocs = data["_embedded"]["associations"]
print(f"Associations for 'type 2 diabetes mellitus': {len(assocs)}")

for a in assocs[:5]:
    pval = a.get("pvalue", None)
    genes = []
    for locus in a.get("loci", []) or []:
        for gene in locus.get("authorReportedGenes", []) or []:
            genes.append(gene.get("geneName", ""))
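    loci = a.get("loci") or [{}]
    # `or [{}]` also covers an empty or null `snps` list, which would
    # otherwise raise IndexError on `[0]`.
    snps = [(r.get("snps") or [{}])[0].get("rsId", "N/A")
            for r in (loci[0].get("strongestRiskAlleles") or [])]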
    print(f"  rs={snps} | p={pval} | genes={genes}")
```
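
The module description mentions filtering by p-value, which the example above does not show. One way to do it is client-side, on the `pvalue` field of records already fetched. A minimal sketch: `significant` and `to_float` are hypothetical helper names, and 5e-8 is the conventional genome-wide significance threshold rather than anything the API enforces.

```python
from typing import Any, Dict, Iterable, List, Optional

GENOME_WIDE_P = 5e-8  # conventional genome-wide significance threshold


def to_float(value: Any) -> Optional[float]:
    """Coerce a p-value that may arrive as a number, a string, or None."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def significant(
    assocs: Iterable[Dict[str, Any]], threshold: float = GENOME_WIDE_P
) -> List[Dict[str, Any]]:
    """Keep associations with pvalue below the threshold, smallest first.

    Records whose `pvalue` is missing or unparseable are dropped.
    """
    kept = []
    for assoc in assocs:
        p = to_float(assoc.get("pvalue"))
        if p is not None and p < threshold:
            kept.append(assoc)
    return sorted(kept, key=lambda a: to_float(a["pvalue"]))
```

With the `assocs` list from the example above, `significant(assocs)` would return only the records that pass the threshold.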

```python
# Associations for a specific variant. NOTE: association records do not embed
# `efoTraits` inline — they expose them via the `_links.efoTraits.href`
# HAL link. Follow the link (cached if needed) to resolve trait names.
data = gwas_get("singleNucleotidePolymorphisms/rs7903146/associations", {"size": 5})
assocs = data["_embedded"]["associations"]
print(f"Associations for rs7903146 (first page): {len(assocs)}")

def association_traits(assoc):
    """Resolve efoTraits via the HAL link on an association record."""
    href = (assoc.get("_links") or {}).get("efoTraits", {}).get("href")
    if not href:
        return []
    r = requests.get(href, timeout=15)
    if not r.ok:
        return []
    return [t.get("trait") for t in r.json().get("_embedded", {}).get("efoTraits", [])]

for a in assocs[:5]:
    traits = association_traits(a)
    print(f"  p={a.get('pvalue')} | OR={a.get('orPerCopyNum', 'N/A')} | traits={traits}")
    time.sleep(0.1)
```
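
For the pleiotropy use case listed under "When to Use", the per-association trait lookups can be tallied into one count per trait. A minimal sketch under stated assumptions: `trait_counts` is a hypothetical helper, and `resolve_traits` is a caller-supplied function that takes an `_links.efoTraits.href` URL and returns a list of trait names (for example, a thin wrapper around the HTTP call shown above). Results are cached per URL, so each link is fetched once.

```python
from collections import Counter
from typing import Any, Callable, Dict, Iterable, List, Optional


def trait_counts(
    assocs: Iterable[Dict[str, Any]],
    resolve_traits: Callable[[str], List[str]],
    cache: Optional[Dict[str, List[str]]] = None,
) -> Counter:
    """Count how many associations mention each trait.

    Associations without an `_links.efoTraits.href` entry are skipped.
    """
    cache = {} if cache is None else cache
    counts: Counter = Counter()
    for assoc in assocs:
        link = (assoc.get("_links") or {}).get("efoTraits") or {}
        href = link.get("href")
        if not href:
            continue
        if href not in cache:
            cache[href] = resolve_traits(href)  # one lookup per distinct URL
        # set() so a trait listed twice on one association counts once
        counts.update(set(cache[href]))
    return counts
```

`len(trait_counts(assocs, resolver))` then gives the number of distinct traits linked to the variant on the fetched pages, and `.most_common()` ranks them.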

### Module 3: Variant Lookup

Query variant details by rsID, chromosomal region, or cytogenetic band.

```python
# Lookup single variant
data = gwas_get("singleNucleotidePolymorphisms/rs7903146")
loc = data.get("locations", [{}])[0]
print(f"rs7903146: chr{loc.get('chromosomeName', '?')}:{loc.get('chromosomePosition', '?')}")
print(f"  Functional class: {data.get('f