gwas-database
NHGRI-EBI GWAS Catalog REST API for SNP-trait associations from published GWAS. Query studies, associations, variants, traits, genes, summary stats. Build PRS candidates, analyze pleiotropy, fetch stats for Manhattan plots. No auth.
git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/gwas-database && cp -r /tmp/gwas-database/skills/genomics-bioinformatics/databases/gwas-database ~/.claude/skills/gwas-database
SKILL.md
# GWAS Catalog Database — SNP-Trait Association Queries
## Overview
The NHGRI-EBI GWAS Catalog is a curated collection of published genome-wide association studies, mapping SNP-trait associations with genomic context. The REST API provides programmatic access to studies, associations, variants, traits, genes, and summary statistics. All responses are HAL+JSON with embedded `_links` for pagination.
## When to Use
- Finding genetic variants associated with a disease or trait (e.g., "which SNPs are linked to type 2 diabetes?")
- Retrieving genome-wide significant associations for a specific variant (rs ID)
- Exploring the genetic architecture of complex traits (number of loci, effect sizes)
- Checking variant pleiotropy (how many traits a single SNP affects)
- Downloading summary statistics for meta-analysis or polygenic risk score construction
- Identifying published GWAS studies by disease, gene, or PubMed ID
- Cross-referencing EFO trait ontology terms with GWAS evidence
- Building candidate gene lists from GWAS association regions
- For **drug target validation from GWAS hits**, use `opentargets-database` instead
- For **variant functional annotation** (consequence prediction, regulatory impact), use Ensembl VEP via `gget`
## Prerequisites
```bash
pip install requests matplotlib numpy
```
**API access**:
- **No authentication** required -- fully open access
- **Rate limits**: no official limit, but add `time.sleep(0.2)` between requests to be courteous
- **Base URL**: `https://www.ebi.ac.uk/gwas/rest/api`
- **Response format**: HAL+JSON with `_embedded` data and `_links` for pagination
- **Pagination**: default 20 results per page; max 500 via `size` parameter
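The pagination behaviour described above can be sketched as a small generator. This is a minimal sketch, not part of the catalog's documentation: `iter_pages`, its `fetch` hook, and the `max_pages` cap are hypothetical names, and it assumes paged responses expose the next page at `_links.next.href`, as HAL collections usually do.

```python
import time


def iter_pages(url, params=None, fetch=None, delay=0.2, max_pages=10):
    """Yield HAL+JSON pages one by one, following `_links.next.href`."""
    if fetch is None:
        # Imported lazily so the helper can be exercised offline with a stub fetcher.
        import requests

        def fetch(u, p):
            resp = requests.get(u, params=p, timeout=30)
            resp.raise_for_status()
            return resp.json()

    pages = 0
    while url and pages < max_pages:
        page = fetch(url, params)
        yield page
        pages += 1
        # Assumption: the next link already carries the query string,
        # so the original params are not sent again.
        url = ((page.get("_links") or {}).get("next") or {}).get("href")
        params = None
        if url:
            time.sleep(delay)  # stay courteous between requests
```

A caller might pass the base URL plus an endpoint such as `studies/search/findByDiseaseTrait` with `{"diseaseTrait": "diabetes", "size": 500}` and read `page["_embedded"]` from each yielded page. The `max_pages` cap is there so an exploratory query cannot walk the whole catalog by accident.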
## Quick Start
```python
import requests
import time

BASE = "https://www.ebi.ac.uk/gwas/rest/api"


def gwas_get(endpoint, params=None):
    """GET a GWAS Catalog REST endpoint and return the parsed JSON, pausing briefly after each call."""
    url = f"{BASE}/{endpoint}"
    resp = requests.get(url, params=params or {}, timeout=30)
    resp.raise_for_status()
    time.sleep(0.2)  # courtesy delay between requests
    return resp.json()


# Find studies for a trait keyword. Study records have no top-level `title`:
# the publication title lives at `publicationInfo.title`, and the trait label
# lives at `diseaseTrait.trait`.
data = gwas_get("studies/search/findByDiseaseTrait", {"diseaseTrait": "diabetes"})
studies = data["_embedded"]["studies"]
print(f"Found {len(studies)} studies for 'diabetes'")
for s in studies[:3]:
    title = (s.get("publicationInfo") or {}).get("title", "N/A")
    trait = (s.get("diseaseTrait") or {}).get("trait", "N/A")
    print(f"  {s['accessionId']} | {trait[:40]:<40} | {title[:60]}")
```
## Core API
### Module 1: Study Search
Search GWAS studies by disease trait keyword or PubMed ID.
```python
# Search studies by disease trait
data = gwas_get("studies/search/findByDiseaseTrait", {"diseaseTrait": "breast cancer"})
studies = data["_embedded"]["studies"]
for s in studies[:5]:
    pi = s.get("publicationInfo") or {}
    print(f"  {s['accessionId']} | PMID:{pi.get('pubmedId','N/A')} | {pi.get('title','')[:60]}")
time.sleep(0.2)

# Search by PubMed ID. NOTE: the older `findByPubmedId` 404s on /studies/;
# the working endpoint is `findByPublicationIdPubmedId`.
data = gwas_get("studies/search/findByPublicationIdPubmedId", {"pubmedId": "25673413"})
studies = data["_embedded"]["studies"]
print(f"Studies from PMID 25673413: {len(studies)}")
for s in studies:
    trait = (s.get("diseaseTrait") or {}).get("trait", "N/A")
    print(f"  {s['accessionId']}: {trait}")
```
### Module 2: Association Queries
Retrieve SNP-trait associations filtered by trait (EFO term), variant, or p-value.
```python
# Associations by EFO trait. The old path `efoTraits/{shortForm}/associations`
# also works *if* you have the current shortForm, but trait shortForms have
# been re-mapped to MONDO (e.g. EFO_0000249 → MONDO_0004975). The most reliable
# path is `associations/search/findByEfoTrait?efoTrait=<canonical trait name>`.
data = gwas_get("associations/search/findByEfoTrait",
                {"efoTrait": "type 2 diabetes mellitus", "size": 50})
assocs = data["_embedded"]["associations"]
print(f"Associations for 'type 2 diabetes mellitus': {len(assocs)}")
for a in assocs[:5]:
    pval = a.get("pvalue")
    loci = a.get("loci") or []
    # Collect author-reported gene names across all loci.
    genes = [gene.get("geneName", "")
             for locus in loci
             for gene in (locus.get("authorReportedGenes") or [])]
    # rsIDs on the strongest risk alleles of the first locus; an empty or
    # missing `snps` list falls back to "N/A" instead of raising IndexError.
    first = loci[0] if loci else {}
    snps = [(r.get("snps") or [{}])[0].get("rsId", "N/A")
            for r in (first.get("strongestRiskAlleles") or [])]
    print(f"  rs={snps} | p={pval} | genes={genes}")
```
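The p-value filtering mentioned above is not shown in the request itself, so one option is to filter client-side on the `pvalue` field the records already carry. The sketch below assumes `pvalue` arrives as a JSON number. The helper name `significant` and the `GENOME_WIDE_P` constant are hypothetical, and 5e-8 is the conventional genome-wide significance threshold rather than anything defined by the API.

```python
GENOME_WIDE_P = 5e-8  # conventional threshold; an assumption, not an API constant


def significant(assocs, threshold=GENOME_WIDE_P):
    """Return association records with a numeric `pvalue` at or below the threshold, strongest first."""
    kept = []
    for a in assocs:
        p = a.get("pvalue")
        # Skip records with a missing or non-numeric p-value.
        if isinstance(p, (int, float)) and p <= threshold:
            kept.append(a)
    return sorted(kept, key=lambda a: a["pvalue"])
```

Applied to the `assocs` list from the query above, `significant(assocs)` would keep only the genome-wide significant hits. Passing a looser `threshold` (for example `1e-5`) gives a suggestive-association list instead.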
```python
# Associations for a specific variant. NOTE: association records do not embed
# `efoTraits` inline; they expose them via the `_links.efoTraits.href`
# HAL link. Follow the link (cached if needed) to resolve trait names.
data = gwas_get("singleNucleotidePolymorphisms/rs7903146/associations", {"size": 5})
assocs = data["_embedded"]["associations"]
print(f"Associations for rs7903146 (first page): {len(assocs)}")


def association_traits(assoc):
    """Resolve efoTraits via the HAL link on an association record."""
    href = (assoc.get("_links") or {}).get("efoTraits", {}).get("href")
    if not href:
        return []
    r = requests.get(href, timeout=15)
    if not r.ok:
        return []
    return [t.get("trait") for t in r.json().get("_embedded", {}).get("efoTraits", [])]


for a in assocs[:5]:
    traits = association_traits(a)
    print(f"  p={a.get('pvalue')} | OR={a.get('orPerCopyNum', 'N/A')} | traits={traits}")
    time.sleep(0.1)
```
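The "cached if needed" remark and the pleiotropy use case can be combined in a short sketch. Everything here is illustrative: `make_cached_resolver` and `trait_counts` are hypothetical helpers, and the `fetch` argument stands in for whatever function performs the GET on an `efoTraits` link and returns a list of trait names.

```python
from collections import Counter


def make_cached_resolver(fetch):
    """Wrap an `href -> list of trait names` function with an in-memory cache."""
    cache = {}

    def resolve(href):
        # Only hit the network (via `fetch`) the first time a link is seen.
        if href not in cache:
            cache[href] = fetch(href)
        return cache[href]

    return resolve


def trait_counts(assocs, resolve):
    """Count how many association records point at each trait name."""
    counts = Counter()
    for a in assocs:
        href = ((a.get("_links") or {}).get("efoTraits") or {}).get("href")
        if href:
            # A set avoids double-counting a trait listed twice on one record.
            counts.update(set(resolve(href)))
    return counts
```

For a single variant, the number of keys in the returned `Counter` is a rough pleiotropy measure: the count of distinct traits the SNP's associations map to. The cache mainly pays off when the same association records are revisited across several analyses in one session.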
### Module 3: Variant Lookup
Query variant details by rsID, chromosomal region, or cytogenetic band.
```python
# Lookup single variant
data = gwas_get("singleNucleotidePolymorphisms/rs7903146")
# `or [{}]` also covers an empty `locations` list, which would otherwise raise IndexError.
loc = (data.get("locations") or [{}])[0]
print(f"rs7903146: chr{loc.get('chromosomeName', '?')}:{loc.get('chromosomePosition', '?')}")
print(f" Functional class: {data.get('f
```