gnomad-database
gnomAD v4 population variant frequencies via GraphQL API. Allele counts and frequencies stratified by ancestry (AFR, AMR, EAS, NFE, SAS, FIN, ASJ, MID), gene-level constraint (pLI, LOEUF, missense z), and coverage. Identify rare or constrained variants. For clinical pathogenicity use clinvar-database; for GWAS use gwas-database.
git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/gnomad-database && cp -r /tmp/gnomad-database/skills/genomics-bioinformatics/databases/gnomad-database ~/.claude/skills/gnomad-database
# gnomAD Database
## Overview
The Genome Aggregation Database (gnomAD) is a resource of aggregated exome and genome sequencing data from 730,000+ individuals. It provides population variant frequencies stratified by 9 ancestry groups, gene-level constraint scores (pLI, LOEUF), and read coverage information. Access is free via a GraphQL API at `https://gnomad.broadinstitute.org/api` — no authentication required, no official SDK.
## When to Use
- Checking whether a candidate variant is rare enough to be clinically relevant (AF < 0.1% in all populations)
- Retrieving allele frequencies stratified by ancestry group (AFR, AMR, EAS, NFE, SAS, FIN, ASJ, MID) for a variant
- Identifying all rare loss-of-function variants in a gene for burden testing or candidate prioritization
- Getting gene constraint metrics (pLI, LOEUF) to assess tolerance to loss-of-function variants
- Checking read depth coverage for a region to evaluate if low variant frequency reflects low sequencing coverage
- Filtering a VCF by population frequency — query gnomAD AF to discard common variants before clinical interpretation
- For clinical pathogenicity classifications use `clinvar-database`; gnomAD provides frequency evidence but does not classify pathogenicity
- For GWAS associations at the study level use `gwas-database`; gnomAD is for population frequency lookups
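The "rare in all populations" check from the first bullet can be sketched as follows. This is a minimal illustration, assuming per-population records shaped like `{"id", "ac", "an"}` dicts; the helper names `population_afs` and `is_rare_everywhere`, the `min_an` guard, and the sample counts are hypothetical and not part of the gnomAD API.

```python
def population_afs(populations):
    """Return {population_id: allele frequency}, computed as ac / an."""
    afs = {}
    for pop in populations:
        an = pop.get("an") or 0
        if an > 0:  # skip populations with no called alleles
            afs[pop["id"]] = pop["ac"] / an
    return afs


def is_rare_everywhere(populations, threshold=0.001, min_an=0):
    """True if every population with usable data has AF below threshold."""
    usable = [p for p in populations if (p.get("an") or 0) > 0 and p["an"] >= min_an]
    if not usable:
        return False  # no data: cannot claim the variant is rare
    return all(p["ac"] / p["an"] < threshold for p in usable)


# Made-up example counts, not real gnomAD data
example = [
    {"id": "afr", "ac": 2, "an": 40000},
    {"id": "nfe", "ac": 90, "an": 60000},
    {"id": "eas", "ac": 0, "an": 0},
]
print(population_afs(example))
print(is_rare_everywhere(example))
```

Returning `False` when no population has data is a deliberate choice in this sketch: absence of coverage is treated as "unknown", not as evidence of rarity.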
## Prerequisites
- **Python packages**: `requests`, `pandas`, `matplotlib`
- **Data requirements**: gene symbols (e.g., `BRCA1`), variant IDs (`1-69511-A-G` format, or rsIDs)
- **Environment**: internet connection; no API key required
- **Rate limits**: no official published limits; use `time.sleep(0.5)` between requests for polite access; avoid bursts over 10 requests/second
```bash
pip install requests pandas matplotlib
```
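The polite-access advice above can be wrapped in a small helper. This is a sketch, assuming any callable that takes a variables dict and returns a result (for instance a thin wrapper around a GraphQL POST). `run_throttled` and the stub `fake_fetch` are hypothetical names, and the 0.5 s default mirrors the suggestion above, not a published limit.

```python
import time


def run_throttled(fetch, variable_sets, delay=0.5, sleep=time.sleep):
    """Call fetch(variables) for each item, pausing `delay` seconds between calls.

    Failures are recorded per item instead of aborting the whole batch.
    """
    results = []
    for i, variables in enumerate(variable_sets):
        if i > 0:
            sleep(delay)  # pause between requests, not before the first
        try:
            results.append((variables, fetch(variables), None))
        except Exception as exc:  # keep going; the caller inspects the error slot
            results.append((variables, None, exc))
    return results


# Stub standing in for a real API call (hypothetical, no network access)
def fake_fetch(variables):
    if variables["gene_symbol"] == "BAD":
        raise ValueError("unknown gene")
    return {"gene": variables["gene_symbol"]}


batch = [{"gene_symbol": g} for g in ("BRCA1", "BAD", "PCSK9")]
out = run_throttled(fake_fetch, batch, delay=0.0)
for variables, data, err in out:
    print(variables["gene_symbol"], data, err)
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; in normal use the default `time.sleep` applies.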
## Quick Start
```python
import requests
import time
GNOMAD_API = "https://gnomad.broadinstitute.org/api"
def gnomad_query(query: str, variables: dict = None) -> dict:
    """Execute a gnomAD GraphQL query and return the data payload."""
    payload = {"query": query, "variables": variables or {}}
    r = requests.post(GNOMAD_API, json=payload, timeout=30)
    r.raise_for_status()
    result = r.json()
    if "errors" in result:
        raise ValueError(f"GraphQL errors: {result['errors']}")
    return result["data"]
# Quick check: get pLI / LOEUF for BRCA1
# GnomadConstraint fields are FLAT (no nested `lof { oe_ci { upper } }` type).
# `pli` is the current field; `pLI` is preserved as a deprecated alias.
query = """
query GeneConstraint($gene_symbol: String!, $reference_genome: ReferenceGenomeId!) {
  gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
    gnomad_constraint { pli oe_lof_upper }
  }
}
"""
data = gnomad_query(query, {"gene_symbol": "BRCA1", "reference_genome": "GRCh38"})
constraint = data["gene"]["gnomad_constraint"]
# pLI close to 1 indicates LoF intolerance; a value near 0 does not.
print(f"BRCA1 pLI: {constraint['pli']:.3e}")  # ~5.5e-38 (near 0: not flagged as LoF-intolerant)
print(f"BRCA1 LOEUF: {constraint['oe_lof_upper']:.3f}")  # 0.928
```
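Reading the two scores together can be sketched as a small helper. pLI near 1 and a low LOEUF both point to LoF intolerance, while pLI near 0 with a LOEUF close to 1 does not. The function name `describe_constraint` is hypothetical, and the cutoffs (pLI >= 0.9, LOEUF < 0.6) are commonly cited rules of thumb used here as assumptions; they are not returned by the API and should be adjusted to the analysis at hand.

```python
def describe_constraint(pli, loeuf, pli_cutoff=0.9, loeuf_cutoff=0.6):
    """Summarize LoF constraint from pLI and LOEUF (upper bound of the o/e CI).

    The cutoffs are illustrative defaults, not values taken from the API.
    """
    if pli is None or loeuf is None:
        return "no constraint data"
    if pli >= pli_cutoff and loeuf < loeuf_cutoff:
        return "LoF-intolerant"
    if pli < pli_cutoff and loeuf >= loeuf_cutoff:
        return "not LoF-constrained"
    return "ambiguous"  # the two metrics disagree


# Values quoted in the Quick Start comments for BRCA1
print(describe_constraint(5.5e-38, 0.928))
```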
## Core API
### Query 1: Gene Variant Query
Fetch all variants in a gene with population allele frequencies. Returns a list of variants with their genome-level frequencies.
```python
import requests, time
GNOMAD_API = "https://gnomad.broadinstitute.org/api"
def gnomad_query(query, variables=None):
    r = requests.post(GNOMAD_API, json={"query": query, "variables": variables or {}}, timeout=30)
    r.raise_for_status()
    result = r.json()
    if "errors" in result:
        raise ValueError(f"GraphQL errors: {result['errors']}")
    return result["data"]
GENE_VARIANTS_QUERY = """
query GeneVariants($gene_symbol: String!, $reference_genome: ReferenceGenomeId!, $dataset: DatasetId!) {
  gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
    gene_id
    symbol
    variants(dataset: $dataset) {
      variant_id
      rsids
      chrom
      pos
      ref
      alt
      consequence
      lof
      genome {
        an
        ac
        af
        faf95 { popmax popmax_population }
      }
    }
  }
}
"""
data = gnomad_query(GENE_VARIANTS_QUERY, {
"gene_symbol": "PCSK9",
"reference_genome": "GRCh38",
"dataset": "gnomad_r4"
})
variants = data["gene"]["variants"]
print(f"Gene: {data['gene']['symbol']} ({data['gene']['gene_id']})")
print(f"Total variants: {len(variants)}")
# Filter to rare variants (AF < 0.001); skip variants without genome data
rare = [
    v for v in variants
    if v["genome"] and v["genome"]["af"] is not None and v["genome"]["af"] < 0.001
]
print(f"Rare variants (AF < 0.1%): {len(rare)}")
for v in rare[:3]:
    print(f" {v['variant_id']} | {v['consequence']} | AF={v['genome']['af']:.2e}")
```
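For the burden-testing use case, a variant list like the one returned above can be narrowed to rare loss-of-function calls. This is a sketch, assuming records that carry the `lof`, `consequence`, and `genome` fields requested in the query, and assuming `lof` holds LOFTEE-style labels in which `"HC"` marks a high-confidence LoF call. The helper name `rare_lof_variants` and the sample records are hypothetical.

```python
def rare_lof_variants(variants, max_af=0.001, lof_label="HC"):
    """Keep variants flagged with lof_label whose genome AF is below max_af."""
    kept = []
    for v in variants:
        genome = v.get("genome") or {}  # genome can be null for exome-only variants
        af = genome.get("af")
        if v.get("lof") == lof_label and af is not None and af < max_af:
            kept.append(v)
    # Sort rarest first
    return sorted(kept, key=lambda v: v["genome"]["af"])


# Made-up records in the shape of the query result, not real gnomAD data
sample = [
    {"variant_id": "1-100-A-T", "consequence": "stop_gained", "lof": "HC", "genome": {"af": 2e-5}},
    {"variant_id": "1-200-G-C", "consequence": "missense_variant", "lof": None, "genome": {"af": 1e-5}},
    {"variant_id": "1-300-C-T", "consequence": "splice_donor_variant", "lof": "HC", "genome": None},
    {"variant_id": "1-400-T-G", "consequence": "frameshift_variant", "lof": "HC", "genome": {"af": 4e-6}},
]
for v in rare_lof_variants(sample):
    print(v["variant_id"], v["consequence"], f"{v['genome']['af']:.1e}")
```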
### Query 2: Variant Lookup
Fetch detailed information for a single variant by its gnomAD variant ID (CHROM-POS-REF-ALT format) or search by rsID.
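Before sending a lookup, it can help to check which kind of identifier is in hand. The sketch below distinguishes a CHROM-POS-REF-ALT ID from an rsID and normalizes both. The regular expressions and the name `classify_identifier` are assumptions made for illustration; they are not validation rules published by gnomAD.

```python
import re

# CHROM-POS-REF-ALT, e.g. 1-55039974-G-T (pattern is an assumption for illustration)
VARIANT_ID_RE = re.compile(
    r"^(?:chr)?([0-9]{1,2}|X|Y|M|MT)-([0-9]+)-([ACGT]+)-([ACGT]+)$", re.IGNORECASE
)
RSID_RE = re.compile(r"^rs[0-9]+$", re.IGNORECASE)


def classify_identifier(text):
    """Return ("variant_id", normalized) or ("rsid", normalized); raise ValueError otherwise."""
    text = text.strip()
    m = VARIANT_ID_RE.match(text)
    if m:
        chrom, pos, ref, alt = m.groups()
        # Drop any "chr" prefix and upper-case the chromosome and alleles
        return "variant_id", f"{chrom.upper()}-{int(pos)}-{ref.upper()}-{alt.upper()}"
    if RSID_RE.match(text):
        return "rsid", text.lower()
    raise ValueError(f"not a CHROM-POS-REF-ALT ID or rsID: {text!r}")


print(classify_identifier("chr1-55039974-g-t"))
print(classify_identifier("RS11591147"))
```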
```python
VARIANT_QUERY = """
query VariantDetails($variantId: String!, $dataset: DatasetId!) {
  variant(variantId: $variantId, dataset: $dataset) {
    variant_id
    rsids
    chrom
    pos
    ref
    alt
    transcript_consequences {
      gene_symbol
      transcript_id
      is_canonical
      major_consequence
      lof
      lof_filter
      lof_flags
    }
    genome {
      an
      ac
      af
      faf95 { popmax popmax_population }
      populations { id ac an homozygote_count }
    }
  }
}
"""
# Query.variant() arg is `variantId` (camelCase). The top-level deprecated
# `consequence`/`lof`/`lof_filter`/`lof_flags` fields on VariantDetails were
# removed — read them from `transcript_consequences` (plural list; pick the
# canonical transcript with is_canonical=True).
data = gnomad_query(VARIANT_QUERY, {
    "variantId": "1-55039974-G-T",  # PCSK9 p.Tyr142Ter (LoF)
    "dataset": "gnomad_r4"
})
v = data["variant"]
canon = next((t for t in (v.get("transcript_consequences") or []) if t.get("is_canonical")),
(v.get("transcript_consequences") or [{}])[0])
print(f"Variant : {v['variant_id']}")
print(f"rsIDs : {v['rsids']}")
print(f"Gene : {canon.get('gene_symbol')}")
```