bio-clinical-databases-gnomad-frequencies
This Claude Code skill queries the gnomAD GraphQL API to retrieve population allele frequencies for genetic variants, extracting allele count, allele number, allele frequency, and homozygote count for both the exome and genome datasets. Use it when evaluating variant rarity for rare disease analysis, filtering variants by population frequency thresholds, or checking whether a variant is common in the general population.
git clone --depth 1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills /tmp/bio-clinical-databases-gnomad-frequencies && cp -r /tmp/bio-clinical-databases-gnomad-frequencies/skills/bio-clinical-databases-gnomad-frequencies ~/.claude/skills/bio-clinical-databases-gnomad-frequencies
## Version Compatibility
Reference examples tested with: requests 2.31+, pandas 2.2+
Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
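The version check described above can also be done from Python. This is a minimal sketch using the standard-library `importlib.metadata`; the helper name `installed_version` is illustrative and not part of the skill.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    '''Return the installed version string for a package, or None if absent.'''
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the packages the reference examples were tested with
for name in ('requests', 'pandas'):
    print(name, installed_version(name) or 'not installed')
```

Compare the printed versions against the tested ones (requests 2.31+, pandas 2.2+) before relying on the examples.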
# gnomAD Frequency Queries
## gnomAD GraphQL API
**Goal:** Retrieve exome and genome allele frequencies from gnomAD for individual variants.
**Approach:** Send a GraphQL query to the gnomAD API with variant ID and dataset version, then parse exome/genome frequency fields.
**"Check how common this variant is in the population"** → Query gnomAD for allele frequency, allele count, and homozygote count.
- Python: GraphQL via `requests.post()` (requests)
- Python: `myvariant.MyVariantInfo().getvariant()` (myvariant)
### Query Single Variant
```python
import requests


def query_gnomad(chrom, pos, ref, alt, dataset='gnomad_r4'):
    '''Query gnomAD API for variant frequency

    dataset options: gnomad_r4, gnomad_r3, gnomad_r2_1
    '''
    url = 'https://gnomad.broadinstitute.org/api'

    query = '''
    query ($variantId: String!, $dataset: DatasetId!) {
        variant(variantId: $variantId, dataset: $dataset) {
            exome {
                ac
                an
                af
                homozygote_count
            }
            genome {
                ac
                an
                af
                homozygote_count
            }
        }
    }
    '''

    # gnomAD variant IDs have the form chrom-pos-ref-alt
    variant_id = f'{chrom}-{pos}-{ref}-{alt}'
    variables = {'variantId': variant_id, 'dataset': dataset}

    # Set a timeout so a stalled connection cannot hang the caller
    response = requests.post(url, json={'query': query, 'variables': variables}, timeout=30)
    return response.json()
```
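GraphQL services generally report query failures in a top-level `errors` list in the response body, so it is worth inspecting the payload before parsing it. The sketch below assumes that generic response shape; the helper name `graphql_error_messages` and both sample payloads (including the error text) are hypothetical and not taken from gnomAD documentation.

```python
def graphql_error_messages(payload):
    '''Collect error messages from a GraphQL response payload (empty list if none).'''
    errors = payload.get('errors') or []
    return [err.get('message', str(err)) for err in errors]


# Hypothetical payloads shaped like generic GraphQL responses
ok = {'data': {'variant': {'exome': {'af': 0.0002}, 'genome': None}}}
failed = {'errors': [{'message': 'Variant not found'}], 'data': None}

print(graphql_error_messages(ok))      # []
print(graphql_error_messages(failed))  # ['Variant not found']
```

Calling this on the result of `query_gnomad` before parsing helps distinguish a failed query from a variant that has no frequency data.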
### Parse gnomAD Response
```python
def parse_gnomad_result(result):
    '''Extract allele frequencies from gnomAD response'''
    # 'data' or 'variant' may be null when the query fails or the variant is absent
    data = (result.get('data') or {}).get('variant')
    if not data:
        return None

    # Either dataset can be null if the variant was only observed in the other
    exome = data.get('exome') or {}
    genome = data.get('genome') or {}

    return {
        'exome_af': exome.get('af'),
        'exome_ac': exome.get('ac'),
        'exome_an': exome.get('an'),
        'exome_hom': exome.get('homozygote_count'),
        'genome_af': genome.get('af'),
        'genome_ac': genome.get('ac'),
        'genome_an': genome.get('an'),
        'genome_hom': genome.get('homozygote_count')
    }
```
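Exome and genome frequencies are reported separately, so a single summary figure has to be derived. One option, sketched here on the assumption that pooling allele counts suits your analysis, is a combined AF computed as total AC over total AN. The function name `combined_af` is illustrative; its input is the dict returned by `parse_gnomad_result`, and the counts in the example are made up.

```python
def combined_af(parsed):
    '''Pool counts: AF = (AC_exome + AC_genome) / (AN_exome + AN_genome).'''
    if not parsed:
        return None
    total_ac = (parsed.get('exome_ac') or 0) + (parsed.get('genome_ac') or 0)
    total_an = (parsed.get('exome_an') or 0) + (parsed.get('genome_an') or 0)
    if total_an == 0:
        return None  # No called alleles in either dataset
    return total_ac / total_an


# Made-up counts for illustration only
example = {'exome_ac': 3, 'exome_an': 1000, 'genome_ac': 1, 'genome_an': 1000}
print(combined_af(example))  # 0.002
```

Pooling weights each dataset by its number of called alleles, whereas taking the maximum of the two AFs is the more conservative choice for rarity filtering.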
## Query via myvariant.info
**Goal:** Retrieve gnomAD frequencies through the myvariant.info aggregation layer for simpler API access.
**Approach:** Query myvariant.info by HGVS notation with gnomAD fields specified, extracting exome and genome allele frequencies.
```python
import myvariant

mv = myvariant.MyVariantInfo()


def get_gnomad_via_myvariant(variant_hgvs):
    '''Get gnomAD frequencies via myvariant.info'''
    result = mv.getvariant(variant_hgvs, fields=['gnomad_exome', 'gnomad_genome'])
    if not result:
        return None  # Variant not found in myvariant.info

    exome = result.get('gnomad_exome', {})
    genome = result.get('gnomad_genome', {})

    return {
        'exome_af': exome.get('af', {}).get('af'),
        'genome_af': genome.get('af', {}).get('af')
    }
```
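The lookup above expects a genomic HGVS string such as `chr1:g.35367G>A`. The following sketch builds that string from VCF-style fields for single-nucleotide substitutions only; insertions and deletions use different HGVS syntax and are deliberately rejected. The helper name `snv_hgvs_id` is hypothetical. Also check that the coordinates match the assembly the service expects, since myvariant.info IDs are hg19-based unless another assembly is requested.

```python
def snv_hgvs_id(chrom, pos, ref, alt):
    '''Build a genomic HGVS ID for a single-nucleotide substitution.'''
    if len(ref) != 1 or len(alt) != 1:
        raise ValueError('Only single-nucleotide substitutions are handled here')
    chrom = str(chrom)
    if not chrom.startswith('chr'):
        chrom = f'chr{chrom}'
    return f'{chrom}:g.{pos}{ref}>{alt}'


print(snv_hgvs_id(1, 35367, 'G', 'A'))  # chr1:g.35367G>A
```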
## Population-Specific Frequencies
**Goal:** Retrieve ancestry-specific allele frequencies to assess variant rarity within relevant populations.
**Approach:** Query the gnomAD population-stratified AF fields (AFR, AMR, ASJ, EAS, FIN, NFE, SAS) via myvariant.info.
```python
import myvariant


def get_population_frequencies(variant_hgvs):
    '''Get gnomAD frequencies by ancestry population'''
    mv = myvariant.MyVariantInfo()
    result = mv.getvariant(variant_hgvs, fields=['gnomad_exome.af'])
    if not result:
        return None  # Variant not found in myvariant.info

    af_data = result.get('gnomad_exome', {}).get('af', {})

    populations = {
        'af': af_data.get('af'),          # Global
        'af_afr': af_data.get('af_afr'),  # African
        'af_amr': af_data.get('af_amr'),  # Admixed American
        'af_asj': af_data.get('af_asj'),  # Ashkenazi Jewish
        'af_eas': af_data.get('af_eas'),  # East Asian
        'af_fin': af_data.get('af_fin'),  # Finnish
        'af_nfe': af_data.get('af_nfe'),  # Non-Finnish European
        'af_sas': af_data.get('af_sas'),  # South Asian
    }
    return populations
```
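A variant that is rare globally can still be comparatively common in one ancestry group, so the highest population-specific AF is often the number to filter on. The sketch below picks that maximum from a dict shaped like the output of `get_population_frequencies`; the function name `max_population_af` and the sample frequencies are illustrative assumptions.

```python
def max_population_af(populations):
    '''Return (field, af) for the population with the highest AF, ignoring the global value.'''
    observed = {
        name: af
        for name, af in populations.items()
        if name != 'af' and af is not None
    }
    if not observed:
        return None
    top = max(observed, key=observed.get)
    return top, observed[top]


# Made-up frequencies for illustration only
example = {'af': 0.0004, 'af_afr': 0.00001, 'af_fin': 0.006, 'af_nfe': None}
print(max_population_af(example))  # ('af_fin', 0.006)
```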
## Filtering Thresholds
Common frequency cutoffs for variant filtering:
| Threshold | Use Case |
|-----------|----------|
| < 0.01 (1%) | Rare disease, ACMG PM2 |
| < 0.001 (0.1%) | Stringent rare disease |
| < 0.0001 (0.01%) | Ultra-rare |
| Absent | Novel variant |
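
As a rough illustration, the cutoffs in the table can be expressed as a small classifier. The tier labels, including `common` for anything at or above 1%, and the function name `frequency_tier` are assumptions made for this sketch, not established terminology.

```python
def frequency_tier(af):
    '''Map an allele frequency to the most stringent tier it satisfies.'''
    if af is None:
        return 'absent'          # Not observed in gnomAD
    if af < 0.0001:
        return 'ultra-rare'      # < 0.01%
    if af < 0.001:
        return 'stringent rare'  # < 0.1%
    if af < 0.01:
        return 'rare'            # < 1%
    return 'common'              # Assumed label for AF >= 1%


for af in (None, 0.00005, 0.0005, 0.005, 0.05):
    print(af, frequency_tier(af))
```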
## Filter Variants by Frequency
**Goal:** Apply population frequency thresholds to retain only rare variants for downstream analysis.
**Approach:** Compare the maximum allele frequency across exome and genome datasets against a configurable threshold (default 1% per ACMG PM2).
```python
def is_rare(gnomad_af, threshold=0.01):
    '''Check if variant is rare based on gnomAD AF

    threshold: Default 0.01 (1%) per ACMG PM2 supporting criterion
               Use 0.001 for more stringent filtering
    '''
    if gnomad_af is None:
        return True  # Absent from gnomAD = rare
    return gnomad_af < threshold


def filter_rare_variants(variants, threshold=0.01):
    '''Filter list of variants to keep only rare ones'''
    rare = []
    for v in variants:
        exome_af = v.get('gnomad_exome_af')
        genome_af = v.get('gnomad_genome_af')

        # Drop only missing values; an AF of 0.0 is a real observation
        observed = [af for af in (exome_af, genome_af) if af is not None]
        max_af = max(observed, default=None)

        if is_rare(max_af, threshold):
            rare.append(v)
    return rare
```
## Batch Query with Local gnomAD
**Goal:** Perform large-scale frequency lookups using a local gnomAD Hail Table for high throughput.
**Approach:** Load the gnomAD sites Hail Table from Google Cloud Storage and filter by allele frequency threshold.
For large-scale analysis, use a local gnomAD VCF or Hail Table:
```python
# Using Hail for gnomAD v4
import hail as hl
ht = hl.read_table('gs://gcp-public-data--gnomad/release/4.0/ht/exomes/gnomad.exom...')  # path truncated in source
```