regulomedb-database
Query RegulomeDB v2 GET REST API to score variants for regulatory function and retrieve overlapping evidence (TF binding, histone marks, DNase peaks, footprints, motifs, eQTLs, chromatin state). Scores range 1a (strongest) to 7 (none). Use for GWAS hit prioritization, regulatory variant annotation, cis-regulatory discovery. Use clinvar-database for pathogenicity; gwas-database for trait associations.
git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/regulomedb-database && cp -r /tmp/regulomedb-database/skills/genomics-bioinformatics/databases/regulomedb-database ~/.claude/skills/regulomedb-database
# RegulomeDB Database
## Overview
RegulomeDB integrates large-scale functional genomics data (ENCODE, Roadmap Epigenomics) to score genetic variants for regulatory potential. Each variant receives a ranking from 1a (highest regulatory confidence: eQTL + TF + DNase + motif + chromatin) to 7 (no known regulatory function). The v2 API is exposed as **GET** `https://regulomedb.org/regulome-search/`; the legacy POST `/regulome-search/`, POST `/regulome-summary/`, and GET `/regulome-datasets/` JSON endpoints are no longer functional (return `regulome-notfound` stubs or 500). Access is free and requires no authentication.
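Each ranking is a string that combines a numeric tier with an optional letter. The sketch below defines an explicit sort key. It assumes every ranking is a digit from 1 to 7 followed by at most one lowercase letter. The helper name `rank_key` and the tier-2 cutoff in the example are illustrative and are not part of the API.

```python
import re

def rank_key(ranking):
    """Sort key for RegulomeDB rankings: '1a' < '1b' < ... < '7' (hypothetical helper).

    Assumes a tier digit 1-7 optionally followed by one lowercase letter.
    """
    m = re.fullmatch(r"([1-7])([a-z]?)", str(ranking).strip())
    if m is None:
        raise ValueError(f"Unrecognised ranking: {ranking!r}")
    # Numeric tier first, then the sub-letter ('' sorts before 'a' for bare tiers)
    return (int(m.group(1)), m.group(2))

rankings = ["7", "2b", "1a", "4", "1b", "3a"]
print(sorted(rankings, key=rank_key))                # ['1a', '1b', '2b', '3a', '4', '7']
print([r for r in rankings if rank_key(r)[0] <= 2])  # ['2b', '1a', '1b'] (illustrative cutoff)
```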
## When to Use
- Prioritizing GWAS hits for regulatory follow-up — identify which SNPs land in active regulatory elements
- Annotating a VCF or variant list with regulatory scores to filter to functionally relevant variants
- Identifying which transcription factors bind near a variant of interest (via the `@graph` evidence rows)
- Checking whether a non-coding variant overlaps a QTL and active chromatin simultaneously (`features.QTL`)
- Retrieving all annotated rsIDs in a genomic region for cis-regulatory analysis (region query with `nearby_snps`)
- Use `clinvar-database` instead when you need clinical pathogenicity classifications; RegulomeDB scores regulatory function, not germline disease association
- Use `gwas-database` instead when you want published GWAS associations with traits
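For the transcription-factor use case, here is a minimal sketch that tallies `target_label` values across `@graph` evidence rows. It assumes each row is a dict with an optional string `target_label`. Rows without one are skipped; accessibility peaks, for example, presumably carry no target. `tf_counts` is a hypothetical helper, and the example rows and `method` strings are invented.

```python
from collections import Counter

def tf_counts(graph_rows):
    """Count evidence rows per target_label in an '@graph' list (hypothetical helper).

    Rows whose target_label is missing, None, or empty are ignored.
    """
    labels = (row.get("target_label") for row in graph_rows)
    return Counter(label for label in labels if label)

# Invented rows shaped like '@graph' entries, for illustration only
rows = [
    {"method": "ChIP-seq", "target_label": "GATA1"},
    {"method": "ChIP-seq", "target_label": "GATA1"},
    {"method": "ChIP-seq", "target_label": "TAL1"},
    {"method": "DNase-seq", "target_label": None},
]
print(tf_counts(rows).most_common())  # [('GATA1', 2), ('TAL1', 1)]
```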
## Prerequisites
- **Python packages**: `requests`, `pandas`, `matplotlib`
- **Data requirements**: rsIDs (e.g., `rs4946036`), genomic positions (`chr1:1000000`), or region coordinates (`chr1:1000000-2000000`)
- **Genome build**: GRCh38 (default) or GRCh37; specify in all requests
- **Rate limits**: No published rate limits; use `time.sleep(0.3)` between requests in batch workflows
```bash
pip install requests pandas matplotlib
```
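The three input styles listed under Data requirements can all be converted to a single `regions=` string. This minimal sketch assumes that a bare `chr:pos` should become the one-base range `chr:pos-(pos+1)`, matching the single-variant examples in this document. `normalize_region` is a hypothetical helper, so check the coordinate convention against the site before relying on it.

```python
import re

def normalize_region(query):
    """Turn a user query into a `regions=` value (hypothetical helper).

    - rsIDs are lower-cased and passed through.
    - 'chr:start-end' ranges are passed through.
    - A bare 'chr:pos' becomes 'chr:pos-(pos+1)' (assumed convention).
    """
    q = query.strip()
    if re.fullmatch(r"rs\d+", q, flags=re.IGNORECASE):
        return q.lower()
    m = re.fullmatch(r"(chr[0-9XYM]+):(\d+)(?:-(\d+))?", q)
    if m is None:
        raise ValueError(f"Unrecognised query: {query!r}")
    chrom, start, end = m.group(1), int(m.group(2)), m.group(3)
    return f"{chrom}:{start}-{int(end) if end else start + 1}"

print(normalize_region("rs4946036"))             # rs4946036
print(normalize_region("chr1:1000000"))          # chr1:1000000-1000001
print(normalize_region("chr1:1000000-2000000"))  # chr1:1000000-2000000
```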
## Quick Start
```python
import requests
BASE = "https://regulomedb.org"
def regulome_score(variant, genome="GRCh38"):
    """Score a single variant (rsID or chr:pos-pos) via the GET /regulome-search/ endpoint."""
    r = requests.get(
        f"{BASE}/regulome-search/",
        params={"regions": variant, "genome": genome, "format": "json"},
        timeout=30,
    )
    r.raise_for_status()
    d = r.json()
    rs = d.get("regulome_score", {})
    vs = d.get("variants", [])
    return {
        "query": variant,
        "ranking": rs.get("ranking"),  # 1a / 1b / ... / 7
        "probability": float(rs.get("probability", 0)),
        "rsids": vs[0].get("rsids") if vs else [],
        "chrom": vs[0].get("chrom") if vs else None,
        "pos": vs[0].get("start") if vs else None,
    }
print(regulome_score("rs4946036"))
# {'query': 'rs4946036', 'ranking': '7', 'probability': 0.18412,
#  'rsids': ['rs4946036'], 'chrom': 'chr6', 'pos': 114819799}
```
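For batch workflows, this minimal sketch applies a scoring function to a list of queries, pausing 0.3 s between requests as suggested under Prerequisites, and records failures instead of stopping. `batch_score` is a hypothetical helper. It takes the scorer as an argument, so you can pass in `regulome_score` above or a stub in tests.

```python
import time

def batch_score(queries, score_fn, delay=0.3):
    """Score a list of variants sequentially with a pause between calls (hypothetical helper).

    score_fn takes one query and returns a dict; exceptions are stored in an
    'error' field so one bad query does not abort the batch.
    """
    results = []
    for i, q in enumerate(queries):
        try:
            results.append(score_fn(q))
        except Exception as exc:  # e.g. requests.HTTPError or a timeout
            results.append({"query": q, "ranking": None, "error": str(exc)})
        if i < len(queries) - 1:
            time.sleep(delay)  # polite spacing; no published rate limit
    return results
```

For example, `pd.DataFrame(batch_score(["rs4946036", "chr11:5226739-5226740"], regulome_score))` returns one row per query. The `error` column is filled only for failed requests.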
## Core API
### Query 1: Score a Single Variant (rsID or position)
The GET `/regulome-search/` endpoint accepts an rsID or coordinate as `regions=`. Returns a `regulome_score` block (probability, ranking, tissue-specific scores) plus `features` flags and the per-dataset `@graph` evidence rows.
```python
import requests
BASE = "https://regulomedb.org"
def score_variant(variant, genome="GRCh38"):
    """Print the ranking, resolved coordinates, and feature flags; return the full JSON response."""
    r = requests.get(
        f"{BASE}/regulome-search/",
        params={"regions": variant, "genome": genome, "format": "json"},
        timeout=30,
    )
    r.raise_for_status()
    d = r.json()
    rs = d.get("regulome_score", {})
    vs = d.get("variants", [])
    feats = d.get("features", {})
    print(f"Variant  : {variant}")
    if vs:  # the query may not resolve to any indexed variant
        print(f"Resolved : {vs[0].get('chrom')}:{vs[0].get('start')} ({', '.join(vs[0].get('rsids', []))})")
    print(f"Ranking  : {rs.get('ranking')}  prob={rs.get('probability')}")
    print(f"Features : ChIP={feats.get('ChIP')} Chromatin_accessibility={feats.get('Chromatin_accessibility')} "
          f"QTL={feats.get('QTL')} Footprint={feats.get('Footprint')} PWM_matched={feats.get('PWM_matched')}")
    return d
# Strong-regulatory locus example
score_variant("chr11:5226739-5226740")
# Ranking: 1a (HBB beta-globin promoter, multi-evidence)
```
```python
# Score by chromosomal position alone
score_variant("chr17:7670000-7670001") # TP53 region
```
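To check evidence flags in code rather than reading printed output, here is a small sketch over the `features` dict. It assumes the flag values are booleans keyed by the names printed in `score_variant` (`ChIP`, `Chromatin_accessibility`, `QTL`, `Footprint`, `PWM_matched`). Both helpers are hypothetical. The second encodes the QTL-plus-open-chromatin check from When to Use.

```python
def active_features(features):
    """Names of the evidence flags that are truthy in a `features` dict (hypothetical helper)."""
    return sorted(name for name, present in features.items() if present)

def qtl_in_open_chromatin(features):
    """True when both the QTL and Chromatin_accessibility flags are set (hypothetical helper)."""
    return bool(features.get("QTL")) and bool(features.get("Chromatin_accessibility"))

# Invented `features` block shaped like the one printed by score_variant()
example = {"ChIP": True, "Chromatin_accessibility": True, "QTL": False,
           "Footprint": False, "PWM_matched": True}
print(active_features(example))        # ['ChIP', 'Chromatin_accessibility', 'PWM_matched']
print(qtl_in_open_chromatin(example))  # False
```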
### Query 2: Region Scan — List Annotated Variants in a Window
A range query returns up to `limit` resolved variants (`variants[]`) and all `@graph` evidence rows in the window, plus `nearby_snps` (rsIDs adjacent to the resolved hits).
```python
import requests, pandas as pd
BASE = "https://regulomedb.org"
def scan_region(chrom, start, end, genome="GRCh38", limit=200):
    """List variants in a region with their resolved positions and overlapping rsIDs."""
    r = requests.get(
        f"{BASE}/regulome-search/",
        params={"regions": f"{chrom}:{start}-{end}", "genome": genome,
                "format": "json", "limit": limit},
        timeout=60,
    )
    r.raise_for_status()
    d = r.json()
    variants = d.get("variants", [])
    print(f"Variants in {chrom}:{start}-{end}: {len(variants)} (total indexed = {d.get('total')})")
    rows = [{"rsids": ", ".join(v.get("rsids", [])),
             "chrom": v.get("chrom"),
             "start": v.get("start"),
             "end": v.get("end")} for v in variants]
    return pd.DataFrame(rows)
df = scan_region("chr11", 5226000, 5227000)
print(df.head(10).to_string(index=False))
```
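`scan_region` joins all rsIDs for a variant into one comma-separated string. This sketch splits them back into one row per rsID using pandas `Series.str.split` and `DataFrame.explode`. `one_row_per_rsid` is a hypothetical helper, and the demo frame is invented.

```python
import pandas as pd

def one_row_per_rsid(df):
    """Split the comma-joined `rsids` column into one row per rsID (hypothetical helper)."""
    out = df.copy()
    out["rsid"] = out["rsids"].str.split(", ")
    out = out.explode("rsid")
    # Variants with no rsID split to [''] (or NaN); drop those rows
    out = out[out["rsid"].notna() & (out["rsid"] != "")]
    return out.drop(columns="rsids").reset_index(drop=True)

# Invented frame with the same columns scan_region() returns
demo = pd.DataFrame({"rsids": ["rs1, rs2", "", "rs3"], "chrom": ["chr11"] * 3,
                     "start": [1, 2, 3], "end": [2, 3, 4]})
print(list(one_row_per_rsid(demo)["rsid"]))  # ['rs1', 'rs2', 'rs3']
```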
### Query 3: Full Evidence — Parse the `@graph` Rows
Each `@graph[i]` row is one experimental piece of evidence overlapping the query. Fields: `method, target_label, biosample_ontology{term_name, organ_slims, classification}, dataset, file, value, chrom, start, end, strand, ancestry, disease_term_name`.
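Because `biosample_ontology` is a nested object, building a plain `pd.DataFrame` from the rows leaves it as a column of dicts. This minimal sketch flattens it with `pd.json_normalize`, which produces dotted column names such as `biosample_ontology.term_name`. `flatten_graph` is a hypothetical helper, the rows are invented, and `organ_slims` is assumed to be a list of strings.

```python
import pandas as pd

def flatten_graph(graph_rows):
    """Flatten nested '@graph' rows into dotted columns (hypothetical helper)."""
    df = pd.json_normalize(graph_rows, sep=".")
    # organ_slims is assumed to be a list per row; join it so it is easy to filter or group
    col = "biosample_ontology.organ_slims"
    if col in df.columns:
        df[col] = df[col].apply(lambda v: ", ".join(v) if isinstance(v, list) else v)
    return df

# Invented rows using the field names listed above
rows = [
    {"method": "ChIP-seq", "target_label": "GATA1",
     "biosample_ontology": {"term_name": "K562", "organ_slims": ["blood", "bodily fluid"],
                            "classification": "cell line"}},
    {"method": "DNase-seq", "target_label": None,
     "biosample_ontology": {"term_name": "liver", "organ_slims": ["liver"],
                            "classification": "tissue"}},
]
flat = flatten_graph(rows)
print(flat[["method", "biosample_ontology.term_name", "biosample_ontology.organ_slims"]])
```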
```python
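import requests, pandas as pd
BASE = "https://regulomedb.org"
def evidence_rows(variant, genome="GRCh38"):
    """Return the '@graph' evidence rows overlapping a variant as a DataFrame."""
    r = requests.get(
        f"{BASE}/regulome-search/",
        params={"regions": variant, "genome": genome, "format": "json"},
        timeout=30,
    )
    r.raise_for_status()
    # One row per evidence record; nested fields such as biosample_ontology stay as dicts
    return pd.DataFrame(r.json().get("@graph", []))
```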