regulomedb-database

This Claude Code skill queries the RegulomeDB v2 REST API to score genetic variants for regulatory function on a scale of 1a (strongest evidence) to 7 (no known regulatory function), returning overlapping evidence such as transcription factor binding sites, histone modifications, DNase peaks, motifs, eQTLs, and chromatin state. Use it to prioritize GWAS hits for regulatory investigation, annotate variant lists with regulatory potential scores, identify transcription factors binding near variants, or perform cis-regulatory discovery in genomic regions.

Repository: SciAgent-Skills

Install in Claude Code

git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/regulomedb-database && cp -r /tmp/regulomedb-database/skills/genomics-bioinformatics/databases/regulomedb-database ~/.claude/skills/regulomedb-database

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# RegulomeDB Database

## Overview

RegulomeDB integrates large-scale functional genomics data (ENCODE, Roadmap Epigenomics) to score genetic variants for regulatory potential. Each variant receives a ranking from 1a (highest regulatory confidence: eQTL + TF + DNase + motif + chromatin) to 7 (no known regulatory function). The v2 API is exposed as **GET** `https://regulomedb.org/regulome-search/`; the legacy POST `/regulome-search/`, POST `/regulome-summary/`, and GET `/regulome-datasets/` JSON endpoints are no longer functional (they return `regulome-notfound` stubs or HTTP 500). Access is free and requires no authentication.
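Rankings mix a tier digit with an optional letter (`1a`, `2b`, `7`), so threshold checks such as "at least as strong as 2b" are easier with a parsed sort key. The following is a minimal sketch, not part of the API: `rank_key` and `at_least` are hypothetical helper names, and the code assumes every ranking starts with a single digit.

```python
def rank_key(ranking):
    """Turn a ranking string such as '1a' or '7' into a sortable (tier, letter) tuple."""
    ranking = str(ranking).strip().lower()
    tier = int(ranking[0])   # assumption: one leading digit, 1 (strongest) to 7 (weakest)
    letter = ranking[1:]     # optional sub-rank letter; empty string when absent
    return (tier, letter)

def at_least(ranking, threshold):
    """True when `ranking` is as strong as, or stronger than, `threshold`."""
    return rank_key(ranking) <= rank_key(threshold)

ranks = ["7", "2b", "1f", "1a", "4"]
print(sorted(ranks, key=rank_key))   # ['1a', '1f', '2b', '4', '7']
print(at_least("1f", "2b"))          # True
```

A tuple key keeps the comparison explicit and avoids relying on plain string ordering.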

## When to Use

- Prioritizing GWAS hits for regulatory follow-up — identify which SNPs land in active regulatory elements
- Annotating a VCF or variant list with regulatory scores to filter to functionally relevant variants
- Identifying which transcription factors bind near a variant of interest (via the `@graph` evidence rows)
- Checking whether a non-coding variant overlaps a QTL and active chromatin simultaneously (`features.QTL`)
- Retrieving all annotated rsIDs in a genomic region for cis-regulatory analysis (region query with `nearby_snps`)
- Use `clinvar-database` instead when you need clinical pathogenicity classifications; RegulomeDB scores regulatory function, not germline disease association
- Use `gwas-database` instead when you want published GWAS associations with traits

## Prerequisites

- **Python packages**: `requests`, `pandas`, `matplotlib`
- **Data requirements**: rsIDs (e.g., `rs4946036`), genomic positions (`chr1:1000000`), or region coordinates (`chr1:1000000-2000000`)
- **Genome build**: GRCh38 (default) or GRCh37; specify in all requests
- **Rate limits**: No published rate limits; use `time.sleep(0.3)` between requests in batch workflows

```bash
pip install requests pandas matplotlib
```
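The pause recommended above can be wrapped in a small batch helper. This is a sketch under stated assumptions: `score_many` is a hypothetical name, and `score_fn` stands in for any single-variant lookup (for example a function that wraps GET `/regulome-search/`). The demonstration uses a stand-in lookup so it runs without network access.

```python
import time

def score_many(variants, score_fn, delay=0.3):
    """Call score_fn on each variant in turn, pausing `delay` seconds between calls.

    Failures are recorded in the result list instead of being raised.
    """
    results = []
    for i, variant in enumerate(variants):
        if i:                       # no pause before the first request
            time.sleep(delay)
        try:
            results.append(score_fn(variant))
        except Exception as exc:    # keep going so one bad ID does not abort the batch
            results.append({"query": variant, "error": str(exc)})
    return results

# Offline demonstration with a stand-in lookup (hypothetical, no network access).
def fake_lookup(variant):
    if not variant.startswith("rs"):
        raise ValueError("unrecognized identifier")
    return {"query": variant, "ranking": None}

out = score_many(["rs4946036", "bad-id"], fake_lookup, delay=0)
print(out)
```

Recording errors per variant, rather than raising, lets a long batch finish and leaves the failed IDs easy to retry.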

## Quick Start

```python
import requests

BASE = "https://regulomedb.org"

def regulome_score(variant, genome="GRCh38"):
    """Score a single variant (rsID or chr:pos-pos) via the GET /regulome-search/ endpoint."""
    r = requests.get(
        f"{BASE}/regulome-search/",
        params={"regions": variant, "genome": genome, "format": "json"},
        timeout=30,
    )
    r.raise_for_status()
    d = r.json()
    rs = d.get("regulome_score", {})
    vs = d.get("variants", [])
    return {
        "query": variant,
        "ranking": rs.get("ranking"),           # 1a / 1b / ... / 7
        "probability": float(rs.get("probability", 0)),
        "rsids": vs[0].get("rsids") if vs else [],
        "chrom": vs[0].get("chrom") if vs else None,
        "pos": vs[0].get("start") if vs else None,
    }

print(regulome_score("rs4946036"))
# {'query': 'rs4946036', 'ranking': '7', 'probability': 0.18412,
#  'rsids': ['rs4946036'], 'chrom': 'chr6', 'pos': 114819799}
```

## Core API

### Query 1: Score a Single Variant (rsID or position)

The GET `/regulome-search/` endpoint accepts an rsID or coordinate as `regions=`. Returns a `regulome_score` block (probability, ranking, tissue-specific scores) plus `features` flags and the per-dataset `@graph` evidence rows.

```python
import requests

BASE = "https://regulomedb.org"

def score_variant(variant, genome="GRCh38"):
    """Return the regulome_score block and resolved coordinates."""
    r = requests.get(
        f"{BASE}/regulome-search/",
        params={"regions": variant, "genome": genome, "format": "json"},
        timeout=30,
    )
    r.raise_for_status()
    d = r.json()
    rs = d.get("regulome_score", {})
    vs = d.get("variants", [])
    feats = d.get("features", {})
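    print(f"Variant   : {variant}")
    if vs:  # an unresolved query returns no variants; guard against IndexError on vs[0]
        v0 = vs[0]
        print(f"Resolved  : {v0.get('chrom')}:{v0.get('start')} ({', '.join(v0.get('rsids', []))})")
    else:
        print("Resolved  : (no variant resolved)")
    print(f"Ranking   : {rs.get('ranking')}  prob={rs.get('probability')}")
    # .get() avoids a KeyError when a feature flag is absent from the response
    print(f"Features  : ChIP={feats.get('ChIP')} Chromatin_accessibility={feats.get('Chromatin_accessibility')} "
          f"QTL={feats.get('QTL')} Footprint={feats.get('Footprint')} PWM_matched={feats.get('PWM_matched')}")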
    return d

# Strong-regulatory locus example
score_variant("chr11:5226739-5226740")
# Ranking: 1a (HBB beta-globin promoter, multi-evidence)
```
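The `features` flags can also be condensed into a list of the evidence types that are present. The sketch below assumes the flags are booleans (or at least truthy/falsy values); `supported_evidence` is a hypothetical helper, and the sample dictionary is hand-made rather than real API output.

```python
# Flag names as printed by score_variant above.
FEATURE_KEYS = ("ChIP", "Chromatin_accessibility", "QTL", "Footprint", "PWM_matched")

def supported_evidence(features):
    """Return the names of the feature flags that are truthy, in a fixed order."""
    return [key for key in FEATURE_KEYS if features.get(key)]

# Hand-made example fragment, not real API output.
sample = {"ChIP": True, "Chromatin_accessibility": True, "QTL": False,
          "Footprint": False, "PWM_matched": True}
print(supported_evidence(sample))   # ['ChIP', 'Chromatin_accessibility', 'PWM_matched']
```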

```python
# Score by chromosomal position alone
score_variant("chr17:7670000-7670001")  # TP53 region
```
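The position queries above use a one-base window written as `chrom:start-end` with `end = start + 1`. A small formatter keeps that shape consistent when building queries from a table of positions. This is only a sketch: `point_region` is a hypothetical helper, and whether `start` should be read as 0-based or 1-based is not stated here, so check it against the RegulomeDB documentation before converting from VCF.

```python
def point_region(chrom, start):
    """Format a single-base query as 'chrom:start-end' with end = start + 1."""
    start = int(start)
    if start < 0:
        raise ValueError("start must be non-negative")
    # Add the 'chr' prefix used in the examples when it is missing.
    chrom = chrom if chrom.startswith("chr") else f"chr{chrom}"
    return f"{chrom}:{start}-{start + 1}"

print(point_region("chr17", 7670000))   # chr17:7670000-7670001
print(point_region("11", "5226739"))    # chr11:5226739-5226740
```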

### Query 2: Region Scan — List Annotated Variants in a Window

A range query returns up to `limit` resolved variants (`variants[]`) and all `@graph` evidence rows in the window, plus `nearby_snps` (rsIDs adjacent to the resolved hits).

```python
import requests, pandas as pd

BASE = "https://regulomedb.org"

def scan_region(chrom, start, end, genome="GRCh38", limit=200):
    """List variants in a region with their resolved positions and overlapping rsIDs."""
    r = requests.get(
        f"{BASE}/regulome-search/",
        params={"regions": f"{chrom}:{start}-{end}", "genome": genome,
                "format": "json", "limit": limit},
        timeout=60,
    )
    r.raise_for_status()
    d = r.json()
    variants = d.get("variants", [])
    print(f"Variants in {chrom}:{start}-{end}: {len(variants)} (total indexed = {d.get('total')})")
    rows = [{"rsids": ", ".join(v.get("rsids", [])),
             "chrom": v.get("chrom"),
             "start": v.get("start"),
             "end": v.get("end")} for v in variants]
    return pd.DataFrame(rows)

df = scan_region("chr11", 5226000, 5227000)
print(df.head(10).to_string(index=False))
```
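`scan_region` joins multiple rsIDs into one comma-separated string, which is awkward when joining against other tables keyed by rsID. A possible alternative is to flatten `variants[]` into one row per rsID. In this sketch `rows_per_rsid` is a hypothetical helper and the sample records are invented placeholders in the shape `scan_region` reads, not real API output.

```python
def rows_per_rsid(variants):
    """Flatten variants[] to one row per rsID; variants without rsIDs keep rsid=None."""
    rows = []
    for v in variants:
        rsids = v.get("rsids") or [None]   # keep positions that have no rsID
        for rsid in rsids:
            rows.append({"rsid": rsid, "chrom": v.get("chrom"),
                         "start": v.get("start"), "end": v.get("end")})
    return rows

# Invented placeholder records (IDs and coordinates are illustrative only).
sample_variants = [
    {"rsids": ["rs0000001", "rs0000002"], "chrom": "chr11", "start": 5226100, "end": 5226101},
    {"rsids": [], "chrom": "chr11", "start": 5226500, "end": 5226501},
]
flat = rows_per_rsid(sample_variants)
for row in flat:
    print(row)
```

The resulting list of dictionaries can be passed straight to `pd.DataFrame(flat)` if a table is needed.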

### Query 3: Full Evidence — Parse the `@graph` Rows

Each `@graph[i]` row is one piece of experimental evidence overlapping the query. Fields: `method`, `target_label`, `biosample_ontology` (with `term_name`, `organ_slims`, `classification`), `dataset`, `file`, `value`, `chrom`, `start`, `end`, `strand`, `ancestry`, `disease_term_name`.

```python
import requests, pandas as pd

BASE = "https://regulomedb.org"

def evidence_rows(variant, genome="GRCh38"):
    r = requests.get(
        f"{BASE}/regulome-search/",
        params={"regions": variant, "genome": genome, "format": "json"},