ucsc-genome-browser
Query the UCSC Genome Browser REST API for DNA sequences, tracks, gene models, and conservation across 100+ assemblies. Retrieve sequence by region, list and fetch BED/bigWig tracks, chromosome sizes, RefSeq/GENCODE gene structures, and PhyloP/PhastCons scores. Use this skill for UCSC annotations; use the Ensembl REST API for Ensembl gene IDs and VEP variant annotation.
git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/ucsc-genome-browser && cp -r /tmp/ucsc-genome-browser/skills/genomics-bioinformatics/databases/ucsc-genome-browser ~/.claude/skills/ucsc-genome-browser
# UCSC Genome Browser
## Overview
The UCSC Genome Browser REST API at `https://api.genome.ucsc.edu/` provides programmatic access to genome sequences, annotation tracks, and hub data for 100+ assemblies including hg38, mm39, and dm6. The API is free, requires no authentication, and returns JSON. Use it with the `requests` library to fetch DNA sequences for genomic regions, retrieve track data (genes, repeats, conservation), list available tracks, and query chromosome sizes for genome-scale coordinate arithmetic.
## When to Use
- Fetching the reference DNA sequence for any genomic region (e.g., promoter, exon, CRISPR target) across human, mouse, or other assemblies
- Retrieving RefSeq or GENCODE gene structure (exon coordinates, CDS boundaries, strand) for a locus of interest
- Looking up PhyloP or PhastCons conservation scores to assess evolutionary constraint at a variant site
- Listing and querying any of UCSC's 1000+ annotation tracks (repeats, regulatory elements, conservation) for a region
- Getting chromosome sizes for a genome assembly to set up bedtools, pysam, or coverage pipelines
- Accessing public UCSC track hubs (e.g., ENCODE, Roadmap Epigenomics) without downloading data locally
- Use `ensembl-database` instead when you need Ensembl stable IDs, VEP variant annotation, or cross-species comparative genomics via the Ensembl REST API
- For bulk local queries across millions of regions, use `bedtools-genomic-intervals` with pre-downloaded UCSC annotation files
## Prerequisites
- **Python packages**: `requests`, `matplotlib` (for visualization)
- **Data requirements**: genomic coordinates (chrom, start, end in 0-based half-open BED format), genome assembly name (e.g., `hg38`, `mm39`)
- **Environment**: internet connection; no authentication required
- **Rate limits**: no official published limit; add 0.5s delays for batch requests (>100 queries)
```bash
pip install requests matplotlib
```
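The 0.5 s spacing suggested above for batch requests can be applied with a small wrapper. The following is a minimal sketch, not part of the UCSC API: `throttled_map` and its `sleep` parameter are hypothetical names introduced here, and `fetch` stands for any function that performs a single request.

```python
import time

def throttled_map(fetch, items, delay=0.5, sleep=time.sleep):
    """Call fetch(item) for each item, pausing `delay` seconds between calls.

    Hypothetical helper: `fetch` is any function that performs one request.
    `sleep` is injectable so the pacing can be checked without waiting.
    """
    results = []
    for i, item in enumerate(items):
        if i > 0:
            sleep(delay)  # pause before every call except the first
        results.append(fetch(item))
    return results
```

Passing a function that wraps one API call (for example, a lambda that unpacks a `(chrom, start, end)` tuple into a sequence lookup) keeps the pacing logic separate from the request logic.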
## Quick Start
```python
import requests

BASE = "https://api.genome.ucsc.edu"

def get_sequence(genome, chrom, start, end):
    """Fetch DNA sequence for a genomic region (0-based, half-open)."""
    r = requests.get(f"{BASE}/getData/sequence",
                     params={"genome": genome, "chrom": chrom,
                             "start": start, "end": end})
    r.raise_for_status()
    return r.json()["dna"]

# Fetch 1 kb around the BRCA1 TSS on hg38
seq = get_sequence("hg38", "chr17", 43044294, 43045294)
print(f"Length: {len(seq)} bp")
print(f"Sequence: {seq[:60]}...")
# Length: 1000 bp
# Sequence: ATGATTGGTGGTTACATGCACAGTTGCTCTGGGAAGTTTCTTCTTCAGTTGAGAAAAGGT...
```
## Core API
### Query 1: Sequence Retrieval
Fetch the reference DNA sequence for any genomic region using the `getData/sequence` endpoint. Coordinates are 0-based, half-open (BED format).
```python
import requests

BASE = "https://api.genome.ucsc.edu"

def get_sequence(genome, chrom, start, end):
    """Return DNA sequence string for the given region."""
    r = requests.get(f"{BASE}/getData/sequence",
                     params={"genome": genome, "chrom": chrom,
                             "start": start, "end": end})
    r.raise_for_status()
    data = r.json()
    return data["dna"]

# TP53 exon 4 region (hg38)
seq = get_sequence("hg38", "chr17", 7676520, 7676620)
print(f"Region: chr17:7,676,520-7,676,620 ({len(seq)} bp)")
print(f"Sequence: {seq}")
```
```python
# Reverse-complement for minus-strand genes
# (uses get_sequence() defined in the previous block)
def revcomp(seq):
    comp = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(comp)[::-1]

# BRCA2 on minus strand (hg38)
seq_fwd = get_sequence("hg38", "chr13", 32315086, 32315186)
seq_rc = revcomp(seq_fwd)
print(f"Forward: {seq_fwd[:30]}...")
print(f"RevComp: {seq_rc[:30]}...")
```
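Positions copied from the browser's position box are 1-based and inclusive (e.g. `chr17:43,044,295-43,045,294`), whereas the endpoint above expects 0-based, half-open coordinates. Here is a minimal sketch of the conversion, assuming the usual `chrom:start-end` form; `parse_position` is a hypothetical helper name, not part of the API.

```python
def parse_position(position):
    """Convert a 1-based inclusive 'chrom:start-end' string to (chrom, start0, end).

    Only the start shifts by one; the end is numerically unchanged because
    a 1-based inclusive end equals a 0-based exclusive end.
    """
    chrom, _, span = position.partition(":")
    start_str, _, end_str = span.replace(",", "").partition("-")
    start, end = int(start_str), int(end_str)
    if start < 1 or end < start:
        raise ValueError(f"invalid position: {position!r}")
    return chrom, start - 1, end

chrom, start, end = parse_position("chr17:43,044,295-43,045,294")
print(chrom, start, end, end - start)
# chr17 43044294 43045294 1000
```

The returned tuple can be passed straight to a sequence lookup that takes `chrom`, `start`, and `end`.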
### Query 2: Track Data Query
Retrieve annotation data (BED records) from any UCSC track for a genomic region.
```python
import requests

BASE = "https://api.genome.ucsc.edu"

def get_track_data(genome, track, chrom, start, end):
    """Fetch annotation records from a UCSC track for a region."""
    r = requests.get(f"{BASE}/getData/track",
                     params={"genome": genome, "track": track,
                             "chrom": chrom, "start": start, "end": end})
    r.raise_for_status()
    data = r.json()
    # Track data is under the key matching the track name
    return data.get(track, data.get("data", []))

# Fetch RepeatMasker annotations in the MYC locus (hg38)
repeats = get_track_data("hg38", "rmsk", "chr8", 127_735_434, 127_742_951)
print(f"Repeat elements in MYC locus: {len(repeats)}")
for rep in repeats[:3]:
    # rmsk records typically use genoStart/genoEnd rather than chromStart/chromEnd
    rep_start = rep.get("genoStart", rep.get("chromStart"))
    rep_end = rep.get("genoEnd", rep.get("chromEnd"))
    print(f"  {rep.get('repName', rep.get('name'))} | {rep_start}-{rep_end}")
```
```python
# Fetch CpG islands near a promoter
# (uses get_track_data() defined in the previous block)
cpg_islands = get_track_data("hg38", "cpgIslandExt", "chr17", 43_044_000, 43_050_000)
print(f"CpG islands found: {len(cpg_islands)}")
for island in cpg_islands:
    print(f"  {island['name']}: {island['chromStart']}-{island['chromEnd']}, "
          f"obsExp={island.get('obsExp', 'n/a')}")
```
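Records returned by track queries are plain dictionaries, so they can be written out as BED lines for downstream tools. The sketch below is an illustration under assumptions: `records_to_bed` is a hypothetical helper, the sample record is made up for demonstration, and the fallback to `genoName`/`genoStart`/`genoEnd` reflects the field names that RepeatMasker-style tables commonly use.

```python
def records_to_bed(records):
    """Format track records (dicts) as tab-separated BED4 lines."""
    lines = []
    for rec in records:
        chrom = rec.get("chrom", rec.get("genoName"))
        start = rec.get("chromStart", rec.get("genoStart"))
        end = rec.get("chromEnd", rec.get("genoEnd"))
        name = rec.get("name", rec.get("repName", "."))
        if chrom is None or start is None or end is None:
            continue  # skip records without recognizable coordinates
        lines.append(f"{chrom}\t{start}\t{end}\t{name}")
    return lines

# Made-up record for illustration only, not real API output
sample = [{"chrom": "chr1", "chromStart": 100, "chromEnd": 250, "name": "example"}]
print("\n".join(records_to_bed(sample)))
```

Because the API already reports 0-based, half-open coordinates, no offset adjustment is needed when writing BED.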
### Query 3: Track List
List all available annotation tracks for a genome assembly to discover what data is available.
```python
import requests

BASE = "https://api.genome.ucsc.edu"

def list_tracks(genome):
    """Return a dict of {track_name: track_metadata} for a genome assembly."""
    r = requests.get(f"{BASE}/list/tracks", params={"genome": genome})
    r.raise_for_status()
    data = r.json()
    # The track dictionary is usually keyed by the assembly name; fall back to "tracks"
    return data.get(genome, data.get("tracks", {}))

tracks = list_tracks("hg38")
print(f"Total tracks in hg38: {len(tracks)}")

# Find conservation-related tracks
conserv = {k: v for k, v in tracks.items()
           if "conserv" in k.lower() or "phylop" in k.lower()}
for name, meta in list(conserv.items())[:5]:
    print(f"  {name}: {meta.get('shortLabel', '')}")
```
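Track names alone are often cryptic, so searching the labels as well can surface more matches. Here is a minimal sketch, assuming each metadata entry is a dict that may carry `shortLabel` and `longLabel` strings; `find_tracks` is a hypothetical helper and the sample dictionary below is invented for illustration.

```python
def find_tracks(tracks, keyword):
    """Return sorted track names whose name or labels contain keyword (case-insensitive)."""
    kw = keyword.lower()
    hits = []
    for name, meta in tracks.items():
        if not isinstance(meta, dict):
            continue  # ignore entries that are not track metadata
        text = " ".join([name,
                         str(meta.get("shortLabel", "")),
                         str(meta.get("longLabel", ""))])
        if kw in text.lower():
            hits.append(name)
    return sorted(hits)

# Invented metadata for illustration; real entries come from list_tracks()
sample = {
    "trackA": {"shortLabel": "Conservation", "longLabel": "Example conservation track"},
    "trackB": {"shortLabel": "Repeats", "longLabel": "Example repeat track"},
}
print(find_tracks(sample, "conserv"))
```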
### Query 4: Chromosome Sizes
Get the length of every chromosome (or scaffold) for a genome assembly.
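```python
import requests

BASE = "https://api.genome.ucsc.edu"

def get_chrom_sizes(genome):
    """Return {chrom: size_in_bp} for a genome assembly."""
    # Assumed completion (the source is truncated here): the list/chromosomes
    # endpoint reports sizes under the "chromosomes" key.
    r = requests.get(f"{BASE}/list/chromosomes", params={"genome": genome})
    r.raise_for_status()
    return r.json().get("chromosomes", {})
```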