
ucsc-genome-browser

Query the UCSC Genome Browser REST API for DNA sequences, tracks, gene models, and conservation across 100+ assemblies. Retrieve sequence by region, list and fetch BED/bigWig tracks, chromosome sizes, RefSeq/GENCODE gene structures, and PhyloP/PhastCons scores. Use this skill for UCSC annotations; use the Ensembl REST API instead for Ensembl gene IDs and VEP variant annotation.

Install in Claude Code
git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/ucsc-genome-browser && cp -r /tmp/ucsc-genome-browser/skills/genomics-bioinformatics/databases/ucsc-genome-browser ~/.claude/skills/ucsc-genome-browser
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# UCSC Genome Browser

## Overview

The UCSC Genome Browser REST API at `https://api.genome.ucsc.edu/` provides programmatic access to genome sequences, annotation tracks, and hub data for 100+ assemblies including hg38, mm39, and dm6. The API is free, requires no authentication, and returns JSON. Use it with the `requests` library to fetch DNA sequences for genomic regions, retrieve track data (genes, repeats, conservation), list available tracks, and query chromosome sizes for genome-scale coordinate arithmetic.
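
When debugging, it can help to see the exact URL a request would hit so it can be pasted into a browser or `curl`. The sketch below builds such URLs with the standard library only; `build_url` is a hypothetical helper name, not part of the API, and it performs no network access.

```python
from urllib.parse import urlencode

BASE = "https://api.genome.ucsc.edu"

def build_url(endpoint, **params):
    """Return the full request URL for an API endpoint and its query parameters."""
    query = urlencode(params)
    url = f"{BASE}/{endpoint.strip('/')}"
    return f"{url}?{query}" if query else url

# Sequence request for the first 10 bases of the mitochondrial genome
print(build_url("getData/sequence", genome="hg38", chrom="chrM", start=0, end=10))
# Endpoint that lists the available UCSC assemblies
print(build_url("list/ucscGenomes"))
```

The same endpoint paths and parameter names are what the `requests.get(..., params=...)` calls in this document send.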

## When to Use

- Fetching the reference DNA sequence for any genomic region (e.g., promoter, exon, CRISPR target) across human, mouse, or other assemblies
- Retrieving RefSeq or GENCODE gene structure (exon coordinates, CDS boundaries, strand) for a locus of interest
- Looking up PhyloP or PhastCons conservation scores to assess evolutionary constraint at a variant site
- Listing and querying any of UCSC's 1000+ annotation tracks (repeats, regulatory elements, conservation) for a region
- Getting chromosome sizes for a genome assembly to set up bedtools, pysam, or coverage pipelines
- Accessing public UCSC track hubs (e.g., ENCODE, Roadmap Epigenomics) without downloading data locally
- Use `ensembl-database` instead when you need Ensembl stable IDs, VEP variant annotation, or cross-species comparative genomics via the Ensembl REST API
- For bulk local queries across millions of regions, use `bedtools-genomic-intervals` with pre-downloaded UCSC annotation files

## Prerequisites

- **Python packages**: `requests`, `matplotlib` (for visualization)
- **Data requirements**: genomic coordinates (chrom, start, end in 0-based half-open BED format), genome assembly name (e.g., `hg38`, `mm39`)
- **Environment**: internet connection; no authentication required
- **Rate limits**: no official published limit; add 0.5s delays for batch requests (>100 queries)

```bash
pip install requests matplotlib
```
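
The delay recommended above for batch requests can be wrapped in a small helper. This is a minimal sketch: `fetch_batch` and its parameters are hypothetical names, and the retry behaviour is an added assumption rather than something the API requires. The fetching function is passed in, so any of the query functions in this document (for example `get_sequence`) can be used.

```python
import time

def fetch_batch(regions, fetch, delay=0.5, max_retries=3):
    """Call fetch(*region) for each region, pausing `delay` seconds between calls.

    `fetch` is any callable, e.g. get_sequence from Quick Start.
    A failed call is retried up to `max_retries` times with a growing pause;
    the last error is re-raised if every attempt fails.
    """
    results = []
    for i, region in enumerate(regions):
        if i > 0:
            time.sleep(delay)  # polite pause between consecutive requests
        for attempt in range(1, max_retries + 1):
            try:
                results.append(fetch(*region))
                break
            except Exception:
                if attempt == max_retries:
                    raise
                time.sleep(delay * attempt)  # wait a little longer before each retry
    return results
```

Usage might look like `fetch_batch([("hg38", "chr17", 7676520, 7676620)], get_sequence)`, which returns one result per region in input order.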

## Quick Start

```python
import requests

BASE = "https://api.genome.ucsc.edu"

def get_sequence(genome, chrom, start, end):
    """Fetch DNA sequence for a genomic region (0-based, half-open)."""
    r = requests.get(f"{BASE}/getData/sequence",
                     params={"genome": genome, "chrom": chrom,
                             "start": start, "end": end})
    r.raise_for_status()
    return r.json()["dna"]

# Fetch the first 1 kb of the BRCA1 gene span on hg38
# (this is the gene's 3' end, because BRCA1 lies on the minus strand)
seq = get_sequence("hg38", "chr17", 43044294, 43045294)
print(f"Length: {len(seq)} bp")
print(f"Sequence: {seq[:60]}...")
# Length: 1000 bp
# Sequence: ATGATTGGTGGTTACATGCACAGTTGCTCTGGGAAGTTTCTTCTTCAGTTGAGAAAAGGT...
```

## Core API

### Query 1: Sequence Retrieval

Fetch the reference DNA sequence for any genomic region using the `getData/sequence` endpoint. Coordinates are 0-based, half-open (BED format).

```python
import requests

BASE = "https://api.genome.ucsc.edu"

def get_sequence(genome, chrom, start, end):
    """Return DNA sequence string for the given region."""
    r = requests.get(f"{BASE}/getData/sequence",
                     params={"genome": genome, "chrom": chrom,
                             "start": start, "end": end})
    r.raise_for_status()
    data = r.json()
    return data["dna"]

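# A 100 bp window inside the TP53 locus (hg38), given as 0-based half-open coordinates
seq = get_sequence("hg38", "chr17", 7676520, 7676620)
# The same window in 1-based browser notation starts one base higher
print(f"Region: chr17:7,676,521-7,676,620 ({len(seq)} bp)")
print(f"Sequence: {seq}")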
```

```python
# Reverse-complement for minus-strand genes
def revcomp(seq):
    comp = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(comp)[::-1]

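# 100 bp at the start of the BRCA2 locus (hg38). BRCA2 is on the plus strand, so this
# call only demonstrates the helper; apply revcomp to minus-strand genes such as BRCA1.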
seq_fwd = get_sequence("hg38", "chr13", 32315086, 32315186)
seq_rc  = revcomp(seq_fwd)
print(f"Forward: {seq_fwd[:30]}...")
print(f"RevComp: {seq_rc[:30]}...")
```
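
Positions copied from the Genome Browser search box are 1-based and inclusive, while the API calls above expect 0-based, half-open coordinates. The following sketch converts between the two; `parse_position` and `format_position` are hypothetical helper names introduced here for illustration.

```python
import re

# chromosome name, then start-end with optional thousands separators
_POSITION = re.compile(r"^([^:\s]+):([\d,]+)-([\d,]+)$")

def parse_position(position):
    """Convert 'chr17:7,676,521-7,676,620' (1-based, inclusive)
    to (chrom, start, end) in 0-based, half-open form."""
    match = _POSITION.match(position.strip())
    if match is None:
        raise ValueError(f"Unrecognised position string: {position!r}")
    chrom = match.group(1)
    start_1based = int(match.group(2).replace(",", ""))
    end_1based = int(match.group(3).replace(",", ""))
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError(f"Invalid coordinate range: {position!r}")
    # Only the start shifts; the inclusive end equals the exclusive end
    return chrom, start_1based - 1, end_1based

def format_position(chrom, start, end):
    """Convert 0-based, half-open coordinates back to a browser position string."""
    return f"{chrom}:{start + 1:,}-{end:,}"
```

For example, `parse_position("chr17:7,676,521-7,676,620")` gives `("chr17", 7676520, 7676620)`, which can be unpacked directly into `get_sequence("hg38", *coords)`.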

### Query 2: Track Data Query

Retrieve annotation data (BED records) from any UCSC track for a genomic region.

```python
import requests

BASE = "https://api.genome.ucsc.edu"

def get_track_data(genome, track, chrom, start, end):
    """Fetch annotation records from a UCSC track for a region."""
    r = requests.get(f"{BASE}/getData/track",
                     params={"genome": genome, "track": track,
                             "chrom": chrom, "start": start, "end": end})
    r.raise_for_status()
    data = r.json()
    # Track data is under the key matching the track name
    return data.get(track, data.get("data", []))

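# Fetch RepeatMasker annotations in the MYC locus (hg38)
repeats = get_track_data("hg38", "rmsk", "chr8", 127_735_434, 127_742_951)
print(f"Repeat elements in MYC locus: {len(repeats)}")
for rep in repeats[:3]:
    # rmsk rows use genoStart/genoEnd rather than the chromStart/chromEnd of BED-style tracks
    start = rep.get("genoStart", rep.get("chromStart"))
    end = rep.get("genoEnd", rep.get("chromEnd"))
    print(f"  {rep.get('repName', rep.get('name'))} | {start}-{end}")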
```

```python
# Fetch CpG islands in a 6 kb window at the 3' end of the BRCA1 locus (hg38)
cpg_islands = get_track_data("hg38", "cpgIslandExt", "chr17", 43_044_000, 43_050_000)
print(f"CpG islands found: {len(cpg_islands)}")
for island in cpg_islands:
    print(f"  {island['name']}: {island['chromStart']}-{island['chromEnd']}, "
          f"obsExp={island.get('obsExp', 'n/a')}")
```
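
Records returned by `get_track_data` are plain dictionaries, so they can be flattened into BED-style rows for tools such as bedtools. The sketch below assumes that most tracks expose `chrom`/`chromStart`/`chromEnd` while RepeatMasker rows use `genoName`/`genoStart`/`genoEnd`; `to_bed_rows` and `bed_lines` are hypothetical helpers, and other tracks may need additional field names.

```python
def to_bed_rows(records, name_keys=("name", "repName")):
    """Normalise track records to sorted (chrom, start, end, name) tuples."""
    rows = []
    for rec in records:
        chrom = rec.get("chrom", rec.get("genoName"))
        start = rec.get("chromStart", rec.get("genoStart"))
        end = rec.get("chromEnd", rec.get("genoEnd"))
        if chrom is None or start is None or end is None:
            continue  # skip records without interval fields
        # Use the first name field present, or "." as the BED placeholder
        name = next((rec[k] for k in name_keys if k in rec), ".")
        rows.append((chrom, int(start), int(end), str(name)))
    return sorted(rows, key=lambda row: (row[0], row[1], row[2]))

def bed_lines(rows):
    """Render rows as tab-separated BED4 lines."""
    return ["\t".join(str(field) for field in row) for row in rows]

# Hand-written sample records shaped like API output (not real annotations)
sample = [
    {"genoName": "chr8", "genoStart": 200, "genoEnd": 300, "repName": "AluSx"},
    {"chrom": "chr8", "chromStart": 0, "chromEnd": 50, "name": "island1"},
]
print("\n".join(bed_lines(to_bed_rows(sample))))
```

Sorting by chromosome name and start here is a plain lexical sort, which is enough for a single chromosome; genome-wide files usually need the ordering that the downstream tool expects.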

### Query 3: Track List

List all available annotation tracks for a genome assembly to discover what data is available.

```python
import requests

BASE = "https://api.genome.ucsc.edu"

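def list_tracks(genome):
    """Return a dict of {track_name: track_metadata} for a genome assembly."""
    r = requests.get(f"{BASE}/list/tracks", params={"genome": genome})
    r.raise_for_status()
    data = r.json()
    # Track metadata is expected under a key named after the assembly (e.g. "hg38");
    # fall back to a "tracks" key in case the response layout differs.
    return data.get(genome, data.get("tracks", {}))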

tracks = list_tracks("hg38")
print(f"Total tracks in hg38: {len(tracks)}")

# Find conservation-related tracks
conserv = {k: v for k, v in tracks.items() if "conserv" in k.lower() or "phylop" in k.lower()}
for name, meta in list(conserv.items())[:5]:
    print(f"  {name}: {meta.get('shortLabel', '')}")
```
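
Track names are often terse, so a keyword may only appear in a track's labels. The sketch below searches both the names and the `shortLabel`/`longLabel` metadata of a dictionary shaped like the one `list_tracks` returns; `find_tracks` is a hypothetical helper, and the sample metadata is hand-written for illustration.

```python
def find_tracks(tracks, keywords, fields=("shortLabel", "longLabel")):
    """Return {track_name: track_type} for tracks whose name or labels
    contain any of the keywords (case-insensitive)."""
    wanted = [k.lower() for k in keywords]
    hits = {}
    for name, meta in tracks.items():
        if not isinstance(meta, dict):
            continue  # ignore non-track entries mixed into the response
        text = " ".join([name] + [str(meta.get(f, "")) for f in fields]).lower()
        if any(k in text for k in wanted):
            hits[name] = meta.get("type", "")
    return hits

# Hand-written sample shaped like list_tracks output (not real metadata)
sample_tracks = {
    "trackA": {"shortLabel": "Conservation", "longLabel": "Vertebrate alignment", "type": "bed 4"},
    "trackB": {"shortLabel": "Repeats", "longLabel": "Repeating elements", "type": "bed 6"},
}
print(find_tracks(sample_tracks, ["conserv", "phylop"]))
```

With a live response, `find_tracks(list_tracks("hg38"), ["conserv", "phylop"])` would apply the same search to the real track list.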

### Query 4: Chromosome Sizes

Get the length of every chromosome (or scaffold) for a genome assembly.

```python
import requests

BASE = "https://api.genome.ucsc.edu"

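def get_chrom_sizes(genome):
    """Return {chrom: size_in_bp} for a genome assembly."""
    # The original function body was cut off in the source. This completion assumes
    # the /list/chromosomes endpoint, which reports sizes under a "chromosomes" key.
    r = requests.get(f"{BASE}/list/chromosomes", params={"genome": genome})
    r.raise_for_status()
    return r.json().get("chromosomes", {})
```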