ensembl-database
Ensembl REST API for gene/transcript/variant annotations in 300+ species. Gene info by symbol/ID, sequence, cross-refs (HGNC, RefSeq, UniProt), regulatory features. For bulk local access use pyensembl; for pathways use kegg-database.
git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/ensembl-database && cp -r /tmp/ensembl-database/skills/genomics-bioinformatics/databases/ensembl-database ~/.claude/skills/ensembl-database
# Ensembl Genome Database
## Overview
Ensembl is a comprehensive genome annotation database covering 300+ vertebrate and non-vertebrate species. The Ensembl REST API provides programmatic access to gene models, transcript/protein sequences, variant annotations, cross-references, regulatory features, and comparative genomics without requiring any login or API key.
## When to Use
- Retrieving official gene and transcript annotations (stable IDs, biotype, genomic coordinates) for human or model organism genes
- Converting between gene identifier namespaces (HGNC symbol ↔ Ensembl ID ↔ RefSeq ↔ UniProt)
- Fetching genomic or cDNA/CDS/protein sequences for a gene or transcript
- Looking up variant consequences and functional impact (VEP) for a list of SNPs
- Querying regulatory features (promoters, enhancers, CTCF sites) in a genomic region
- Performing comparative genomics queries (orthologs, paralogs, gene trees) across species
- For local offline access to large genomic annotations, use `pyensembl` instead
- For pathway and metabolic annotations, use `kegg-database` or `reactome-database` instead
## Prerequisites
- **Python packages**: `requests`
- **Data requirements**: gene symbols, Ensembl stable IDs (ENSG…/ENST…/ENSP…), or genomic coordinates
- **Environment**: internet connection required; no API key needed
- **Rate limits**: about 55,000 requests per hour (roughly 15 requests/second); use `expand=1` and batch (POST) endpoints to minimize calls
```bash
pip install requests
```
## Quick Start
```python
import requests
BASE = "https://rest.ensembl.org"
HEADERS = {"Content-Type": "application/json"}
def ensembl_get(endpoint, params=None):
    r = requests.get(f"{BASE}{endpoint}", headers=HEADERS, params=params)
    r.raise_for_status()
    return r.json()
# Look up human BRCA1
gene = ensembl_get("/lookup/symbol/homo_sapiens/BRCA1", params={"expand": 1})
print(f"ID: {gene['id']}, Chr: {gene['seq_region_name']}:{gene['start']}-{gene['end']}")
print(f"Transcripts: {len(gene.get('Transcript', []))}")
```
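The `ensembl_get` helper above raises on any HTTP error. That includes HTTP 429 (Too Many Requests), which Ensembl returns once the rate limit is exceeded. The following is a minimal sketch of a retrying variant. It assumes the 429 response carries a `Retry-After` header holding a number of seconds, and it falls back to one second when the header is absent. The name `ensembl_request` and its `session` parameter are hypothetical. The session is passed in so that a `requests.Session()` or a test double can be used.

```python
import time

BASE = "https://rest.ensembl.org"
HEADERS = {"Content-Type": "application/json", "Accept": "application/json"}


def ensembl_request(session, method, endpoint, max_retries=3, **kwargs):
    """Hypothetical helper: send a request and retry when the server returns 429.

    `session` is any object with a requests-style `.request()` method,
    e.g. `requests.Session()`. Extra kwargs (params=, json=) are passed through.
    """
    for attempt in range(max_retries + 1):
        r = session.request(method, f"{BASE}{endpoint}", headers=HEADERS, **kwargs)
        if r.status_code == 429 and attempt < max_retries:
            # Assumption: Retry-After holds a number of seconds; default to 1 s
            time.sleep(float(r.headers.get("Retry-After", 1)))
            continue
        r.raise_for_status()  # raises for 4xx/5xx, including a final 429
        return r.json()
```

Usage: `ensembl_request(requests.Session(), "GET", "/lookup/symbol/homo_sapiens/BRCA1", params={"expand": 1})`. Reusing one session also keeps the HTTP connection open between calls.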
## Core API
### Query 1: Gene Lookup by Symbol or Stable ID
Retrieve gene metadata from a gene symbol or Ensembl stable ID.
```python
import requests
BASE = "https://rest.ensembl.org"
HEADERS = {"Content-Type": "application/json"}
# By gene symbol
r = requests.get(
    f"{BASE}/lookup/symbol/homo_sapiens/TP53",
    headers=HEADERS,
    params={"expand": 1},
)
r.raise_for_status()
gene = r.json()
print(f"Ensembl ID : {gene['id']}")
print(f"Location : {gene['seq_region_name']}:{gene['start']}-{gene['end']} ({gene['strand']})")
print(f"Biotype : {gene['biotype']}")
print(f"Transcripts: {len(gene.get('Transcript', []))}")
```
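With `expand=1`, the lookup response nests a `Transcript` list under the gene. The following is a minimal sketch for summarizing that list. It assumes each transcript entry carries an `id`, a `biotype`, and an `is_canonical` flag set to 1 for the Ensembl canonical transcript. `summarize_transcripts` is a hypothetical helper name.

```python
from collections import Counter


def summarize_transcripts(gene):
    """Return (canonical transcript ID or None, {biotype: count}) for an expand=1 lookup."""
    transcripts = gene.get("Transcript", [])
    # Count transcripts per biotype (protein_coding, retained_intron, ...)
    biotypes = Counter(tx.get("biotype", "unknown") for tx in transcripts)
    # Assumption: the canonical transcript is flagged with is_canonical == 1
    canonical = next(
        (tx["id"] for tx in transcripts if tx.get("is_canonical") == 1), None
    )
    return canonical, dict(biotypes)
```

Usage: `canonical, counts = summarize_transcripts(gene)` on the TP53 lookup result above.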
```python
# By stable ID (works for genes, transcripts, proteins)
r = requests.get(
    f"{BASE}/lookup/id/ENSG00000141510",
    headers=HEADERS,
    params={"expand": 0},
)
r.raise_for_status()
obj = r.json()
print(f"Symbol: {obj.get('display_name')}, Species: {obj.get('species')}")
```
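`/lookup/id` accepts gene, transcript, and protein IDs alike, so it can help to check what kind of ID you hold before sending it. The following sketch assumes the usual Ensembl pattern:

- `ENS`
- an optional species code (empty for human, `MUS` for mouse)
- a feature letter (`G` gene, `T` transcript, `P` protein, `E` exon)
- 11 digits
- an optional `.version` suffix

Many Ensembl Genomes species, such as yeast or plants, use other ID schemes, so a `None` result does not mean the ID is invalid. `parse_stable_id` is a hypothetical helper.

```python
import re

# Assumed pattern: "ENS" + optional species code + feature letter + 11 digits + optional ".version"
_ENSEMBL_ID = re.compile(r"ENS([A-Z]*?)([GTPE])(\d{11})(?:\.(\d+))?")
_FEATURE = {"G": "gene", "T": "transcript", "P": "protein", "E": "exon"}


def parse_stable_id(stable_id):
    """Split an Ensembl-style stable ID into its parts, or return None if it does not match."""
    m = _ENSEMBL_ID.fullmatch(stable_id)
    if m is None:
        return None
    species_code, letter, digits, version = m.groups()
    return {
        "species_code": species_code,
        "feature": _FEATURE[letter],
        "unversioned": f"ENS{species_code}{letter}{digits}",
        "version": int(version) if version else None,
    }
```

The lookup response itself also reports an `object_type` field, so the parser is mainly useful for validating or de-versioning IDs before a request.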
### Query 2: Batch Lookup
Retrieve information for multiple gene symbols in one call (POST endpoint; `POST /lookup/id` with `{"ids": [...]}` does the same for stable IDs).
```python
import requests, json
BASE = "https://rest.ensembl.org"
# Content-Type describes the JSON request body; Accept asks for a JSON response
HEADERS = {"Content-Type": "application/json", "Accept": "application/json"}
# Batch lookup by symbols
symbols = ["BRCA1", "BRCA2", "TP53", "EGFR", "MYC"]
r = requests.post(
    f"{BASE}/lookup/symbol/homo_sapiens",
    headers=HEADERS,
    data=json.dumps({"symbols": symbols}),
)
r.raise_for_status()
results = r.json()
# Skip entries without a record (unrecognized symbols)
for sym, data in results.items():
    if data:
        print(f"{sym}: {data['id']} ({data['seq_region_name']}:{data['start']}-{data['end']})")
```
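For symbol lists longer than the POST endpoint accepts in one request, the list can be split into chunks. The following is a minimal sketch that assumes a cap of 1,000 symbols per request, which is an assumption worth checking against the endpoint documentation. `post_json` is a hypothetical callable that sends one POST body and returns the decoded JSON. Symbols that come back absent or `null` are reported separately.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def batch_lookup_symbols(post_json, symbols, species="homo_sapiens", chunk_size=1000):
    """Look up many symbols via POST /lookup/symbol/{species}, one chunk at a time.

    Returns (found, missing): a {symbol: record} dict and a sorted list of
    symbols that had no record in any response.
    """
    symbols = list(dict.fromkeys(symbols))  # drop duplicates, keep order
    found = {}
    for chunk in chunked(symbols, chunk_size):
        response = post_json(f"/lookup/symbol/{species}", {"symbols": chunk})
        found.update({sym: rec for sym, rec in response.items() if rec})
    missing = sorted(set(symbols) - set(found))
    return found, missing
```

With `requests`, `post_json` can be as simple as `lambda ep, body: requests.post(BASE + ep, headers=HEADERS, json=body).json()`. Add a status check if you want failures to surface.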
### Query 3: Sequence Retrieval
Fetch genomic, cDNA, CDS, or protein sequences.
```python
import requests
BASE = "https://rest.ensembl.org"
HEADERS = {"Content-Type": "text/plain"}
# Protein sequence for the canonical TP53 transcript
r = requests.get(
    f"{BASE}/sequence/id/ENST00000269305",
    headers=HEADERS,
    params={"type": "protein"},
)
r.raise_for_status()
seq = r.text.strip()
print(f"Protein sequence ({len(seq)} aa): {seq[:60]}...")
```
```python
# Genomic region sequence
HEADERS_JSON = {"Content-Type": "application/json"}
r = requests.get(
    f"{BASE}/sequence/region/human/17:43044295..43125364",
    headers=HEADERS_JSON,
    params={"coord_system_version": "GRCh38"},
)
r.raise_for_status()
result = r.json()
print(f"Retrieved {len(result['seq'])} bp of genomic sequence")
```
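Region endpoints take a string of the form `chromosome:start..end`, optionally followed by `:strand` (1 or -1). Coordinates are 1-based and inclusive, and a strand of -1 returns the reverse-complement sequence. The following small sketch builds and sizes such strings. The helper names are hypothetical, and the 10 Mb `max_length` default is an assumed request limit for `/sequence/region`, not a confirmed value.

```python
def region_length(start, end):
    """Number of bases in a 1-based, inclusive interval."""
    return end - start + 1


def region_string(chrom, start, end, strand=None, max_length=10_000_000):
    """Build an Ensembl-style region string such as '17:43044295..43125364:1'."""
    if start < 1 or end < start:
        raise ValueError(f"invalid interval {start}..{end}")
    if region_length(start, end) > max_length:
        # Assumed size limit: split long regions into several requests
        raise ValueError("region longer than max_length")
    region = f"{chrom}:{start}..{end}"
    if strand is not None:
        if strand not in (1, -1):
            raise ValueError("strand must be 1 or -1")
        region += f":{strand}"
    return region
```

Usage: `f"{BASE}/sequence/region/human/{region_string('17', 43044295, 43125364, strand=-1)}"` requests the same interval as above on the reverse strand.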
### Query 4: Cross-References (ID Mapping)
Map Ensembl IDs to external database identifiers.
```python
import requests
from collections import defaultdict
BASE = "https://rest.ensembl.org"
HEADERS = {"Content-Type": "application/json"}
# All xrefs for a gene
r = requests.get(
    f"{BASE}/xrefs/id/ENSG00000141510",
    headers=HEADERS,
)
r.raise_for_status()
xrefs = r.json()
# Group primary IDs by source database name
by_db = defaultdict(list)
for x in xrefs:
    by_db[x["dbname"]].append(x["primary_id"])
for db in ["HGNC", "RefSeq_gene_name", "Uniprot_gn", "MIM_GENE"]:
    if db in by_db:
        print(f"{db}: {by_db[db]}")
```
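The mapping can also run the other way: `/xrefs/symbol/{species}/{symbol}` returns the Ensembl objects linked to an external name or accession. The following is a minimal sketch. It assumes each returned item has `id` and `type` fields, and that the endpoint accepts an `object_type` parameter to restrict the results. The results are also filtered locally as a safeguard. `get_json` is a hypothetical callable that wraps `requests.get(...).json()`.

```python
def ensembl_ids_for(get_json, name, species="homo_sapiens", object_type="gene"):
    """Return Ensembl stable IDs linked to an external symbol or accession."""
    hits = get_json(f"/xrefs/symbol/{species}/{name}", {"object_type": object_type})
    # Keep only objects of the requested type, preserving response order
    ids = [h["id"] for h in hits if h.get("type") == object_type]
    return list(dict.fromkeys(ids))  # drop duplicate IDs
```

Usage: `ensembl_ids_for(get_json, "TP53")`, where `get_json(endpoint, params)` sends the GET with `HEADERS` and returns the parsed list.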
### Query 5: Variant Consequence Annotation (VEP)
Predict the functional consequences of variants with the REST VEP endpoints.
```python
import requests, json
BASE = "https://rest.ensembl.org"
HEADERS = {"Content-Type": "application/json", "Accept": "application/json"}
# Annotate a list of HGVS notations; genomic positions must refer to GRCh38,
# the assembly served by rest.ensembl.org (use grch37.rest.ensembl.org for GRCh37)
variants = ["17:g.43094692C>T", "13:g.32929387C>T"]
r = requests.post(
    f"{BASE}/vep/human/hgvs",
    headers=HEADERS,
    data=json.dumps({"hgvs_notations": variants}),
)
r.raise_for_status()
for v in r.json():
    print(f"\nVariant: {v.get('input')}")
    for tc in v.get("transcript_consequences", [])[:2]:
        print(f"  Gene: {tc.get('gene_symbol')}, Impact: {tc.get('impact')}, Consequence: {tc.get('consequence_terms')}")
```
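Each VEP record has one entry per affected transcript, and each entry carries an `impact` class: HIGH, MODERATE, LOW, or MODIFIER. The following is a minimal sketch that keeps only the entries in the most severe impact class present. The ranking dictionary and the helper name `strongest_consequences` are illustrative and not part of the API. VEP records also carry a top-level `most_severe_consequence` term, which may be enough when one label per variant suffices.

```python
IMPACT_RANK = {"HIGH": 3, "MODERATE": 2, "LOW": 1, "MODIFIER": 0}


def strongest_consequences(record):
    """Return the transcript consequences that share the highest impact class."""
    tcs = record.get("transcript_consequences", [])
    if not tcs:
        return []  # e.g. intergenic variants have no transcript consequences
    best = max(IMPACT_RANK.get(tc.get("impact"), -1) for tc in tcs)
    return [tc for tc in tcs if IMPACT_RANK.get(tc.get("impact"), -1) == best]
```

Usage: `for v in r.json(): print(v.get("input"), [tc.get("gene_symbol") for tc in strongest_consequences(v)])`.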
```python
# Annotate by rsID
r = requests.get(
    f"{BASE}/vep/human/id/rs699",
    headers=HEADERS,
)
r.raise_for_status()
v = r.json()[0]
# Intergenic variants have no transcript_consequences, so guard the lookup
tcs = v.get("transcript_consequences", [])
if tcs:
    print(f"rsID rs699 in gene: {tcs[0].get('gene_symbol')}")
    print(f"Consequence: {tcs[0]['consequence_terms']}")
```
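Several rsIDs can be annotated in one call with `POST /vep/{species}/id` and a body of the form `{"ids": [...]}`. The following sketch assumes that endpoint and a hypothetical `post_json` callable, such as a thin wrapper around `requests.post`. It summarizes each variant by its `most_severe_consequence` and the gene symbols it touches. Unlike the single-ID example, it does not assume that `transcript_consequences` is present.

```python
def summarize_rsids(post_json, rsids, species="human"):
    """Map each input rsID to (most severe consequence, sorted gene symbols)."""
    records = post_json(f"/vep/{species}/id", {"ids": list(rsids)})
    summary = {}
    for rec in records:
        # Collect distinct gene symbols across all transcript consequences
        genes = {
            tc["gene_symbol"]
            for tc in rec.get("transcript_consequences", [])
            if tc.get("gene_symbol")
        }
        summary[rec.get("input")] = (rec.get("most_severe_consequence"), sorted(genes))
    return summary
```

Usage: `summarize_rsids(post_json, ["rs699"])`, with `post_json` sending the body with the JSON `Content-Type` and `Accept` headers used above.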
### Query 6: Regulatory Features
Query Ensembl Regulatory Build features (promoters, enhancers, CTCF binding sites, open chromatin) in a genomic region.
```python
import requests
BASE = "https://rest.ensembl.org"
HEADERS = {"Content-Type": "application/json"}
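# Regulatory features overlapping a region (sketch: assumes /overlap/region with feature=regulatory)
region = "17:43044295..43125364"  # example region, GRCh38 coordinates as in Query 3
r = requests.get(f"{BASE}/overlap/region/human/{region}", headers=HEADERS, params={"feature": "regulatory"})
r.raise_for_status()
for feat in r.json():
    print(feat.get("id"), feat.get("description"), feat.get("start"), feat.get("end"))
```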