ensembl-database
Ensembl-database provides REST API access to the EMBL-EBI Ensembl genome database covering 250+ species, enabling queries for gene annotations, sequences, variants, orthologs, and comparative genomics data. Use this skill for gene lookups, sequence retrieval, variant effect prediction, ortholog identification, and genomic research integration.
git clone --depth 1 https://github.com/beita6969/ScienceClaw /tmp/ensembl-database && cp -r /tmp/ensembl-database/skills/ensembl-database ~/.claude/skills/ensembl-databaseSKILL.md
# Ensembl Database
## Overview
Access and query the Ensembl genome database, a comprehensive resource for vertebrate genomic data maintained by EMBL-EBI. The database provides gene annotations, sequences, variants, regulatory information, and comparative genomics data for over 250 species. Current release is 115 (September 2025).
## When to Use This Skill
This skill should be used when:
- Querying gene information by symbol or Ensembl ID
- Retrieving DNA, transcript, or protein sequences
- Analyzing genetic variants using the Variant Effect Predictor (VEP)
- Finding orthologs and paralogs across species
- Accessing regulatory features and genomic annotations
- Converting coordinates between genome assemblies (e.g., GRCh37 to GRCh38)
- Performing comparative genomics analyses
- Integrating Ensembl data into genomic research pipelines
## Core Capabilities
### 1. Gene Information Retrieval
Query gene data by symbol, Ensembl ID, or external database identifiers.
**Common operations:**
- Look up gene information by symbol (e.g., "BRCA2", "TP53")
- Retrieve transcript and protein information
- Get gene coordinates and chromosomal locations
- Access cross-references to external databases (UniProt, RefSeq, etc.)
**Using the ensembl_rest package:**
```python
from ensembl_rest import EnsemblClient
client = EnsemblClient()
# Look up gene by symbol
gene_data = client.symbol_lookup(
species='human',
symbol='BRCA2'
)
# Get detailed gene information
gene_info = client.lookup_id(
id='ENSG00000139618', # BRCA2 Ensembl ID
expand=True
)
```
**Direct REST API (no package):**
```python
import requests
server = "https://rest.ensembl.org"
# Symbol lookup
response = requests.get(
f"{server}/lookup/symbol/homo_sapiens/BRCA2",
headers={"Content-Type": "application/json"}
)
gene_data = response.json()
```
### 2. Sequence Retrieval
Fetch genomic, transcript, or protein sequences in various formats (JSON, FASTA, plain text).
**Operations:**
- Get DNA sequences for genes or genomic regions
- Retrieve transcript sequences (cDNA)
- Access protein sequences
- Extract sequences with flanking regions or modifications
**Example:**
```python
# Using ensembl_rest package
sequence = client.sequence_id(
id='ENSG00000139618', # Gene ID
content_type='application/json'
)
# Get sequence for a genomic region
region_seq = client.sequence_region(
species='human',
region='7:140424943-140624564' # chromosome:start-end
)
```
### 3. Variant Analysis
Query genetic variation data and predict variant consequences using the Variant Effect Predictor (VEP).
**Capabilities:**
- Look up variants by rsID or genomic coordinates
- Predict functional consequences of variants
- Access population frequency data
- Retrieve phenotype associations
**VEP example:**
```python
# Predict variant consequences
vep_result = client.vep_hgvs(
species='human',
hgvs_notation='ENST00000380152.7:c.803C>T'
)
# Query variant by rsID
variant = client.variation_id(
species='human',
id='rs699'
)
```
### 4. Comparative Genomics
Perform cross-species comparisons to identify orthologs, paralogs, and evolutionary relationships.
**Operations:**
- Find orthologs (same gene in different species)
- Identify paralogs (related genes in same species)
- Access gene trees showing evolutionary relationships
- Retrieve gene family information
**Example:**
```python
# Find orthologs for a human gene
orthologs = client.homology_ensemblgene(
id='ENSG00000139618', # Human BRCA2
target_species='mouse'
)
# Get gene tree
gene_tree = client.genetree_member_symbol(
species='human',
symbol='BRCA2'
)
```
### 5. Genomic Region Analysis
Find all genomic features (genes, transcripts, regulatory elements) in a specific region.
**Use cases:**
- Identify all genes in a chromosomal region
- Find regulatory features (promoters, enhancers)
- Locate variants within a region
- Retrieve structural features
**Example:**
```python
# Find all features in a region
features = client.overlap_region(
species='human',
region='7:140424943-140624564',
feature='gene'
)
```
### 6. Assembly Mapping
Convert coordinates between different genome assemblies (e.g., GRCh37 to GRCh38).
**Important:** Use `https://grch37.rest.ensembl.org` for GRCh37/hg19 queries and `https://rest.ensembl.org` for current assemblies.
**Example:**
```python
from ensembl_rest import AssemblyMapper
# Map coordinates from GRCh37 to GRCh38
mapper = AssemblyMapper(
species='human',
asm_from='GRCh37',
asm_to='GRCh38'
)
mapped = mapper.map(chrom='7', start=140453136, end=140453136)
```
## API Best Practices
### Rate Limiting
The Ensembl REST API has rate limits. Follow these practices:
1. **Respect rate limits:** Maximum 15 requests per second for anonymous users
2. **Handle 429 responses:** When rate-limited, check the `Retry-After` header and wait
3. **Use batch endpoints:** When querying multiple items, use batch endpoints where available
4. **Cache results:** Store frequently accessed data to reduce API calls
### Error Handling
Always implement proper error handling:
```python
import requests
import time
def query_ensembl(endpoint, params=None, max_retries=3):
server = "https://rest.ensembl.org"
headers = {"Content-Type": "application/json"}
for attempt in range(max_retries):
response = requests.get(
f"{server}{endpoint}",
headers=headers,
params=params
)
if response.status_code == 200:
return response.json()
elif response.status_code == 429:
# Rate limited - wait and retry
retry_after = int(response.headers.get('Retry-After', 1))
time.sleep(retry_after)
else:
response.raise_for_status()
raise Exception(f"Failed after {max_retries} attempts")
```
## Installation
### Python Package (Recommended)
```bash
uv pip install ensembl_rest
```
The `ensembl_rRoute plain-language requests for Pi, Claude Code, Codex, OpenCode, Gemini CLI, or ACP harness work into either OpenClaw ACP runtime sessions or direct acpx-driven sessions ("telephone game" flow). For coding-agent thread requests, read this skill first, then use only `sessions_spawn` for thread creation.
Use the diffs tool to produce real, shareable diffs (viewer URL, file artifact, or both) instead of manual edit summaries.
|
|
|
|
OpenProse VM skill pack. Activate on any `prose` command, .prose files, or OpenProse mentions; orchestrates multi-agent workflows.