Skip to main content
ClaudeWave
Skill843 estrellas del repoactualizado 4d ago

ensembl-database

Ensembl-database provides REST API access to the EMBL-EBI Ensembl genome database covering 250+ species, enabling queries for gene annotations, sequences, variants, orthologs, and comparative genomics data. Use this skill for gene lookups, sequence retrieval, variant effect prediction, ortholog identification, and genomic research integration.

Instalar en Claude Code
Copiar
git clone --depth 1 https://github.com/beita6969/ScienceClaw /tmp/ensembl-database && cp -r /tmp/ensembl-database/skills/ensembl-database ~/.claude/skills/ensembl-database
Después abre una sesión nueva de Claude Code; el skill carga automáticamente.

SKILL.md

# Ensembl Database

## Overview

Access and query the Ensembl genome database, a comprehensive resource for vertebrate genomic data maintained by EMBL-EBI. The database provides gene annotations, sequences, variants, regulatory information, and comparative genomics data for over 250 species. Current release is 115 (September 2025).

## When to Use This Skill

This skill should be used when:

- Querying gene information by symbol or Ensembl ID
- Retrieving DNA, transcript, or protein sequences
- Analyzing genetic variants using the Variant Effect Predictor (VEP)
- Finding orthologs and paralogs across species
- Accessing regulatory features and genomic annotations
- Converting coordinates between genome assemblies (e.g., GRCh37 to GRCh38)
- Performing comparative genomics analyses
- Integrating Ensembl data into genomic research pipelines

## Core Capabilities

### 1. Gene Information Retrieval

Query gene data by symbol, Ensembl ID, or external database identifiers.

**Common operations:**
- Look up gene information by symbol (e.g., "BRCA2", "TP53")
- Retrieve transcript and protein information
- Get gene coordinates and chromosomal locations
- Access cross-references to external databases (UniProt, RefSeq, etc.)

**Using the ensembl_rest package:**
```python
from ensembl_rest import EnsemblClient

client = EnsemblClient()

# Look up gene by symbol
gene_data = client.symbol_lookup(
    species='human',
    symbol='BRCA2'
)

# Get detailed gene information
gene_info = client.lookup_id(
    id='ENSG00000139618',  # BRCA2 Ensembl ID
    expand=True
)
```

**Direct REST API (no package):**
```python
import requests

server = "https://rest.ensembl.org"

# Symbol lookup
response = requests.get(
    f"{server}/lookup/symbol/homo_sapiens/BRCA2",
    headers={"Content-Type": "application/json"}
)
gene_data = response.json()
```

### 2. Sequence Retrieval

Fetch genomic, transcript, or protein sequences in various formats (JSON, FASTA, plain text).

**Operations:**
- Get DNA sequences for genes or genomic regions
- Retrieve transcript sequences (cDNA)
- Access protein sequences
- Extract sequences with flanking regions or modifications

**Example:**
```python
# Using ensembl_rest package
sequence = client.sequence_id(
    id='ENSG00000139618',  # Gene ID
    content_type='application/json'
)

# Get sequence for a genomic region
region_seq = client.sequence_region(
    species='human',
    region='7:140424943-140624564'  # chromosome:start-end
)
```

### 3. Variant Analysis

Query genetic variation data and predict variant consequences using the Variant Effect Predictor (VEP).

**Capabilities:**
- Look up variants by rsID or genomic coordinates
- Predict functional consequences of variants
- Access population frequency data
- Retrieve phenotype associations

**VEP example:**
```python
# Predict variant consequences
vep_result = client.vep_hgvs(
    species='human',
    hgvs_notation='ENST00000380152.7:c.803C>T'
)

# Query variant by rsID
variant = client.variation_id(
    species='human',
    id='rs699'
)
```

### 4. Comparative Genomics

Perform cross-species comparisons to identify orthologs, paralogs, and evolutionary relationships.

**Operations:**
- Find orthologs (same gene in different species)
- Identify paralogs (related genes in same species)
- Access gene trees showing evolutionary relationships
- Retrieve gene family information

**Example:**
```python
# Find orthologs for a human gene
orthologs = client.homology_ensemblgene(
    id='ENSG00000139618',  # Human BRCA2
    target_species='mouse'
)

# Get gene tree
gene_tree = client.genetree_member_symbol(
    species='human',
    symbol='BRCA2'
)
```

### 5. Genomic Region Analysis

Find all genomic features (genes, transcripts, regulatory elements) in a specific region.

**Use cases:**
- Identify all genes in a chromosomal region
- Find regulatory features (promoters, enhancers)
- Locate variants within a region
- Retrieve structural features

**Example:**
```python
# Find all features in a region
features = client.overlap_region(
    species='human',
    region='7:140424943-140624564',
    feature='gene'
)
```

### 6. Assembly Mapping

Convert coordinates between different genome assemblies (e.g., GRCh37 to GRCh38).

**Important:** Use `https://grch37.rest.ensembl.org` for GRCh37/hg19 queries and `https://rest.ensembl.org` for current assemblies.

**Example:**
```python
from ensembl_rest import AssemblyMapper

# Map coordinates from GRCh37 to GRCh38
mapper = AssemblyMapper(
    species='human',
    asm_from='GRCh37',
    asm_to='GRCh38'
)

mapped = mapper.map(chrom='7', start=140453136, end=140453136)
```

## API Best Practices

### Rate Limiting

The Ensembl REST API has rate limits. Follow these practices:

1. **Respect rate limits:** Maximum 15 requests per second for anonymous users
2. **Handle 429 responses:** When rate-limited, check the `Retry-After` header and wait
3. **Use batch endpoints:** When querying multiple items, use batch endpoints where available
4. **Cache results:** Store frequently accessed data to reduce API calls

### Error Handling

Always implement proper error handling:

```python
import requests
import time

def query_ensembl(endpoint, params=None, max_retries=3):
    server = "https://rest.ensembl.org"
    headers = {"Content-Type": "application/json"}

    for attempt in range(max_retries):
        response = requests.get(
            f"{server}{endpoint}",
            headers=headers,
            params=params
        )

        if response.status_code == 200:
            return response.json()
        elif response.status_code == 429:
            # Rate limited - wait and retry
            retry_after = int(response.headers.get('Retry-After', 1))
            time.sleep(retry_after)
        else:
            response.raise_for_status()

    raise Exception(f"Failed after {max_retries} attempts")
```

## Installation

### Python Package (Recommended)

```bash
uv pip install ensembl_rest
```

The `ensembl_r