Skill · 199 repo stars · updated 16d ago

gene-database

NCBI Gene via E-utilities: curated records across 1M+ taxa. Official symbols, aliases, RefSeq IDs, summaries, coordinates, GO, interactions. Use for gene ID resolution and cross-species function queries. For sequences use Ensembl; for expression use geo-database.

Install in Claude Code
git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/gene-database && cp -r /tmp/gene-database/skills/genomics-bioinformatics/databases/gene-database ~/.claude/skills/gene-database
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# NCBI Gene Database

## Overview

NCBI Gene is the authoritative curated database for gene-centric information, covering 1M+ genes across hundreds of thousands of taxa. Each gene record includes the official symbol, aliases, full name, functional summary, genomic coordinates (GRCh38/GRCh37 for human), RefSeq accessions, GO annotations, interaction partners, and links to related databases. Access is free through the E-utilities REST API; an API key is optional but recommended because it raises the rate limit.

## When to Use

- Resolving gene aliases and synonyms to the current official HGNC/NCBI symbol
- Fetching the NCBI Gene ID (integer) for a gene symbol for downstream API calls (e.g., dbSNP, ClinVar, GEO)
- Retrieving curated gene summaries and function descriptions programmatically
- Pulling RefSeq mRNA (NM_) and protein (NP_) accessions associated with a gene
- Querying GO functional annotations (Biological Process, Molecular Function, Cellular Component)
- Finding orthologs across species via the NCBI Datasets v2 orthologs endpoint (legacy E-utilities `gene_gene_homolog` retired with HomoloGene in 2019)
- For expression profiles across conditions use `geo-database`; for variant annotations use `clinvar-database` or `ensembl-database`

## Prerequisites

- **Python packages**: `requests`, `xml.etree.ElementTree` (stdlib), `pandas` (optional)
- **Data requirements**: gene symbols, NCBI Gene IDs, or tax IDs
- **Environment**: internet connection; identify yourself to NCBI by setting the `email` parameter on every request
- **Rate limits**: 3 req/s unauthenticated; 10 req/s with free NCBI API key

```bash
pip install requests pandas
```
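
The rate limits above translate directly into a minimum pause between requests. The following is a minimal sketch, not part of the original skill: `min_delay` and `eutils_params` are hypothetical helper names, and reading the key from an environment variable called `NCBI_API_KEY` is an assumption made for this example. `api_key` itself is the E-utilities query parameter that carries the key.

```python
import os

# Request ceilings taken from the rate limits listed above (requests/second).
RATE_NO_KEY = 3
RATE_WITH_KEY = 10

def min_delay(api_key=None):
    """Seconds to wait between requests to stay under the rate limit."""
    rate = RATE_WITH_KEY if api_key else RATE_NO_KEY
    return 1.0 / rate

def eutils_params(email, api_key=None, **extra):
    """Build a query-parameter dict; api_key is included only when provided."""
    params = {"db": "gene", "retmode": "json", "email": email}
    if api_key:
        params["api_key"] = api_key
    params.update(extra)
    return params

# NCBI_API_KEY is an assumed variable name for this sketch.
API_KEY = os.environ.get("NCBI_API_KEY")
DELAY = min_delay(API_KEY)
```

Calling `time.sleep(DELAY)` between consecutive requests keeps a loop inside whichever limit applies.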

## Quick Start

```python
import requests

EMAIL = "your@email.com"
BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def gene_search(query, retmax=5):
    r = requests.get(f"{BASE}/esearch.fcgi",
                     params={"db": "gene", "term": query,
                             "retmax": retmax, "retmode": "json", "email": EMAIL})
    r.raise_for_status()
    return r.json()["esearchresult"]["idlist"]

# Find human BRCA1 gene ID
ids = gene_search("BRCA1[sym] AND Homo sapiens[orgn]")
print(f"Gene IDs for BRCA1: {ids}")  # → ['672']
```

## Core API

### Query 1: Search by Symbol, Name, or Function

Use ESearch with field tags for precise queries.

```python
import requests

EMAIL = "your@email.com"
BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Exact symbol match for human gene
r = requests.get(f"{BASE}/esearch.fcgi",
                 params={"db": "gene", "email": EMAIL, "retmode": "json",
                         "term": "TP53[sym] AND Homo sapiens[orgn] AND alive[prop]"})
r.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
ids = r.json()["esearchresult"]["idlist"]
print(f"TP53 Gene ID: {ids}")  # → ['7157']
```

```python
# Search by function keyword
r = requests.get(f"{BASE}/esearch.fcgi",
                 params={"db": "gene", "email": EMAIL, "retmode": "json",
                         "term": "CRISPR[title] AND Homo sapiens[orgn]", "retmax": 5})
r.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
ids = r.json()["esearchresult"]["idlist"]
print(f"CRISPR-related gene IDs: {ids}")
```
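
The field-tagged terms used above can be assembled programmatically so that every query applies the same filters. This is a small sketch under stated assumptions: `build_term` is a hypothetical helper, and it only uses the `[sym]`, `[orgn]`, and `alive[prop]` tags that already appear in this document.

```python
def build_term(symbol=None, organism=None, alive_only=True, extra=None):
    """Join field-tagged clauses with AND to form an ESearch term string."""
    clauses = []
    if symbol:
        clauses.append(f"{symbol}[sym]")
    if organism:
        clauses.append(f"{organism}[orgn]")
    if alive_only:
        # Restrict to current (non-discontinued) records
        clauses.append("alive[prop]")
    if extra:
        # Any additional, already-tagged clauses supplied by the caller
        clauses.extend(extra)
    return " AND ".join(clauses)

term = build_term("TP53", "Homo sapiens")
print(term)  # → TP53[sym] AND Homo sapiens[orgn] AND alive[prop]
```

The resulting string goes into the `term` parameter of an ESearch request.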

### Query 2: Fetch Gene Summary (JSON/ESummary)

Retrieve key metadata fields for a list of Gene IDs.

```python
import requests

EMAIL = "your@email.com"
BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def esummary_gene(gene_ids):
    r = requests.post(f"{BASE}/esummary.fcgi",
                      data={"db": "gene", "id": ",".join(gene_ids),
                            "retmode": "json", "email": EMAIL})
    r.raise_for_status()
    return r.json()["result"]

result = esummary_gene(["672", "675", "7157"])  # BRCA1, BRCA2, TP53

for uid in result.get("uids", []):
    g = result[uid]
    print(f"\n{g.get('name')} (ID {uid})")
    print(f"  Official symbol : {g.get('nomenclaturesymbol', g.get('name'))}")
    print(f"  Chr location    : {g.get('maplocation')}")
    print(f"  Summary (first 100): {g.get('summary', '')[:100]}...")
    print(f"  Aliases: {g.get('otheraliases', 'none')}")
```
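
For downstream work it can help to flatten the ESummary result into plain rows. The sketch below is illustrative only: `summaries_to_rows` is a hypothetical helper, the sample dictionary is a hand-written stand-in and not real API output, and treating `otheraliases` as a comma-separated string is an assumption. The field names are the ones used in the loop above.

```python
def summaries_to_rows(result):
    """Flatten an ESummary 'result' dict into a list of per-gene dicts."""
    rows = []
    for uid in result.get("uids", []):
        g = result.get(uid, {})
        # Assumption: aliases arrive as one comma-separated string
        aliases = [a.strip() for a in g.get("otheraliases", "").split(",") if a.strip()]
        rows.append({
            "gene_id": uid,
            "symbol": g.get("nomenclaturesymbol") or g.get("name"),
            "location": g.get("maplocation"),
            "aliases": aliases,
        })
    return rows

# Hand-written placeholder shaped like the fields accessed above
sample = {
    "uids": ["1"],
    "1": {"name": "GENE1", "maplocation": "1p1", "otheraliases": "G1, GEN1"},
}
rows = summaries_to_rows(sample)
print(rows[0]["symbol"], rows[0]["aliases"])
```

If `pandas` is installed, `pd.DataFrame(rows)` turns the same list into a table.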

### Query 3: Fetch Full Gene Record (XML)

Retrieve the complete gene record in XML for RefSeq accessions, GO terms, and interaction data.

```python
import requests
import xml.etree.ElementTree as ET

EMAIL = "your@email.com"
BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def efetch_gene_table(gene_id):
    """Return the gene table (plain-text overview) for one Gene ID."""
    r = requests.get(f"{BASE}/efetch.fcgi",
                     params={"db": "gene", "id": gene_id,
                             "rettype": "gene_table", "retmode": "text", "email": EMAIL})
    r.raise_for_status()
    return r.text

# Get gene table (tab-delimited overview)
table = efetch_gene_table("672")
print(table[:500])
```

```python
# XML for RefSeq accession extraction
r = requests.get(f"{BASE}/efetch.fcgi",
                 params={"db": "gene", "id": "672",
                         "rettype": "xml", "retmode": "xml", "email": EMAIL})
r.raise_for_status()
root = ET.fromstring(r.text)

# Extract RefSeq mRNA accessions. The same accession can appear under
# several Gene-commentary nodes, so collect into a set to de-duplicate.
refseq_mrna = set()
for ref in root.iter("Gene-commentary"):
    acc = ref.find("Gene-commentary_accession")
    ver = ref.find("Gene-commentary_version")
    if acc is not None and acc.text and acc.text.startswith("NM_"):
        refseq_mrna.add(f"{acc.text}.{ver.text}" if ver is not None else acc.text)

for accession in sorted(refseq_mrna):
    print(f"RefSeq mRNA: {accession}")
```
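
The same traversal works for protein accessions and can be checked offline. The following sketch assumes only the element names used above (`Gene-commentary`, `Gene-commentary_accession`, `Gene-commentary_version`); `refseq_accessions` is a hypothetical helper, and the XML string is a hand-written placeholder with made-up accessions, not a real gene record.

```python
import xml.etree.ElementTree as ET

def refseq_accessions(xml_text, prefix="NP_"):
    """Return sorted, de-duplicated accessions (with version) for a prefix."""
    root = ET.fromstring(xml_text)
    found = set()
    for node in root.iter("Gene-commentary"):
        acc = node.findtext("Gene-commentary_accession")
        ver = node.findtext("Gene-commentary_version")
        if acc and acc.startswith(prefix):
            found.add(f"{acc}.{ver}" if ver else acc)
    return sorted(found)

# Placeholder document: accessions are invented for illustration
SAMPLE = """<Entrezgene-Set>
  <Gene-commentary>
    <Gene-commentary_accession>NM_EXAMPLE1</Gene-commentary_accession>
    <Gene-commentary_version>1</Gene-commentary_version>
    <Gene-commentary>
      <Gene-commentary_accession>NP_EXAMPLE1</Gene-commentary_accession>
      <Gene-commentary_version>2</Gene-commentary_version>
    </Gene-commentary>
  </Gene-commentary>
  <Gene-commentary>
    <Gene-commentary_accession>NP_EXAMPLE1</Gene-commentary_accession>
    <Gene-commentary_version>2</Gene-commentary_version>
  </Gene-commentary>
  <Gene-commentary>
    <Gene-commentary_accession>NP_EXAMPLE2</Gene-commentary_accession>
  </Gene-commentary>
</Entrezgene-Set>"""

print(refseq_accessions(SAMPLE))          # → ['NP_EXAMPLE1.2', 'NP_EXAMPLE2']
print(refseq_accessions(SAMPLE, "NM_"))   # → ['NM_EXAMPLE1.1']
```

Passing the text of a real EFetch XML response in place of `SAMPLE` applies the same logic to live data.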

### Query 4: Batch Symbol-to-ID Mapping

Map a list of gene symbols to NCBI Gene IDs efficiently.

```python
import requests, time

EMAIL = "your@email.com"
BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def symbols_to_ids(symbols, organism="Homo sapiens"):
    """Map gene symbols to NCBI Gene IDs. Returns dict {symbol: gene_id}."""
    mapping = {}
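    for sym in symbols:
        r = requests.get(f"{BASE}/esearch.fcgi",
                         params={"db": "gene", "email": EMAIL, "retmode": "json",
                                 "term": f"{sym}[sym] AND {organism}[orgn] AND alive[prop]"})
        r.raise_for_status()
        ids = r.json()["esearchresult"]["idlist"]
        # Take the first hit; None marks symbols with no match
        mapping[sym] = ids[0] if ids else None
        # Stay under 3 req/s without an API key (0.1 s is only safe with a key)
        time.sleep(0.34)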
    return mapping

genes = ["EGFR", "KRAS", "BRAF", "PIK3CA", "PTEN"]
id_map = symbols_to_ids(genes)
```