Skill · 286 repo stars · updated 5d ago

gene-database

The gene-database skill provides programmatic access to NCBI Gene, a curated database covering over 1 million genes across hundreds of thousands of species. Use this skill to resolve gene symbols and aliases to official NCBI identifiers, retrieve RefSeq accessions and genomic coordinates, and query Gene Ontology functional annotations through the NCBI E-utilities REST API, and to find orthologous genes across species through the NCBI Datasets v2 API.

View source · Repository: SciAgent-Skills

Install in Claude Code

git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/gene-database && cp -r /tmp/gene-database/skills/genomics-bioinformatics/databases/gene-database ~/.claude/skills/gene-database

Then open a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# NCBI Gene Database

## Overview

NCBI Gene is the authoritative curated database for gene-centric information, covering 1M+ genes across hundreds of thousands of taxa. Each gene record includes the official symbol, aliases, full name, functional summary, genomic coordinates (GRCh38/GRCh37), RefSeq accessions, GO annotations, interaction partners, and links to related databases. Access is free via E-utilities REST API (no API key required, though recommended).

## When to Use

- Resolving gene aliases and synonyms to the current official HGNC/NCBI symbol
- Fetching the NCBI Gene ID (integer) for a gene symbol for downstream API calls (e.g., dbSNP, ClinVar, GEO)
- Retrieving curated gene summaries and function descriptions programmatically
- Pulling RefSeq mRNA (NM_) and protein (NP_) accessions associated with a gene
- Querying GO functional annotations (Biological Process, Molecular Function, Cellular Component)
- Finding orthologs across species via the NCBI Datasets v2 orthologs endpoint (legacy E-utilities `gene_gene_homolog` retired with HomoloGene in 2019)
- For expression profiles across conditions use `geo-database`; for variant annotations use `clinvar-database` or `ensembl-database`
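
The orthologs lookup runs outside E-utilities. A minimal sketch of building the request URL, assuming the Datasets v2 REST path is `https://api.ncbi.nlm.nih.gov/datasets/v2/gene/id/{gene_id}/orthologs` and that it accepts a repeatable `taxon_filter` query parameter; verify both against the current NCBI Datasets API reference before relying on them. `orthologs_url` is a hypothetical helper, not part of any NCBI client.

```python
from urllib.parse import urlencode

# Assumed Datasets v2 base URL and path; verify against the NCBI Datasets API reference.
DATASETS_BASE = "https://api.ncbi.nlm.nih.gov/datasets/v2"


def orthologs_url(gene_id, taxa=None):
    """Build the (assumed) Datasets v2 orthologs URL for one NCBI Gene ID.

    taxa: optional iterable of taxon names or tax IDs, sent as repeated
    `taxon_filter` query parameters (assumed parameter name).
    """
    url = f"{DATASETS_BASE}/gene/id/{int(gene_id)}/orthologs"
    if taxa:
        url += "?" + urlencode([("taxon_filter", str(t)) for t in taxa])
    return url


# Human TP53 (Gene ID 7157), restricted to mouse (taxon 10090)
url = orthologs_url(7157, taxa=[10090])
# → https://api.ncbi.nlm.nih.gov/datasets/v2/gene/id/7157/orthologs?taxon_filter=10090
```

Fetch it with `requests.get(url, timeout=30).json()`; the response layout is not described here, so inspect it before writing a parser.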

## Prerequisites

- **Python packages**: `requests`, `xml.etree.ElementTree` (stdlib), `pandas` (optional)
- **Data requirements**: gene symbols, NCBI Gene IDs, or tax IDs
- **Environment**: internet connection; NCBI asks that every E-utilities request identify the caller with an `email` (and ideally a `tool`) parameter
- **Rate limits**: 3 req/s unauthenticated; 10 req/s with free NCBI API key

```bash
pip install requests pandas
```
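
The rate limits above are per second, so a client should pace itself and send `tool`/`email` (plus `api_key` when available) on every call. A minimal sketch, assuming the key lives in an `NCBI_API_KEY` environment variable; `eutils_params`, `min_interval`, `Throttle`, and the default tool name are hypothetical, not part of any NCBI library.

```python
import os
import time

UNAUTH_RPS = 3   # requests/second without an API key
KEYED_RPS = 10   # requests/second with a free NCBI API key


def eutils_params(email, tool="gene-database-skill", api_key=None, **extra):
    """Build E-utilities query parameters; the default tool name is a hypothetical label."""
    params = {"tool": tool, "email": email, **extra}
    if api_key:
        params["api_key"] = api_key
    return params


def min_interval(api_key=None):
    """Minimum seconds between requests under NCBI's published rate limits."""
    return 1.0 / (KEYED_RPS if api_key else UNAUTH_RPS)


class Throttle:
    """Sleep just long enough to keep successive calls at or below the allowed rate."""

    def __init__(self, interval):
        self.interval = interval
        self._last = 0.0

    def wait(self):
        delay = self._last + self.interval - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        self._last = time.monotonic()


# Assumption: the key is stored in NCBI_API_KEY (unset -> None -> 3 req/s pacing)
API_KEY = os.environ.get("NCBI_API_KEY")
throttle = Throttle(min_interval(API_KEY))
```

Call `throttle.wait()` before each `requests.get(..., params=eutils_params(EMAIL, api_key=API_KEY, db="gene", term=...))`.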

## Quick Start

```python
import requests

EMAIL = "your@email.com"
BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def gene_search(query, retmax=5):
    r = requests.get(f"{BASE}/esearch.fcgi",
                     params={"db": "gene", "term": query,
                             "retmax": retmax, "retmode": "json", "email": EMAIL})
    r.raise_for_status()
    return r.json()["esearchresult"]["idlist"]

# Find human BRCA1 gene ID
ids = gene_search("BRCA1[sym] AND Homo sapiens[orgn]")
print(f"Gene IDs for BRCA1: {ids}")  # → ['672']
```

## Core API

### Query 1: Search by Symbol, Name, or Function

Use ESearch with field tags for precise queries.

```python
import requests

EMAIL = "your@email.com"
BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Exact symbol match for human gene
r = requests.get(f"{BASE}/esearch.fcgi",
                 params={"db": "gene", "email": EMAIL, "retmode": "json",
                         "term": "TP53[sym] AND Homo sapiens[orgn] AND alive[prop]"})
ids = r.json()["esearchresult"]["idlist"]
print(f"TP53 Gene ID: {ids}")  # → ['7157']
```

```python
# Search by function keyword
r = requests.get(f"{BASE}/esearch.fcgi",
                 params={"db": "gene", "email": EMAIL, "retmode": "json",
                         "term": "CRISPR[title] AND Homo sapiens[orgn]", "retmax": 5})
ids = r.json()["esearchresult"]["idlist"]
print(f"CRISPR-related gene IDs: {ids}")
```
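
Both queries above join field-tagged clauses with `AND`. A small builder keeps that pattern consistent across calls; `gene_term` is a hypothetical helper that only reuses the tags already shown here (`[sym]`, `[orgn]`, `[title]`, `alive[prop]`).

```python
def gene_term(symbol=None, organism=None, keyword=None, field="title", alive_only=True):
    """Compose an Entrez Gene query string from field-tagged clauses.

    alive[prop] restricts results to current (not discontinued) Gene records.
    """
    clauses = []
    if symbol:
        clauses.append(f"{symbol}[sym]")
    if keyword:
        clauses.append(f"{keyword}[{field}]")
    if organism:
        clauses.append(f"{organism}[orgn]")
    if alive_only:
        clauses.append("alive[prop]")
    if not clauses:
        raise ValueError("at least one clause is required")
    return " AND ".join(clauses)


print(gene_term("TP53", "Homo sapiens"))
# → TP53[sym] AND Homo sapiens[orgn] AND alive[prop]
```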

### Query 2: Fetch Gene Summary (JSON/ESummary)

Retrieve key metadata fields for a list of Gene IDs.

```python
import requests

EMAIL = "your@email.com"
BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def esummary_gene(gene_ids):
    r = requests.post(f"{BASE}/esummary.fcgi",
                      data={"db": "gene", "id": ",".join(gene_ids),
                            "retmode": "json", "email": EMAIL})
    r.raise_for_status()
    return r.json()["result"]

result = esummary_gene(["672", "675", "7157"])  # BRCA1, BRCA2, TP53

for uid in result.get("uids", []):
    g = result[uid]
    print(f"\n{g.get('name')} (ID {uid})")
    # Fields can be present but empty, so fall back with `or` rather than a .get() default
    print(f"  Official symbol : {g.get('nomenclaturesymbol') or g.get('name')}")
    print(f"  Chr location    : {g.get('maplocation')}")
    print(f"  Summary (first 100): {g.get('summary', '')[:100]}...")
    print(f"  Aliases: {g.get('otheraliases') or 'none'}")
```
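
ESummary returns a dict keyed by Gene ID plus a `uids` list. A minimal sketch that flattens it into one record per gene, reusing the fields read above plus `description` (assumed to hold the full gene name in Gene ESummary JSON). `summary_records` is a hypothetical helper; its output can go straight into `pandas.DataFrame` if pandas is installed.

```python
def summary_records(result):
    """Flatten an ESummary 'result' dict into one plain dict per Gene ID."""
    records = []
    for uid in result.get("uids", []):
        g = result.get(uid, {})
        aliases = g.get("otheraliases") or ""
        records.append({
            "gene_id": uid,
            # Empty nomenclature fields fall back to the record name
            "symbol": g.get("nomenclaturesymbol") or g.get("name"),
            "full_name": g.get("description"),  # assumed field name
            "map_location": g.get("maplocation"),
            # otheraliases is a comma-separated string; split it into a list
            "aliases": [a.strip() for a in aliases.split(",") if a.strip()],
            "summary": g.get("summary", ""),
        })
    return records
```

Usage: `records = summary_records(esummary_gene(["672", "675", "7157"]))`, then `pd.DataFrame(records)` for a table.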

### Query 3: Fetch Full Gene Record (XML)

Retrieve the complete gene record: `rettype=gene_table` returns a plain-text overview, while `retmode=xml` returns the full record with RefSeq accessions, GO terms, and interaction data.

```python
import requests
import xml.etree.ElementTree as ET

EMAIL = "your@email.com"
BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def efetch_gene_table(gene_id):
    """Fetch NCBI's plain-text gene table (not XML) for one Gene ID."""
    r = requests.get(f"{BASE}/efetch.fcgi",
                     params={"db": "gene", "id": gene_id,
                             "rettype": "gene_table", "retmode": "text", "email": EMAIL})
    r.raise_for_status()
    return r.text

# Get gene table (tab-delimited overview)
table = efetch_gene_table("672")
print(table[:500])
```

```python
# Full record as XML (retmode=xml; no rettype is needed for the gene database)
r = requests.get(f"{BASE}/efetch.fcgi",
                 params={"db": "gene", "id": "672",
                         "retmode": "xml", "email": EMAIL})
r.raise_for_status()
root = ET.fromstring(r.content)

# Extract RefSeq mRNA accessions; the same accession recurs in several
# parts of the record, so collect them in a set to de-duplicate
refseq_mrna = set()
for ref in root.iter("Gene-commentary"):
    acc = ref.find("Gene-commentary_accession")
    ver = ref.find("Gene-commentary_version")
    if acc is not None and acc.text and acc.text.startswith("NM_"):
        refseq_mrna.add(f"{acc.text}.{ver.text}" if ver is not None else acc.text)

for accession in sorted(refseq_mrna):
    print(f"RefSeq mRNA: {accession}")
```
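
The same XML record carries the GO annotations. A sketch, assuming NCBI's Entrezgene XML layout: a `Gene-commentary` whose `Gene-commentary_heading` is `GeneOntology`, with child commentaries labelled `Function`, `Process`, or `Component`, each listing `Other-source` entries whose `Dbtag_db` is `GO`, numeric ID in `Object-id_id`, and term name in `Other-source_anchor`. Check this layout against a real record before relying on it; `go_terms` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def go_terms(root):
    """Collect (category, GO ID, term name) tuples from an Entrezgene XML tree.

    Assumes the GeneOntology layout described above; returns a sorted, de-duplicated list.
    """
    terms = set()
    for block in root.iter("Gene-commentary"):
        if block.findtext("Gene-commentary_heading") != "GeneOntology":
            continue
        # Each direct sub-commentary is one GO aspect (Function / Process / Component)
        for aspect in block.findall("Gene-commentary_comment/Gene-commentary"):
            category = aspect.findtext("Gene-commentary_label", default="")
            for src in aspect.iter("Other-source"):
                if src.findtext("Other-source_src/Dbtag/Dbtag_db") != "GO":
                    continue
                num = src.findtext("Other-source_src/Dbtag/Dbtag_tag/Object-id/Object-id_id")
                name = src.findtext("Other-source_anchor", default="")
                if num:
                    # GO IDs are zero-padded to 7 digits: 3677 -> GO:0003677
                    terms.add((category, f"GO:{int(num):07d}", name))
    return sorted(terms)


# Usage with the `root` parsed above:
# for category, go_id, name in go_terms(root):
#     print(category, go_id, name)
```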

### Query 4: Batch Symbol-to-ID Mapping

Map a list of gene symbols to NCBI Gene IDs with one ESearch request per symbol, paced to stay within NCBI's rate limit.

```python
import requests, time

EMAIL = "your@email.com"
BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def symbols_to_ids(symbols, organism="Homo sapiens", delay=0.34):
    """Map gene symbols to NCBI Gene IDs. Returns dict {symbol: gene_id or None}.

    delay=0.34 s keeps unauthenticated use under 3 requests/s; with an API key
    (10 req/s) it can be lowered to 0.1 s.
    """
    mapping = {}
    for sym in symbols:
        r = requests.get(f"{BASE}/esearch.fcgi",
                         params={"db": "gene", "email": EMAIL, "retmode": "json",
                                 "term": f"{sym}[sym] AND {organism}[orgn] AND alive[prop]"})
        r.raise_for_status()
        ids = r.json()["esearchresult"]["idlist"]
        mapping[sym] = ids[0] if ids else None  # first hit; None if no match
        time.sleep(delay)
    return mapping

genes = ["EGFR", "KRAS", "BRAF", "PIK3CA", "PTEN"]
id_map = symbols_to_ids(genes)
print(id_map)
```
