ClaudeWave
Skill · 199 repo stars · updated 16d ago

clinvar-database

Query NCBI ClinVar via E-utilities for variant clinical significance, pathogenicity, disease associations. Search by gene/rsID/condition/review status; returns ClinSig, submitter data, conditions, HGVS. For GWAS use gwas-database; for variant consequence prediction use Ensembl VEP.

Install in Claude Code
git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/clinvar-database && cp -r /tmp/clinvar-database/skills/genomics-bioinformatics/databases/clinvar-database ~/.claude/skills/clinvar-database
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# ClinVar Clinical Variants Database

## Overview

ClinVar is NCBI's public archive of variant interpretations submitted by clinical laboratories, researchers, and expert panels. It contains 2M+ variants with clinical significance classifications (Pathogenic, Likely Pathogenic, VUS, Likely Benign, Benign) for over 6,000 conditions. Access through NCBI E-utilities is free and requires no authentication.

## When to Use

- Checking whether a specific variant (rsID, HGVS, or genomic position) has a clinical significance classification
- Retrieving all pathogenic/likely-pathogenic variants in a gene of interest
- Identifying conflicting interpretations between submitting laboratories
- Pulling condition/phenotype associations for a variant (MIM, MeSH, HPO terms)
- Building variant filtering pipelines that prioritize clinically actionable variants
- For somatic cancer variants, also check `cosmic-database`; for GWAS associations use `gwas-database`

## Prerequisites

- **Python packages**: `requests`, `xml.etree.ElementTree` (stdlib)
- **Data requirements**: gene symbols, rsIDs, HGVS strings, or ClinVar Variation IDs
- **Environment**: internet connection; NCBI Entrez email required (set `email` parameter)
- **Rate limits**: 3 requests/second unauthenticated; 10/second with API key (free at https://www.ncbi.nlm.nih.gov/account/)

```bash
pip install requests
# No additional packages required; xml.etree is part of Python stdlib
```
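
The rate limits above can be respected with a small client-side throttle. The following is a minimal sketch, not part of the skill itself: `min_interval` and `Throttle` are hypothetical helper names, and the clock and sleep functions are injectable only so the logic can be checked without waiting. The `api_key` value, when you have one, is passed to E-utilities as the `api_key` request parameter.

```python
import time

def min_interval(api_key=None):
    """Seconds between requests: 3/second without a key, 10/second with one."""
    return 1.0 / (10 if api_key else 3)

class Throttle:
    """Block so that consecutive calls are at least `interval` seconds apart."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now

# Assumed usage: call throttle.wait() right before each E-utilities request.
throttle = Throttle(min_interval(api_key=None))
```

A single shared `Throttle` instance is enough for a sequential script; concurrent workers would need a lock around `wait()`.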

## Quick Start

```python
import requests

EMAIL = "your@email.com"  # required by NCBI policy

def clinvar_search(query, retmax=10):
    """Search ClinVar and return a list of ClinVar Variation IDs."""
    r = requests.get(
        "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
        params={"db": "clinvar", "term": query, "retmax": retmax,
                "retmode": "json", "email": EMAIL}
    )
    r.raise_for_status()
    return r.json()["esearchresult"]["idlist"]

# Find pathogenic BRCA1 variants
ids = clinvar_search("BRCA1[gene] AND pathogenic[clinsig]", retmax=5)
print(f"Found variation IDs: {ids}")
```

## Core API

### Query 1: Search Variants by Gene and Clinical Significance

Use ESearch to find ClinVar Variation IDs matching a structured query.

```python
import requests

EMAIL = "your@email.com"
BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def esearch(query, retmax=200):
    r = requests.get(f"{BASE}/esearch.fcgi",
                     params={"db": "clinvar", "term": query,
                             "retmax": retmax, "retmode": "json", "email": EMAIL})
    r.raise_for_status()
    result = r.json()["esearchresult"]
    return result["idlist"], int(result["count"])

# Gene-specific pathogenic variants
ids, total = esearch("BRCA2[gene] AND (pathogenic[clinsig] OR likely pathogenic[clinsig])")
print(f"Pathogenic/LP BRCA2 variants: {total} total, retrieved {len(ids)}")
print(f"First 5 IDs: {ids[:5]}")
```
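
When `total` exceeds the number of IDs retrieved, the remaining hits can be paged through with the ESearch `retstart` offset parameter. The sketch below keeps the paging logic separate from the HTTP call: `fetch_page` is a hypothetical callable that takes `(retstart, retmax)` and returns `(ids, total)`, so it could wrap a variant of `esearch` that also sends `retstart`.

```python
def page_offsets(total, page_size=200):
    """Return the retstart offsets needed to cover `total` hits."""
    return list(range(0, total, page_size))

def collect_all_ids(fetch_page, page_size=200, max_ids=None):
    """Gather IDs across pages.

    fetch_page(retstart, retmax) -> (ids, total) is a hypothetical callable,
    e.g. an esearch variant that adds "retstart" to its request params.
    """
    ids, total = fetch_page(0, page_size)
    all_ids = list(ids)
    for start in page_offsets(total, page_size)[1:]:
        if max_ids is not None and len(all_ids) >= max_ids:
            break  # stop early once the caller's cap is reached
        page, _ = fetch_page(start, page_size)
        all_ids.extend(page)
    return all_ids[:max_ids] if max_ids is not None else all_ids
```

Passing `max_ids` caps the number of requests for very large result sets, which helps stay inside the rate limits.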

```python
# By rsID
ids, _ = esearch("rs80357906[rs]")
print(f"Variant IDs for rs80357906: {ids}")

# By condition name
ids, total = esearch("breast cancer[dis] AND pathogenic[clinsig]")
print(f"Pathogenic variants for breast cancer: {total}")
```
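
The query strings above all follow one pattern: a value, a bracketed field tag, and `AND`/`OR` between clauses. As a rough sketch, a small builder can assemble them from keyword arguments. `build_query` is a hypothetical helper, and it only covers the four field tags used in this section (`[gene]`, `[rs]`, `[dis]`, `[clinsig]`).

```python
def build_query(gene=None, rsid=None, condition=None, clinsig=()):
    """Compose an ESearch term from the field tags used in this section."""
    parts = []
    if gene:
        parts.append(f"{gene}[gene]")
    if rsid:
        parts.append(f"{rsid}[rs]")
    if condition:
        parts.append(f"{condition}[dis]")
    if clinsig:
        terms = [f"{c}[clinsig]" for c in clinsig]
        # Several significance values are alternatives, so OR them in parentheses.
        parts.append(terms[0] if len(terms) == 1 else "(" + " OR ".join(terms) + ")")
    if not parts:
        raise ValueError("at least one search criterion is required")
    return " AND ".join(parts)

query = build_query(gene="BRCA2", clinsig=["pathogenic", "likely pathogenic"])
print(query)
```

The resulting string can be passed directly as the `query` argument of `esearch`.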

### Query 2: Fetch Variant Summary Records

Retrieve structured summary data (JSON) for a list of Variation IDs.

```python
import requests

EMAIL = "your@email.com"
BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def esummary(ids):
    """Fetch ESummary records for a list of ClinVar variation IDs."""
    r = requests.post(f"{BASE}/esummary.fcgi",
                      data={"db": "clinvar", "id": ",".join(ids),
                            "retmode": "json", "email": EMAIL})
    r.raise_for_status()
    return r.json()["result"]

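def esearch_ids(query, retmax=5):
    """Return ClinVar Variation IDs for a query, ready to pass to esummary()."""
    r = requests.get(f"{BASE}/esearch.fcgi",
                     params={"db": "clinvar", "term": query, "retmax": retmax,
                             "retmode": "json", "email": EMAIL})
    r.raise_for_status()
    return r.json()["esearchresult"]["idlist"]

# Live example: ids = esearch_ids("BRCA1[gene] AND pathogenic[clinsig]")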

# Manual example with known IDs
sample_ids = ["12375", "17684", "54270"]
result = esummary(sample_ids)

for vid in result.get("uids", []):
    rec = result[vid]
    # ClinVar 2024 schema: clinical_significance was replaced by germline_classification
    # (also: clinical_impact_classification, oncogenicity_classification — same shape, often empty)
    gc = rec.get("germline_classification", {})
    print(f"\nVariation {vid}: {rec.get('title')}")
    print(f"  ClinSig  : {gc.get('description')}")
    print(f"  Review   : {gc.get('review_status')}")
    genes = rec.get("genes") or [{}]  # guard against a missing or empty gene list
    print(f"  Gene     : {genes[0].get('symbol')}")
```
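
For filtering pipelines it is often easier to work with flat rows than with the nested ESummary dictionary. The sketch below flattens the fields read above (`title`, `germline_classification`, `genes`) and picks out records whose aggregate classification mentions a conflict. `flatten_summary` and `conflicting` are hypothetical helpers, the substring match on "conflicting" is an assumption about how ClinVar words that classification, and the `mock` record is made-up illustrative data, not real ClinVar output.

```python
def flatten_summary(result):
    """Turn an ESummary `result` dict into a list of flat row dicts."""
    rows = []
    for vid in result.get("uids", []):
        rec = result.get(vid, {})
        gc = rec.get("germline_classification") or {}
        genes = rec.get("genes") or [{}]  # tolerate missing or empty gene lists
        rows.append({
            "variation_id": vid,
            "title": rec.get("title"),
            "classification": gc.get("description"),
            "review_status": gc.get("review_status"),
            "gene": genes[0].get("symbol"),
        })
    return rows

def conflicting(rows):
    """Rows whose classification text contains 'conflicting' (assumed wording)."""
    return [r for r in rows if "conflicting" in (r["classification"] or "").lower()]

# Made-up records shaped like the fields used above, for illustration only.
mock = {
    "uids": ["1", "2"],
    "1": {"title": "variant A",
          "germline_classification": {"description": "Pathogenic",
                                      "review_status": "reviewed by expert panel"},
          "genes": [{"symbol": "GENE1"}]},
    "2": {"title": "variant B",
          "germline_classification": {
              "description": "Conflicting classifications of pathogenicity",
              "review_status": "criteria provided, conflicting classifications"},
          "genes": []},
}
rows = flatten_summary(mock)
print([r["variation_id"] for r in conflicting(rows)])
```

The same rows can be handed to `csv.DictWriter` or a DataFrame constructor if tabular output is needed.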

### Query 3: Fetch Full XML Records

Retrieve the complete variant record in XML for detailed submitter and condition data.

```python
import requests
import xml.etree.ElementTree as ET

EMAIL = "your@email.com"
BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def efetch_xml(variation_ids):
    # ClinVar 2024 XML overhaul: "clinvarset" rettype returns an empty stub.
    # Use rettype="vcv" + is_variationid="true" to get the new <VariationArchive> records.
    r = requests.post(f"{BASE}/efetch.fcgi",
                      data={"db": "clinvar", "id": ",".join(variation_ids),
                            "rettype": "vcv", "is_variationid": "true",
                            "retmode": "xml", "email": EMAIL})
    r.raise_for_status()
    return ET.fromstring(r.text)

root = efetch_xml(["17677"])  # BRCA1 c.5266dupC (rs80357906)

# Aggregate (germline) classification — one per VariationArchive
for va in root.iter("VariationArchive"):
    name = va.get("VariationName")
    gc = va.find("./ClassifiedRecord/Classifications/GermlineClassification")
    desc = gc.find("Description") if gc is not None else None
    rstat = gc.find("ReviewStatus") if gc is not None else None
    print(f"{name}: {desc.text if desc is not None else 'n/a'} "
          f"({rstat.text if rstat is not None else 'n/a'})")

    # Per-submitter assertions
    for ca in va.iter("ClinicalAssertion"):
        acc = ca.find("ClinVarAccession")
        cls = c
```