
cosmic-database

COSMIC is a large expert-curated database of somatic cancer mutations covering over 6.7 million coding variants, 40,000 cancer samples, and 19,000 genes across all cancer types. Use it to check somatic variant annotations in cancer genes, retrieve mutations within specific genes, access Cancer Gene Census classifications, look up mutational signatures, identify drug resistance variants, or build cancer driver gene lists. Free REST API v3.1 access requires registration at the Sanger Institute.

Repository: SciAgent-Skills

Install in Claude Code


git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/cosmic-database && cp -r /tmp/cosmic-database/skills/genomics-bioinformatics/databases/cosmic-database ~/.claude/skills/cosmic-database

Then open a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# COSMIC Somatic Cancer Mutations Database

## Overview

COSMIC (Catalogue Of Somatic Mutations In Cancer) is the world's largest expert-curated database of somatic mutations in cancer, covering 6.7M+ coding mutations, 40,000+ cancer samples, and 19,000+ genes across all cancer types. It includes the Cancer Gene Census (a curated list of genes causally implicated in cancer), mutational signatures (SBS, DBS, ID), drug resistance variants, copy number data, gene expression, and methylation. The REST API v3.1 enables programmatic queries; most features are freely accessible after registration.

## When to Use

- Checking whether a specific somatic variant in a cancer gene is annotated in COSMIC (frequency, cancer type distribution)
- Retrieving all somatic mutations in a gene of interest across COSMIC cancer samples
- Accessing COSMIC Cancer Gene Census classifications (Tier 1/2, role: oncogene/TSG/fusion)
- Looking up mutational signature attributions for samples or cancer types
- Identifying drug resistance variants (pharmacogenomic data) from COSMIC drug resistance database
- Building cancer driver gene lists for bioinformatic pipelines
- For germline/inherited variants use `clinvar-database`; for drug-target associations use `opentargets-database`

## Prerequisites

- **Python packages**: `requests`, `pandas`
- **Data requirements**: gene symbols (HGNC), COSMIC mutation IDs (COSM), sample IDs, or genomic coordinates
- **Environment**: internet connection; free account registration at https://cancer.sanger.ac.uk/cosmic/register
- **Rate limits**: authenticated requests only (the credentials from registration, sent as in the Quick Start); 10 requests/second max

```bash
pip install requests pandas
# Register at https://cancer.sanger.ac.uk/cosmic/register to obtain API credentials
```
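
The 10 requests/second ceiling listed above can be respected on the client side with a small throttle. The following is a minimal sketch, not part of COSMIC or `requests`: `Throttle` is a hypothetical helper that enforces a minimum interval between calls. The `clock` and `sleep` parameters exist only so the behaviour can be checked without real waiting.

```python
import time


class Throttle:
    """Enforce a minimum interval between calls (hypothetical helper)."""

    def __init__(self, max_per_second=10, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call; None before the first

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
        return now
```

Calling `throttle.wait()` immediately before each `requests.get(...)` keeps a loop of queries under the stated limit, assuming a single process issues the requests.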

## Quick Start

```python
import requests
import base64

# COSMIC API requires base64-encoded email:password authentication
EMAIL = "your_registered@email.com"
PASSWORD = "your_password"
token = base64.b64encode(f"{EMAIL}:{PASSWORD}".encode()).decode()

BASE = "https://cancer.sanger.ac.uk/cosmic/api"
HEADERS = {"Authorization": f"Basic {token}"}

# Get mutations for KRAS gene
r = requests.get(f"{BASE}/mutations",
                 headers=HEADERS,
                 params={"gene_name": "KRAS", "limit": 5})
r.raise_for_status()
data = r.json()
print(f"Total KRAS mutations: {data['meta']['total']}")
for m in data["data"][:3]:
    # str() keeps the width format spec valid even if the ID is not a string
    print(f"  {str(m.get('mutation_id')):15s} AA: {m.get('mutation_aa')} | Cancer: {m.get('primary_site')}")
```
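
The Quick Start hardcodes the email and password for brevity. A minimal sketch of reading them from the environment instead is shown below. The variable names `COSMIC_EMAIL` and `COSMIC_PASSWORD` and the helper `build_headers` are assumptions made for illustration, not names defined by COSMIC. The header it builds is the same Basic scheme used above.

```python
import base64
import os


def build_headers(email=None, password=None):
    """Return a Basic-auth header dict like HEADERS in the Quick Start.

    Falls back to the COSMIC_EMAIL / COSMIC_PASSWORD environment
    variables (hypothetical names) when arguments are omitted.
    """
    email = email or os.environ.get("COSMIC_EMAIL")
    password = password or os.environ.get("COSMIC_PASSWORD")
    if not email or not password:
        raise ValueError("COSMIC credentials are not set")
    token = base64.b64encode(f"{email}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}
```

With this in place, `HEADERS = build_headers()` can replace the hardcoded block, which keeps credentials out of scripts and notebooks that may be shared.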

## Core API

### Query 1: Gene Mutations Search

Retrieve all COSMIC somatic mutations for a gene, with cancer type and amino acid change.

```python
import requests, base64, pandas as pd

EMAIL = "your@email.com"
PASSWORD = "your_password"
token = base64.b64encode(f"{EMAIL}:{PASSWORD}".encode()).decode()

BASE = "https://cancer.sanger.ac.uk/cosmic/api"
HEADERS = {"Authorization": f"Basic {token}"}

def get_gene_mutations(gene, limit=100, cancer_site=None):
    params = {"gene_name": gene, "limit": limit}
    if cancer_site:
        params["primary_site"] = cancer_site
    r = requests.get(f"{BASE}/mutations", headers=HEADERS, params=params)
    r.raise_for_status()
    return r.json()

data = get_gene_mutations("TP53", limit=20)
print(f"Total TP53 mutations in COSMIC: {data['meta']['total']}")

rows = []
for m in data["data"][:10]:
    rows.append({
        "mutation_id": m.get("mutation_id"),
        "mutation_aa": m.get("mutation_aa"),
        "mutation_cds": m.get("mutation_cds"),
        "primary_site": m.get("primary_site"),
        "histology": m.get("primary_histology"),
        "count": m.get("count"),
    })
df = pd.DataFrame(rows)
print(df.head())
```

```python
# Filter by cancer site
data_lung = get_gene_mutations("TP53", cancer_site="lung", limit=20)
print(f"\nTP53 mutations in lung cancer: {data_lung['meta']['total']}")
```
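
The examples above fetch a single page bounded by `limit`, while `meta.total` can be far larger. One way to walk the full result set is sketched below. It assumes the API accepts an `offset` parameter, which the source does not show, so the paging logic takes a caller-supplied `fetch_page(limit, offset)` function and `iter_all_records` itself is a hypothetical helper.

```python
def iter_all_records(fetch_page, page_size=100, max_records=None):
    """Yield records across pages.

    fetch_page(limit, offset) must return a dict shaped like the
    responses above: {"meta": {"total": int}, "data": [...]}.
    The offset argument is an assumption; the source only shows `limit`.
    """
    offset = 0
    yielded = 0
    while True:
        page = fetch_page(page_size, offset)
        records = page.get("data", [])
        if not records:
            return  # an empty page ends the iteration
        for rec in records:
            yield rec
            yielded += 1
            if max_records is not None and yielded >= max_records:
                return
        offset += len(records)
        total = page.get("meta", {}).get("total")
        if total is not None and offset >= total:
            return
```

A `fetch_page` for TP53 could wrap the `requests.get` call from `get_gene_mutations` and pass `offset` in `params`, if the API supports it. Setting `max_records` is a simple guard against pulling an unexpectedly large gene in full.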

### Query 2: Cancer Gene Census

Retrieve the COSMIC Cancer Gene Census — classified cancer driver genes.

```python
import requests, base64, pandas as pd

EMAIL = "your@email.com"
PASSWORD = "your_password"
token = base64.b64encode(f"{EMAIL}:{PASSWORD}".encode()).decode()

BASE = "https://cancer.sanger.ac.uk/cosmic/api"
HEADERS = {"Authorization": f"Basic {token}"}

r = requests.get(f"{BASE}/genes", headers=HEADERS, params={"limit": 100})
r.raise_for_status()
data = r.json()
print(f"Total genes in COSMIC: {data['meta']['total']}")

# Get Cancer Gene Census Tier 1 genes
r_cgc = requests.get(f"{BASE}/genes",
                     headers=HEADERS,
                     params={"cgc_tier": "1", "limit": 50})
r_cgc.raise_for_status()  # fail early on auth or HTTP errors
cgc_data = r_cgc.json()
print(f"\nCGC Tier 1 genes: {cgc_data['meta']['total']}")

rows = []
for g in cgc_data["data"][:15]:
    rows.append({
        "gene": g.get("gene_name"),
        "tier": g.get("cgc_tier"),
        "role": g.get("role_in_cancer"),
        "mutation_types": g.get("mutation_types"),
        "tumour_types": str(g.get("tumour_types_somatic", []))[:80],
    })
df = pd.DataFrame(rows)
print(df.to_string(index=False))
```
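
For pipelines that need driver gene lists split by role, the census records can be grouped once they are retrieved. The sketch below works on records carrying the `gene_name` and `role_in_cancer` fields used above. It assumes `role_in_cancer` is either a list of strings or a comma-separated string, which the source does not specify. `group_genes_by_role` is a hypothetical helper.

```python
from collections import defaultdict


def group_genes_by_role(records):
    """Map each role to a sorted list of gene names.

    Assumes role_in_cancer is a list of strings or a comma-separated
    string; records without a role are filed under "unknown".
    """
    groups = defaultdict(set)
    for rec in records:
        gene = rec.get("gene_name")
        if not gene:
            continue  # skip records without a gene symbol
        role = rec.get("role_in_cancer") or "unknown"
        roles = role if isinstance(role, list) else str(role).split(",")
        for r in roles:
            r = r.strip()
            groups[r or "unknown"].add(gene)
    return {role: sorted(genes) for role, genes in groups.items()}
```

Passing `cgc_data["data"]` from the query above would give one list per role. A gene annotated with several roles appears in each of them.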

### Query 3: Specific Mutation Lookup

Retrieve details for a known COSMIC mutation ID (COSM…).

```python
import requests, base64

EMAIL = "your@email.com"
PASSWORD = "your_password"
token = base64.b64encode(f"{EMAIL}:{PASSWORD}".encode()).decode()

BASE = "https://cancer.sanger.ac.uk/cosmic/api"
HEADERS = {"Authorization": f"Basic {token}"}

# KRAS G12D mutation
mutation_id = "COSM521"
r = requests.get(f"{BASE}/mutations/{mutation_id}", headers=HEADERS)
r.raise_for_status()
m = r.json()

print(f"Mutation ID : {m.get('mutation_id')}")
print(f"Gene        : {m.get('gene_name')}")
print(f"AA change   : {m.get('mutation_aa')}")
print(f"CDS change  : {m.get('mutation_cds')}")
print(f"Description : {m.get('mutation_description')}")
print(f"Count       : {m.get('count')} samples")
print(f"Cancer types: {str(m.get('cancer_types', []))[:100]}")
```
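
Because the mutation ID is interpolated directly into the URL path, it can help to validate it before sending the request. The sketch below assumes these identifiers are `COSM` followed by digits, as in `COSM521` above. `normalize_cosm_id` is a hypothetical helper, and other COSMIC identifier schemes are deliberately rejected.

```python
import re


def normalize_cosm_id(value):
    """Return an ID in the form COSM<digits>, or raise ValueError.

    Accepts bare numbers (521), lowercase input ("cosm521"), and
    surrounding whitespace. The COSM<digits> pattern is an assumption
    based on the example ID used above.
    """
    text = str(value).strip().upper()
    if re.fullmatch(r"[0-9]+", text):
        text = "COSM" + text  # bare number: add the prefix
    if not re.fullmatch(r"COSM[0-9]+", text):
        raise ValueError(f"not a COSM identifier: {value!r}")
    return text
```

Running user-supplied IDs through this check first turns a malformed identifier into a clear local error rather than an HTTP failure.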

### Query 4: Sample-Level Mutation Data

Retrieve all somatic mutations for a specific cancer sample.

```python
import requests, base64, pandas as pd

EMAIL = "you
```
