cosmic-database
Query COSMIC for cancer somatic mutations, gene census, mutational signatures, drug resistance variants. REST API v3.1 supports gene/sample/variant queries; free registration. For germline use clinvar-database; for drug-target data use opentargets-database or chembl-database-bioactivity.
git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/cosmic-database && cp -r /tmp/cosmic-database/skills/genomics-bioinformatics/databases/cosmic-database ~/.claude/skills/cosmic-database
# COSMIC Somatic Cancer Mutations Database
## Overview
COSMIC (Catalogue Of Somatic Mutations In Cancer) is the world's largest expert-curated database of somatic mutations in cancer, covering 6.7M+ coding mutations, 40,000+ cancer samples, and 19,000+ genes across all cancer types. It includes the Cancer Gene Census (critical cancer genes), mutational signatures (SBS, DBS, ID), drug resistance variants, copy number data, gene expression, and methylation. The REST API v3.1 enables programmatic queries; most features are freely accessible after registration.
## When to Use
- Checking whether a specific somatic variant in a cancer gene is annotated in COSMIC (frequency, cancer type distribution)
- Retrieving all somatic mutations in a gene of interest across COSMIC cancer samples
- Accessing COSMIC Cancer Gene Census classifications (Tier 1/2, role: oncogene/TSG/fusion)
- Looking up mutational signature attributions for samples or cancer types
- Identifying drug resistance variants (pharmacogenomic data) from the COSMIC drug resistance database
- Building cancer driver gene lists for bioinformatic pipelines
- For germline/inherited variants use `clinvar-database`; for drug-target associations use `opentargets-database`
## Prerequisites
- **Python packages**: `requests`, `pandas`
- **Data requirements**: gene symbols (HGNC), COSMIC mutation IDs (COSM), sample IDs, or genomic coordinates
- **Environment**: internet connection; free account registration at https://cancer.sanger.ac.uk/cosmic/register
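- **Rate limits**: authenticated requests only; 10 requests/second max; API credentials required (the examples in this document send the registered email and password as an HTTP Basic credential)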
```bash
pip install requests pandas
# Register at https://cancer.sanger.ac.uk/cosmic/register to obtain API credentials
```
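The 10 requests/second ceiling listed above is easy to exceed in a loop over many genes. A minimal client-side throttle might look like the sketch below. `RateLimiter` is a hypothetical helper written for this document, not part of `requests` or of any COSMIC client; the clock and sleep functions are injectable only so the behaviour can be checked without real waiting.

```python
import time


class RateLimiter:
    """Space successive calls at least 1/max_per_second seconds apart."""

    def __init__(self, max_per_second=10, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Usage idea: call limiter.wait() immediately before each HTTP request.
limiter = RateLimiter(max_per_second=10)
```

Calling `limiter.wait()` before every `requests.get(...)` keeps a single-threaded script under the limit; a multi-threaded client would additionally need a lock around `wait()`.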
## Quick Start
```python
import requests
import base64
# COSMIC API requires base64-encoded email:password authentication
EMAIL = "your_registered@email.com"
PASSWORD = "your_password"
token = base64.b64encode(f"{EMAIL}:{PASSWORD}".encode()).decode()
BASE = "https://cancer.sanger.ac.uk/cosmic/api"
HEADERS = {"Authorization": f"Basic {token}"}
# Get mutations for KRAS gene
r = requests.get(f"{BASE}/mutations",
                 headers=HEADERS,
                 params={"gene_name": "KRAS", "limit": 5})
r.raise_for_status()
data = r.json()
print(f"Total KRAS mutations: {data['meta']['total']}")
for m in data["data"][:3]:
    print(f" {m['mutation_id']:15s} AA: {m.get('mutation_aa')} | Cancer: {m.get('primary_site')}")
```
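The Quick Start hardcodes the email and password in the script. One possible alternative is to read them from environment variables and build the same Basic header in one place. This is a sketch: the variable names `COSMIC_EMAIL` and `COSMIC_PASSWORD` and both helper functions are assumptions made for illustration, not names defined by COSMIC.

```python
import base64
import os


def basic_auth_headers(email, password):
    """Build the base64 email:password header used in the Quick Start."""
    token = base64.b64encode(f"{email}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}


def headers_from_env(env=os.environ):
    """Read credentials from (hypothetical) COSMIC_EMAIL / COSMIC_PASSWORD."""
    try:
        return basic_auth_headers(env["COSMIC_EMAIL"], env["COSMIC_PASSWORD"])
    except KeyError as missing:
        raise RuntimeError(
            f"Set the {missing.args[0]} environment variable"
        ) from None
```

With the variables exported in the shell, `HEADERS = headers_from_env()` can replace the hardcoded block, which keeps credentials out of version control.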
## Core API
### Query 1: Gene Mutations Search
Retrieve all COSMIC somatic mutations for a gene, with cancer type and amino acid change.
```python
import requests, base64, pandas as pd
EMAIL = "your@email.com"
PASSWORD = "your_password"
token = base64.b64encode(f"{EMAIL}:{PASSWORD}".encode()).decode()
BASE = "https://cancer.sanger.ac.uk/cosmic/api"
HEADERS = {"Authorization": f"Basic {token}"}
def get_gene_mutations(gene, limit=100, cancer_site=None):
    params = {"gene_name": gene, "limit": limit}
    if cancer_site:
        params["primary_site"] = cancer_site
    r = requests.get(f"{BASE}/mutations", headers=HEADERS, params=params)
    r.raise_for_status()
    return r.json()
data = get_gene_mutations("TP53", limit=20)
print(f"Total TP53 mutations in COSMIC: {data['meta']['total']}")
rows = []
for m in data["data"][:10]:
    rows.append({
        "mutation_id": m.get("mutation_id"),
        "mutation_aa": m.get("mutation_aa"),
        "mutation_cds": m.get("mutation_cds"),
        "primary_site": m.get("primary_site"),
        "histology": m.get("primary_histology"),
        "count": m.get("count"),
    })
df = pd.DataFrame(rows)
print(df.head())
```
```python
# Filter by cancer site
data_lung = get_gene_mutations("TP53", cancer_site="lung", limit=20)
print(f"\nTP53 mutations in lung cancer: {data_lung['meta']['total']}")
```
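Once mutation records are retrieved, a common next step is to collapse them by amino-acid change to see which substitutions recur and in how many tissues. The sketch below assumes each record is a dict with the `mutation_aa`, `primary_site`, and `count` fields used above; `summarize_by_aa` is a hypothetical helper, and the example records are made-up placeholders, not real COSMIC counts.

```python
from collections import defaultdict


def summarize_by_aa(records):
    """Sum sample counts per amino-acid change and collect the sites seen."""
    totals = defaultdict(int)
    sites = defaultdict(set)
    for rec in records:
        aa = rec.get("mutation_aa")
        if not aa:
            continue  # skip records without an annotated protein change
        totals[aa] += int(rec.get("count") or 0)
        if rec.get("primary_site"):
            sites[aa].add(rec["primary_site"])
    rows = [
        {"mutation_aa": aa, "samples": n, "sites": sorted(sites[aa])}
        for aa, n in totals.items()
    ]
    # Most frequent first; ties broken alphabetically for stable output.
    rows.sort(key=lambda r: (-r["samples"], r["mutation_aa"]))
    return rows


# Placeholder records in the assumed response shape (illustrative only).
example = [
    {"mutation_aa": "p.R175H", "primary_site": "lung", "count": 3},
    {"mutation_aa": "p.R175H", "primary_site": "breast", "count": 2},
    {"mutation_aa": "p.R248Q", "primary_site": "lung", "count": 4},
    {"mutation_aa": None, "primary_site": "lung", "count": 1},
]
summary = summarize_by_aa(example)
```

The returned list of dicts can be passed straight to `pd.DataFrame(summary)` if a table is preferred.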
### Query 2: Cancer Gene Census
Retrieve the COSMIC Cancer Gene Census — classified cancer driver genes.
```python
import requests, base64, pandas as pd
EMAIL = "your@email.com"
PASSWORD = "your_password"
token = base64.b64encode(f"{EMAIL}:{PASSWORD}".encode()).decode()
BASE = "https://cancer.sanger.ac.uk/cosmic/api"
HEADERS = {"Authorization": f"Basic {token}"}
r = requests.get(f"{BASE}/genes", headers=HEADERS, params={"limit": 100})
r.raise_for_status()
data = r.json()
print(f"Total genes in COSMIC: {data['meta']['total']}")
# Get Cancer Gene Census genes
r_cgc = requests.get(f"{BASE}/genes",
                     headers=HEADERS,
                     params={"cgc_tier": "1", "limit": 50})
r_cgc.raise_for_status()  # fail early on HTTP errors, as for the first request
cgc_data = r_cgc.json()
print(f"\nCGC Tier 1 genes: {cgc_data['meta']['total']}")
rows = []
for g in cgc_data["data"][:15]:
    rows.append({
        "gene": g.get("gene_name"),
        "tier": g.get("cgc_tier"),
        "role": g.get("role_in_cancer"),
        "mutation_types": g.get("mutation_types"),
        "tumour_types": str(g.get("tumour_types_somatic", []))[:80],
    })
df = pd.DataFrame(rows)
print(df.to_string(index=False))
```
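For the driver-gene-list use case, the census records can be filtered by tier and role into a plain sorted list of symbols. This is a sketch under assumptions: it uses the `gene_name`, `cgc_tier`, and `role_in_cancer` fields shown above, and it accepts `role_in_cancer` either as a comma-separated string or as a list, because the response shape is not confirmed here. `split_roles` and `driver_genes` are hypothetical helpers, and the example records are illustrative placeholders.

```python
def split_roles(value):
    """Normalise a role field (string or list) to a set of lowercase roles."""
    if value is None:
        return set()
    parts = value.split(",") if isinstance(value, str) else value
    return {p.strip().lower() for p in parts if p and p.strip()}


def driver_genes(records, role=None, tier=None):
    """Return sorted gene symbols matching the optional role and tier."""
    genes = set()
    for rec in records:
        if tier is not None and str(rec.get("cgc_tier")) != str(tier):
            continue
        if role is not None and role.lower() not in split_roles(rec.get("role_in_cancer")):
            continue
        if rec.get("gene_name"):
            genes.add(rec["gene_name"])
    return sorted(genes)


# Placeholder records in the assumed shape; GENE_X is not a real gene.
example = [
    {"gene_name": "KRAS", "cgc_tier": 1, "role_in_cancer": "oncogene"},
    {"gene_name": "TP53", "cgc_tier": "1", "role_in_cancer": ["oncogene", "TSG"]},
    {"gene_name": "GENE_X", "cgc_tier": 2, "role_in_cancer": "TSG"},
]
tsg_list = driver_genes(example, role="TSG")
```

Comparing tiers as strings tolerates both `1` and `"1"`, which matters because the query above sends the tier as a string parameter.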
### Query 3: Specific Mutation Lookup
Retrieve details for a known COSMIC mutation ID (COSM…).
```python
import requests, base64
EMAIL = "your@email.com"
PASSWORD = "your_password"
token = base64.b64encode(f"{EMAIL}:{PASSWORD}".encode()).decode()
BASE = "https://cancer.sanger.ac.uk/cosmic/api"
HEADERS = {"Authorization": f"Basic {token}"}
# KRAS G12D mutation
mutation_id = "COSM521"
r = requests.get(f"{BASE}/mutations/{mutation_id}", headers=HEADERS)
r.raise_for_status()
m = r.json()
print(f"Mutation ID : {m.get('mutation_id')}")
print(f"Gene : {m.get('gene_name')}")
print(f"AA change : {m.get('mutation_aa')}")
print(f"CDS change : {m.get('mutation_cds')}")
print(f"Substitution: {m.get('mutation_description')}")
print(f"Count : {m.get('count')} samples")
print(f"Cancer types: {str(m.get('cancer_types', []))[:100]}")
```
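Because the mutation ID is interpolated directly into the URL path, it can help to validate it before sending a request. The sketch below assumes IDs of the form `COSM` followed by digits, as in the example above; `normalize_cosm_id` is a hypothetical helper and does not cover other COSMIC identifier schemes.

```python
import re

# Optional COSM prefix (any case) followed by one or more digits.
_COSM_PATTERN = re.compile(r"^(?:COSM)?(\d+)$", re.IGNORECASE)


def normalize_cosm_id(value):
    """Return the ID as 'COSM<digits>' or raise ValueError if malformed."""
    text = str(value).strip()
    match = _COSM_PATTERN.match(text)
    if not match:
        raise ValueError(f"Not a COSM-style mutation ID: {text!r}")
    return f"COSM{match.group(1)}"


mutation_id = normalize_cosm_id("cosm521")
```

Rejecting malformed input locally avoids spending a rate-limited request on an ID that cannot match.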
### Query 4: Sample-Level Mutation Data
Retrieve all somatic mutations for a specific cancer sample.
```python
import requests, base64, pandas as pd
EMAIL = "you|