
tooluniverse-cancer-genomics-tcga

This skill provides systematic analysis of The Cancer Genome Atlas (TCGA) and GDC genomics data through cohort construction, clinical metadata retrieval, somatic mutation profiling, copy number variation analysis, and survival analysis. Use it for cancer-type-specific studies such as determining TP53 mutation frequency in breast cancer, conducting Kaplan-Meier survival analysis for lung adenocarcinoma patients, identifying KRAS mutations across TCGA projects, and interpreting variants with OncoKB. Always define the cancer type and cohort before querying mutations, as mutation prevalence varies significantly across cancer types.

Install in Claude Code
git clone --depth 1 https://github.com/mims-harvard/ToolUniverse /tmp/tooluniverse-cancer-genomics-tcga && cp -r /tmp/tooluniverse-cancer-genomics-tcga/plugin/skills/tooluniverse-cancer-genomics-tcga ~/.claude/skills/tooluniverse-cancer-genomics-tcga
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# Cancer Genomics / TCGA Analysis

**TCGA analysis starts with: what cancer type? what data type?** Build your cohort FIRST (GDC filters), then analyze. Don't query mutations without defining the cohort: pan-cancer counts from `GDC_get_mutation_frequency` are uninformative without cancer-type context. A gene mutated in 10% of one cancer type may be mutated in 0.5% of another, so always scope mutation queries with `project_id` (accepted by `GDC_get_ssm_by_gene`; `GDC_get_mutation_frequency` has no project filter). Survival analysis (Kaplan-Meier) on retrospective TCGA data is hypothesis-generating: always report the sample size and p-value, and note that TCGA cohorts are not treatment-stratified.
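
To make "report sample size and p-value" concrete, here is a minimal plain-Python sketch of a Kaplan-Meier curve and a two-group log-rank test. It is an illustration only, not how `GDC_get_survival` computes its output; it assumes you have already assembled `times` (follow-up in days) and `events` (1 = death observed, 0 = censored) for each group yourself. For real analyses an established survival package (for example lifelines) is the safer choice.

```python
import math
from collections import defaultdict


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (O(n^2), fine for a sketch)."""
    deaths = defaultdict(int)
    for t, e in zip(times, events):
        if e:
            deaths[t] += 1
    curve, surv = [], 1.0
    for t in sorted(deaths):
        # Patients censored exactly at t are still counted as at risk at t.
        at_risk = sum(1 for x in times if x >= t)
        surv *= 1 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve


def logrank_p(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi2, p) with 1 degree of freedom."""
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e}
        | {t for t, e in zip(times_b, events_b) if e}
    )
    obs_minus_exp, var = 0.0, 0.0
    for t in event_times:
        n1 = sum(1 for x in times_a if x >= t)
        n2 = sum(1 for x in times_b if x >= t)
        d1 = sum(1 for x, e in zip(times_a, events_a) if e and x == t)
        d2 = sum(1 for x, e in zip(times_b, events_b) if e and x == t)
        n, d = n1 + n2, d1 + d2
        obs_minus_exp += d1 - d * n1 / n  # observed minus expected in group A
        if n > 1:
            var += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / var if var > 0 else 0.0
    # For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Toy follow-up data (not TCGA): always report n alongside the p-value.
a_t, a_e = [100, 200, 300, 400], [1, 1, 0, 1]
b_t, b_e = [150, 500, 600, 700], [0, 1, 0, 0]
chi2, p = logrank_p(a_t, a_e, b_t, b_e)
print(f"n={len(a_t) + len(b_t)}, chi2={chi2:.2f}, p={p:.3f}")
```

The quadratic at-risk counting keeps the logic easy to check against the textbook definitions; it is not meant for large cohorts.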

**LOOK UP DON'T GUESS**: never assume TCGA project IDs, NCIt codes, or gene coordinates — use `GDC_list_projects` to confirm project IDs and `Progenetix_list_filtering_terms` for NCIt codes.

Systematic TCGA/GDC analysis: define cohorts, retrieve clinical data, profile somatic
mutations, query copy number variations, run survival analysis, and interpret variants
with OncoKB.

## When to Use

- "What is the mutation frequency of TP53 in TCGA-BRCA?"
- "Get survival data for TCGA-LUAD patients"
- "Find clinical data for breast cancer cases in GDC"
- "Which TCGA projects have KRAS G12C mutations?"
- "Show CNV amplifications of EGFR in glioblastoma"
- "Annotate BRAF V600E for clinical significance in melanoma"

## NOT for (use other skills instead)

- Precision oncology treatment recommendations -> Use `tooluniverse-precision-oncology`
- Rare disease gene discovery -> Use `tooluniverse-rare-disease-genomics`
- GWAS variant interpretation -> Use `tooluniverse-gwas-snp-interpretation`

---

## Workflow Overview

```
Input (cancer type / gene / TCGA project ID)
  |
  v
Phase 1: Study Selection  -- GDC_list_projects, GDC_search_cases
  |
  v
Phase 2: Clinical Data    -- GDC_get_clinical_data
  |
  v
Phase 3: Somatic Mutations -- GDC_get_ssm_by_gene, GDC_get_mutation_frequency
  |
  v
Phase 4: CNV Analysis     -- Progenetix_cnv_search, Progenetix_search_biosamples
  |
  v
Phase 5: Survival Analysis -- GDC_get_survival
  |
  v
Phase 6: Variant Interpretation -- OncoKB_annotate_variant
```

---

## Key Identifiers

| Data Type | Format | Example |
|-----------|--------|---------|
| GDC project | TCGA-{ABBREV} | TCGA-BRCA, TCGA-LUAD, TCGA-SKCM |
| GDC case | UUID | 3c6ef4c1-... |
| NCIt cancer code | NCIT:C{digits} | NCIT:C4017 (breast), NCIT:C3058 (GBM) |
| RefSeq chromosome | refseq:NC_{6 digits}.{version} | refseq:NC_000007.14 (chr7, GRCh38) |
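
A quick shape check before calling any tool can catch a mistyped identifier early. The sketch below encodes the formats from the table as regular expressions; the patterns are an assumption that only covers the TCGA-prefixed projects listed here (GDC hosts other programs too), and a match says nothing about whether the ID actually exists, so still confirm it with `GDC_list_projects` or `Progenetix_list_filtering_terms`.

```python
import re
from typing import Optional

# Shape-only patterns derived from the Key Identifiers table.
ID_PATTERNS = {
    "gdc_project": re.compile(r"TCGA-[A-Z]+"),
    "gdc_case_uuid": re.compile(
        r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
    ),
    "ncit_code": re.compile(r"NCIT:C\d+"),
    "refseq_chromosome": re.compile(r"refseq:NC_\d{6}\.\d+"),
}


def identifier_kind(value: str) -> Optional[str]:
    """Return the identifier kind whose pattern matches the whole value, or None."""
    for kind, pattern in ID_PATTERNS.items():
        if pattern.fullmatch(value):
            return kind
    return None


print(identifier_kind("TCGA-BRCA"))
print(identifier_kind("refseq:NC_000007.14"))
```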

### Common TCGA Project IDs

| Cancer | Project ID | NCIt Code |
|--------|-----------|-----------|
| Breast | TCGA-BRCA | NCIT:C4017 |
| Lung adenocarcinoma | TCGA-LUAD | NCIT:C3512 |
| Glioblastoma | TCGA-GBM | NCIT:C3058 |
| Melanoma | TCGA-SKCM | NCIT:C3510 |
| Colon adenocarcinoma | TCGA-COAD | NCIT:C4349 |
| Ovarian | TCGA-OV | NCIT:C4908 |
| Prostate | TCGA-PRAD | NCIT:C7378 |

---

## Phase 1: Study Selection

**GDC_list_projects**: No params required. Returns all GDC/TCGA projects with case counts.
- Use to browse available projects and map cancer types to project IDs.
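
One way to follow "LOOK UP DON'T GUESS" is to match a candidate ID against the IDs returned by `GDC_list_projects` instead of trusting memory. This is a sketch: `match_projects` is a hypothetical helper, and how you pull the list of project ID strings out of the tool's response is left to you because the response shape is not documented here.

```python
from typing import Iterable, List


def match_projects(project_ids: Iterable[str], query: str) -> List[str]:
    """Return an exact (case-insensitive) match, else all IDs containing the query."""
    ids = list(project_ids)
    q = query.strip().upper()
    exact = [p for p in ids if p.upper() == q]
    if exact:
        return exact
    return sorted(p for p in ids if q in p.upper())


# Hand-written example list; in practice extract it from GDC_list_projects.
ids = ["TCGA-BRCA", "TCGA-LUAD", "TCGA-LUSC", "CPTAC-3"]
print(match_projects(ids, "tcga-luad"))
print(match_projects(ids, "LU"))
```

Returning several partial matches (rather than picking one) keeps the final choice with the analyst, which matters for ambiguous queries such as "lung".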

**GDC_search_cases**: `project_id` (string, e.g., "TCGA-BRCA"), `size` (int, default 10), `offset` (int).
Returns case UUIDs and basic metadata.
- Use to confirm a project exists and retrieve case counts before deeper queries.
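
Because `GDC_search_cases` returns one page per call (`size` records starting at `offset`), retrieving a whole cohort means looping over offsets. The generic sketch below assumes you pass in a `fetch(size, offset)` callable that returns the list of records for one page; the wiring to the real tool and the `"data"` response key in the comment are assumptions.

```python
from typing import Any, Callable, Dict, Iterator, List


def iter_pages(
    fetch: Callable[[int, int], List[Dict[str, Any]]],
    page_size: int = 100,
    max_records: int = 1000,
) -> Iterator[Dict[str, Any]]:
    """Yield records from an offset-paginated source.

    Stops after a short (or empty) page, or once the offset reaches max_records,
    which guards against looping forever if every page comes back full.
    """
    offset = 0
    while offset < max_records:
        page = fetch(page_size, offset)
        yield from page
        if len(page) < page_size:
            break
        offset += page_size


# Hypothetical wiring (response key "data" is an assumption):
# fetch = lambda size, offset: tu.tools.GDC_search_cases(
#     project_id="TCGA-BRCA", size=size, offset=offset)["data"]
# cases = list(iter_pages(fetch, page_size=100))
```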

---

## Phase 2: Clinical Data

**GDC_get_clinical_data**: `project_id` (string), `primary_site` (string, e.g., "Breast"), `disease_type` (string), `vital_status` ("Alive" or "Dead"), `gender` ("female"/"male"), `size` (int, 1-100), `offset` (int).
Returns `{status, data: [{case_id, demographics: {gender, race, ethnicity, vital_status, age_at_index}, diagnoses: [{primary_diagnosis, tumor_stage, age_at_diagnosis, days_to_last_follow_up}], treatments: [{therapeutic_agents, treatment_type}]}]}`.
- Use `project_id` + optional filters to retrieve patient-level clinical attributes.
- `age_at_diagnosis` is in days; divide by 365.25 for years.
- Multiple diagnoses or treatments per case are possible.

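```python
from tooluniverse import ToolUniverse

tu = ToolUniverse()
tu.load_tools()  # register tools so tu.tools.<name>(...) is available

# Get clinical data for deceased BRCA patients
result = tu.tools.GDC_get_clinical_data(
    project_id="TCGA-BRCA", vital_status="Dead", size=50
)
```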
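
Since a case can carry several diagnoses and `age_at_diagnosis` is in days, it helps to flatten the response before analysis. This sketch assumes the `{status, data: [...]}` shape documented above; `diagnosis_rows` is a hypothetical helper name, and cases without any diagnosis record are simply skipped.

```python
from typing import Any, Dict, List


def diagnosis_rows(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Flatten clinical records into one row per diagnosis, with age in years."""
    rows = []
    for case in records:
        demo = case.get("demographics") or {}
        for dx in case.get("diagnoses") or []:
            age_days = dx.get("age_at_diagnosis")
            rows.append({
                "case_id": case.get("case_id"),
                "vital_status": demo.get("vital_status"),
                "primary_diagnosis": dx.get("primary_diagnosis"),
                # GDC reports age_at_diagnosis in days.
                "age_years": round(age_days / 365.25, 1) if age_days is not None else None,
            })
    return rows
```

Usage: `rows = diagnosis_rows(result["data"])`. Report the cohort size as the number of distinct cases, `len({r["case_id"] for r in rows})`, not `len(rows)`, because multi-diagnosis cases contribute more than one row.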

---

## Phase 3: Somatic Mutations

**GDC_get_mutation_frequency**: `gene_symbol` (string REQUIRED, alias: `gene`). Returns pan-cancer SSM occurrence count.
- Returns a single TOTAL count across all TCGA projects, with no per-project breakdown.
- For cancer-specific data, use `GDC_get_ssm_by_gene` with `project_id`.

**GDC_get_ssm_by_gene**: `gene_symbol` (string REQUIRED), `project_id` (string, optional), `size` (int, 1-100).
Returns `{status, data: [{ssm_id, mutation_type, genomic_dna_change, aa_change, consequence_type}]}`.
- `mutation_type`: "Single base substitution", "Insertion", "Deletion".
- `aa_change`: amino acid change notation (e.g., "Val600Glu").

```python
# TP53 mutations in lung adenocarcinoma
mutations = tu.tools.GDC_get_ssm_by_gene(
    gene_symbol="TP53", project_id="TCGA-LUAD", size=50
)
```
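
For questions like "Which TCGA projects have KRAS G12C mutations?", you can query `GDC_get_ssm_by_gene` once per `project_id` and scan the returned `aa_change` values. Since the documented notation is three-letter ("Val600Glu") while questions usually use one-letter ("G12C"), the sketch normalizes both; `to_one_letter` and `find_variant` are hypothetical helpers. Two caveats: `size` caps a call at 100 SSMs, so a variant can sit beyond the first page, and a hit tells you the mutation is present, not how many cases carry it.

```python
import re
from typing import Any, Dict, List

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V", "Ter": "*",
}


def to_one_letter(change: str) -> str:
    """'Val600Glu' -> 'V600E'; 'p.G12C' -> 'G12C'; other formats are returned stripped."""
    c = change.strip()
    if c.startswith("p."):
        c = c[2:]
    m = re.fullmatch(r"([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})", c)
    if m and m.group(1) in THREE_TO_ONE and m.group(3) in THREE_TO_ONE:
        return f"{THREE_TO_ONE[m.group(1)]}{m.group(2)}{THREE_TO_ONE[m.group(3)]}"
    return c


def find_variant(ssms: List[Dict[str, Any]], target: str) -> List[Dict[str, Any]]:
    """Return SSM records whose aa_change matches target in either notation."""
    want = to_one_letter(target)
    return [m for m in ssms if m.get("aa_change") and to_one_letter(m["aa_change"]) == want]


# Hypothetical loop over candidate projects:
# for pid in ["TCGA-LUAD", "TCGA-COAD"]:
#     resp = tu.tools.GDC_get_ssm_by_gene(gene_symbol="KRAS", project_id=pid, size=100)
#     print(pid, len(find_variant(resp["data"], "G12C")))
```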

---

## Phase 4: CNV Analysis (Progenetix)

**Progenetix_search_biosamples**: `filters` (string REQUIRED, NCIt code e.g., "NCIT:C4017"), `limit` (int), `skip` (int).
Returns `{status, data: {biosamples: [{biosample_id, histological_diagnosis, pathological_stage, external_references}]}}`.
- Use to find samples with CNV profiles for a given cancer type.
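
Given the documented `{status, data: {biosamples: [...]}}` shape, a quick overview of what came back (how many samples, which diagnoses) is a useful sanity check before CNV queries. This is a sketch: Beacon-style records often store `histological_diagnosis` as an `{id, label}` object, but whether this tool returns an object or a plain string is an assumption, so both are handled. Counts describe only the returned page (see `limit`/`skip`).

```python
from collections import Counter
from typing import Any, Dict


def biosample_overview(response: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize a Progenetix_search_biosamples result by histological diagnosis."""
    samples = (response.get("data") or {}).get("biosamples") or []
    diagnoses: Counter = Counter()
    for s in samples:
        dx = s.get("histological_diagnosis")
        if isinstance(dx, dict):  # ontology-term form, e.g. {"id": ..., "label": ...}
            dx = dx.get("label") or dx.get("id")
        diagnoses[dx or "unknown"] += 1
    return {
        "n_biosamples": len(samples),
        "ids": [s.get("biosample_id") for s in samples],
        "by_diagnosis": dict(diagnoses),
    }
```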

**Progenetix_cnv_search**: `reference_name` (string REQUIRED, RefSeq accession), `start` (int REQUIRED, GRCh38 1-based), `end` (int REQUIRED), `variant_type` ("DUP"/"DEL"), `filters` (string, NCIt code), `limit` (int).
Returns biosamples with CNV in the specified genomic region.
- `variant_type="DUP"` for amplification, `"DEL"` for deletion.
- Use `filters` to restrict to a cancer type.

```python
# EGFR amplifications (chr7:55019017-55211628) in breast cancer
result = tu.tools.Progenetix_cnv_search(
    reference_name="refseq:NC_000007.14",
    start=55019017, end=55211628,
    variant_type="DUP", filters="NCIT:C4017", limit=10
)
```
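
Turning a familiar region string such as `chr7:55019017-55211628` into `Progenetix_cnv_search` arguments is easy to get wrong by hand. The sketch below is a hypothetical helper: only the chr7 accession from the example is filled in, so look up other GRCh38 RefSeq accessions before adding them, and the coordinate convention is passed through unchanged from your input.

```python
import re
from typing import Any, Dict, Optional

# Only chr7 is recorded (from the EGFR example); add others after looking them up.
REFSEQ_GRCH38 = {"7": "refseq:NC_000007.14"}
CNV_TYPES = {"amplification": "DUP", "deletion": "DEL"}


def cnv_query_args(region: str, change: str, ncit: Optional[str] = None,
                   limit: int = 10) -> Dict[str, Any]:
    """Build keyword arguments for Progenetix_cnv_search from 'chrN:start-end'."""
    m = re.fullmatch(r"(?:chr)?(\w+):(\d+)-(\d+)", region.replace(",", ""))
    if not m:
        raise ValueError(f"unrecognized region: {region!r}")
    chrom, start, end = m.group(1), int(m.group(2)), int(m.group(3))
    if chrom not in REFSEQ_GRCH38:
        raise KeyError(f"no RefSeq accession recorded for chromosome {chrom}")
    if start > end:
        raise ValueError("start must not exceed end")
    variant_type = CNV_TYPES.get(change.lower(), change.upper())
    if variant_type not in ("DUP", "DEL"):
        raise ValueError(f"unsupported CNV type: {change!r}")
    args = {"reference_name": REFSEQ_GRCH38[chrom], "start": start, "end": end,
            "variant_type": variant_type, "limit": limit}
    if ncit:
        args["filters"] = ncit
    return args


# Usage: tu.tools.Progenetix_cnv_search(
#     **cnv_query_args("chr7:55019017-55211628", "amplification", ncit="NCIT:C3058"))
```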

**Progenetix_list_filtering_terms**: No params. Returns all available NCIt codes and labels.
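- Use to look up NCIt codes (for example, the `filters` value for `Progenetix_search_biosamples` and `Progenetix_cnv_search`) instead of guessing them.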
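
Once the filtering terms are retrieved, a keyword search over their labels finds candidate NCIt codes. The sketch assumes each term is a dict with `id` and `label` keys, which is an assumption about this tool's output; `find_ncit_codes` is a hypothetical helper.

```python
from typing import Dict, List


def find_ncit_codes(terms: List[Dict[str, str]], keyword: str) -> List[Dict[str, str]]:
    """Return NCIt terms whose label contains keyword (case-insensitive)."""
    kw = keyword.lower()
    return [
        t for t in terms
        if str(t.get("id", "")).startswith("NCIT:") and kw in str(t.get("label", "")).lower()
    ]


# Hand-written example terms; in practice use Progenetix_list_filtering_terms output.
terms = [
    {"id": "NCIT:C3058", "label": "Glioblastoma"},
    {"id": "NCIT:C3512", "label": "Lung Adenocarcinoma"},
    {"id": "UBERON:0000955", "label": "brain"},
]
print(find_ncit_codes(terms, "glio"))
```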

setup-tooluniverse (Skill)

Install and configure ToolUniverse for any use case — MCP server (chat-based), CLI (command line with 9 subcommands), or Python SDK (Coding API with 3 calling patterns). Covers uv/uvx setup, MCP configuration for 12+ AI clients (Cursor, Claude Desktop, Windsurf, VS Code, Codex, Gemini CLI, Trae, Cline, etc.), full CLI reference (tu list/grep/find/info/run/test/status/build/serve), Coding API quickstart, agentic tools, code executor, API key walkthrough, skill installation, and upgrading. Use when user asks how to set up ToolUniverse, which access mode to use (MCP vs CLI vs SDK), configuring MCP servers, using the CLI, troubleshooting installation, upgrading, or mentions installing ToolUniverse or setting up scientific tools. Also triggers for "how do I use ToolUniverse", "what's the best way to access tools", "command line", "tu command", "coding API", "tu build".

tooluniverse-acmg-variant-classification (Skill)

Systematic ACMG/AMP germline variant classification with all 28 criteria (PVS1, PS1-4, PM1-6, PP1-5, BA1, BS1-4, BP1-7) for clinical significance. Produces 5-tier verdict (Pathogenic / Likely Pathogenic / VUS / Likely Benign / Benign) with cited evidence per criterion. Use for variant interpretation, VUS resolution, and pathogenicity assessment. Combines ClinVar, gnomAD, computational predictors, and gene-mechanism context.

tooluniverse-admet-prediction (Skill)

Comprehensive ADMET (Absorption, Distribution, Metabolism, Excretion, Toxicity) profiling for drug candidates. Integrates ADMET-AI predictions, SwissADME drug-likeness, PubChemTox experimental toxicity, ChEMBL clinical data, Lipinski rule-of-five, and CYP interaction data. Use for drug-likeness assessment, BBB penetration, bioavailability, hepatotoxicity prediction, ADME/PK profiling, or screening compound libraries before lab testing.

tooluniverse-adverse-event-detection (Skill)

Detect and analyze adverse drug event signals using FDA FAERS reports, drug labels, and disproportionality statistics (PRR, ROR, IC). Generates quantitative safety signal scores (0-100) with evidence grading. Use for post-market surveillance, pharmacovigilance, drug safety assessment, regulatory submissions, and detecting rare AE signals not visible in clinical trials.

tooluniverse-adverse-outcome-pathway (Skill)

Map environmental and industrial chemicals to adverse outcome pathways (AOPs) — molecular initiating event to organ-level toxicity. Uses AOPWiki, GHS classification, IARC carcinogen status, and LD50 data. Use for environmental/industrial chemical risk assessment, regulatory-grade hazard characterization, and AOP stressor mapping. Distinct from drug-safety analysis (use tooluniverse-pharmacovigilance for drugs).

tooluniverse-aging-senescence (Skill)

Aging biology, cellular senescence, and longevity research. Covers senescence markers (p16/CDKN2A, SASP, SA-beta-gal), aging hallmarks, senolytic drug discovery (dasatinib+quercetin, fisetin, navitoclax), epigenetic clocks, telomere biology, and longevity GWAS. Use for senescence-pathway analysis, age-related disease genetics, senolytic-target discovery, and centenarian-genetics queries. Distinguishes correlative vs causal evidence (knockout, intervention).

tooluniverse-antibody-engineering (Skill)

Therapeutic antibody engineering and optimization, lead-to-clinical-candidate. Covers sequence humanization (germline alignment, framework retention), affinity maturation, developability (aggregation, stability, PTMs), structure modeling (AlphaFold/PDB CDR analysis), immunogenicity prediction, and manufacturing feasibility. Use for biologic-drug optimization, mAb design review, biosimilar engineering, and clinical-precedent comparison.

tooluniverse-binder-discovery (Skill)

Discover novel small-molecule binders for protein targets using structure-based and ligand-based screening. Covers druggability assessment, known-ligand mining (ChEMBL, BindingDB), similarity expansion, ADMET filtering, and synthesis feasibility. Use for hit identification, virtual screening, target-to-compounds workflows, and lead-finding before commit-to-medchem.