tooluniverse-cancer-classification
This Claude Code skill standardizes cancer nomenclature by translating free-text tumor descriptions into structured OncoTree codes with cross-references to UMLS and NCI vocabularies. Use it when researchers need to map clinical notes to standardized cancer classifications, annotate variants in OncoKB, build genomic cohorts in GDC, or navigate the cancer subtype hierarchy across tissue origins.
git clone --depth 1 https://github.com/mims-harvard/ToolUniverse /tmp/tooluniverse-cancer-classification && cp -r /tmp/tooluniverse-cancer-classification/plugin/skills/tooluniverse-cancer-classification ~/.claude/skills/tooluniverse-cancer-classification
SKILL.md
# Cancer Classification via OncoTree
Standardize cancer type nomenclature using the OncoTree ontology. Resolves free-text tumor
descriptions to structured codes with UMLS/NCI cross-references, enabling downstream use in
OncoKB variant annotation and GDC cohort selection.
## When to Use
Apply when a researcher asks questions such as:
- "What is the OncoTree code for [tumor description]?"
- "Find all subtypes of [cancer type]"
- "What cancers originate in [tissue]?"
- "I need the tumor type code for OncoKB annotation"
- "What is the TCGA/COSMIC code for [cancer]?"
- "List all CNS/Brain cancer subtypes"
- "What NCI code corresponds to glioblastoma?"
## Key Tools
| Tool | Purpose | Key Params |
|------|---------|-----------|
| `OncoTree_search` | Free-text search for cancer types | `query` (tumor name or description) |
| `OncoTree_get_type` | Full details for a known OncoTree code | `code` (e.g., "LUAD", "AML") |
| `OncoTree_list_tissues` | List all 32 tissue categories | (no params) |
| `OncoKB_annotate_variant` | Variant annotation using OncoTree code | `gene`, `variant`, `tumor_type` |
| `GDC_get_mutation_frequency` | Pan-cancer mutation frequency (TCGA) | `gene_symbol` |
## Workflow
### Phase 1: Cancer Type Discovery
Start with free-text search to find matching OncoTree codes:
```
OncoTree_search(query="breast cancer")
-> Returns list: code, name, main_type, tissue, parent, level, external_references
```
Key response fields:
- `code`: OncoTree code (e.g., "BRCA", "IBC") — use this in OncoKB calls
- `level`: hierarchy depth (1=tissue, 2=main type, 3-5=subtypes)
- `parent`: parent node code for navigating the hierarchy
- `external_references.UMLS`: UMLS CUI list
- `external_references.NCI`: NCI thesaurus code list
Search tips:
- Broad terms ("lung cancer") return many results; narrow by tissue or level
- Use tissue-specific terms ("invasive breast carcinoma") for precise matching
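- Acronyms work as search queries: query="GBM" finds glioblastoma, query="AML" finds leukemia types. An acronym that works in search is not necessarily a valid OncoTree code, so take the code from the returned `code` field.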
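
The narrowing step mentioned in the search tips can be sketched as a small filter over the result list. This is a minimal sketch, assuming each result is a dict carrying the `tissue` and `level` fields listed above; `narrow_results` is a hypothetical helper (not part of ToolUniverse), and the sample records use placeholder `level` values for illustration only.

```python
from typing import Optional


def narrow_results(results: list[dict], tissue: Optional[str] = None,
                   min_level: Optional[int] = None) -> list[dict]:
    """Keep results that match a tissue and/or sit at least `min_level` deep."""
    kept = []
    for r in results:
        if tissue is not None and r.get("tissue") != tissue:
            continue
        if min_level is not None and r.get("level", 0) < min_level:
            continue
        kept.append(r)
    return kept


# Illustrative records only: the level values are placeholders, not real OncoTree data.
sample_results = [
    {"code": "LUAD", "tissue": "Lung", "level": 3},
    {"code": "LUSC", "tissue": "Lung", "level": 3},
    {"code": "BRCA", "tissue": "Breast", "level": 2},
]

lung_only = narrow_results(sample_results, tissue="Lung")
print([r["code"] for r in lung_only])
```

In practice the input would be the list returned by `OncoTree_search`; the filter preserves the relevance order of that list.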
### Phase 2: Code Validation and Detail Retrieval
Once you have a candidate code, retrieve full details:
```
OncoTree_get_type(code="LUAD")
-> Returns: name, main_type, tissue, color, parent, level, history, external_references
```
Note: Not all codes are valid. "GBM" returns 404 — correct code is "GB" (Glioblastoma, IDH-Wildtype).
Always validate via `OncoTree_get_type` before using in downstream tools.
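
The validate-first rule can be wrapped in a small guard. This is a minimal sketch, assuming the lookup returns a dict with a `status` key equal to `"success"` for valid codes (the shape used in the Common Patterns section); the error shape returned by `fake_get_type` is an assumption, and the real tool may signal a 404 differently. Both `validate_code` and `fake_get_type` are hypothetical names introduced for illustration.

```python
from typing import Callable, Optional


def validate_code(code: str, get_type: Callable[..., dict]) -> Optional[dict]:
    """Return the detail record for a valid code, or None if the lookup fails."""
    detail = get_type(code=code)
    if detail.get("status") == "success":
        return detail
    return None


def fake_get_type(code: str) -> dict:
    """Stand-in for OncoTree_get_type; knows only the codes used in this demo."""
    known = {"GB": "Glioblastoma, IDH-Wildtype", "LUAD": "Lung Adenocarcinoma"}
    if code in known:
        return {"status": "success", "data": {"code": code, "name": known[code]}}
    # Assumed error shape; the real tool's failure response may differ.
    return {"status": "error", "error": "404"}


for candidate in ("GBM", "GB"):
    result = validate_code(candidate, fake_get_type)
    print(candidate, "valid" if result else "invalid")
```

Passing the lookup function as a parameter keeps the guard testable without network access; in a live session the real `OncoTree_get_type` would be passed instead of the stand-in.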
### Phase 3: Tissue-Level Exploration
When the user wants all cancers in a tissue category:
```
OncoTree_list_tissues()
-> Returns 32 tissue names: "Breast", "CNS/Brain", "Lung", "Myeloid", ...
OncoTree_search(query="CNS/Brain")
-> All cancer types with tissue="CNS/Brain"
```
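
Once a tissue's cancer types are in hand, the `parent` field can be used to walk the hierarchy and collect every subtype under a node. This is a minimal sketch, assuming each record is a dict with `code` and `parent` keys and that the records form a tree (no cycles); `collect_subtypes` is a hypothetical helper, and the sample codes are made-up placeholders rather than real OncoTree codes.

```python
from collections import defaultdict


def collect_subtypes(records: list[dict], root: str) -> list[str]:
    """Return the codes of all descendants of `root`, following `parent` links."""
    children = defaultdict(list)
    for r in records:
        children[r.get("parent")].append(r["code"])

    found, stack = [], [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            found.append(child)
            stack.append(child)
    return found


# Placeholder hierarchy: T is a tissue node, A and B are its children, A1 is under A.
sample_records = [
    {"code": "A", "parent": "T"},
    {"code": "B", "parent": "T"},
    {"code": "A1", "parent": "A"},
    {"code": "X", "parent": "OTHER"},
]

print(sorted(collect_subtypes(sample_records, "T")))
```

The traversal order is not meaningful here, so callers that need a stable order should sort the result.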
### Phase 4: Downstream Use in Variant Annotation
Pass the validated OncoTree code to OncoKB to get cancer-type-specific therapeutic levels:
```
OncoKB_annotate_variant(gene="EGFR", variant="L858R", tumor_type="LUAD")
-> highestSensitiveLevel: "1" (FDA-approved therapy for this tumor+variant)
```
Without `tumor_type`, OncoKB returns pan-cancer levels, which may be less specific.
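
The choice between a tumor-type-specific call and a pan-cancer call can be made explicit in a wrapper. This is a minimal sketch, assuming the same `status == "success"` response shape as in the Common Patterns section; `annotate_with_fallback`, `stub_get_type`, and `stub_annotate` are hypothetical names, and the stubs simply echo their arguments rather than returning real OncoKB levels.

```python
from typing import Callable, Optional


def annotate_with_fallback(gene: str, variant: str, code: Optional[str],
                           get_type: Callable[..., dict],
                           annotate: Callable[..., dict]) -> dict:
    """Annotate with tumor_type only when the OncoTree code validates."""
    detail = get_type(code=code) if code else {}
    if detail.get("status") == "success":
        return annotate(gene=gene, variant=variant, tumor_type=code)
    # Invalid or missing code: fall back to a pan-cancer annotation.
    return annotate(gene=gene, variant=variant)


def stub_get_type(code: str) -> dict:
    """Stand-in for OncoTree_get_type that accepts only LUAD and GB."""
    return {"status": "success"} if code in {"LUAD", "GB"} else {"status": "error"}


def stub_annotate(**kwargs) -> dict:
    """Stand-in for OncoKB_annotate_variant that echoes the arguments it received."""
    return dict(kwargs)


print(annotate_with_fallback("EGFR", "L858R", "LUAD", stub_get_type, stub_annotate))
print(annotate_with_fallback("EGFR", "L858R", "GBM", stub_get_type, stub_annotate))
```

Falling back silently is a design choice; where the tumor type drives a treatment decision, it may be preferable to stop and ask for a corrected code instead.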
## Tool Parameter Reference
| Tool | Required | Optional | Notes |
|------|---------|---------|-------|
| `OncoTree_search` | `query` | — | Free text; returns list sorted by relevance |
| `OncoTree_get_type` | `code` | — | Case-sensitive; "BRCA" not "brca". Returns 404 for invalid codes |
| `OncoTree_list_tissues` | — | — | No params; returns list of 32 tissue strings |
| `OncoKB_annotate_variant` | `gene`, `variant` | `tumor_type` | `tumor_type` is OncoTree code; omit for pan-cancer |
| `GDC_get_mutation_frequency` | `gene_symbol` | — | Pan-cancer TCGA only; no per-subtype breakdown |
## Common OncoTree Codes (verified working)
| Code | Name | Tissue |
|------|------|--------|
| `BRCA` | Invasive Breast Carcinoma | Breast |
| `LUAD` | Lung Adenocarcinoma | Lung |
| `LUSC` | Lung Squamous Cell Carcinoma | Lung |
| `MEL` | Melanoma | Skin |
| `CRC` | Colorectal Cancer | Bowel |
| `PAAD` | Pancreatic Adenocarcinoma | Pancreas |
| `GBM` | (invalid — use `GB`) | CNS/Brain |
| `GB` | Glioblastoma, IDH-Wildtype | CNS/Brain |
| `AML` | Acute Myeloid Leukemia | Myeloid |
| `PRAD` | Prostate Adenocarcinoma | Prostate |
## Common Patterns
```python
# Pattern: Resolve free-text to OncoTree code
results = OncoTree_search(query="pancreatic ductal adenocarcinoma")
# Take the top-ranked result (results are sorted by relevance); a higher level number means a more specific subtype
code = results["data"][0]["code"] # e.g., "PAAD"
# Pattern: Get all subtypes within a main type
results = OncoTree_search(query="Glioma")
subtypes = [r for r in results["data"] if r["main_type"] == "Glioma"]
# Pattern: Validate code before OncoKB call
detail = OncoTree_get_type(code="GB")
if detail["status"] == "success":
    OncoKB_annotate_variant(gene="IDH1", variant="R132H", tumor_type="GB")
```
## Tumor Classification Reasoning (CRITICAL)
**LOOK UP, DON'T GUESS** -- tumor classification determines treatment. Always verify codes and biomarker interpretation via tools rather than relying on memory.
### Histological vs Molecular Classification
Tumors are classified on TWO axes -- both matter for treatment selection:
- **Histological** (what it looks like under the microscope): adenocarcinoma, squamous, small cell, etc. This determines where the tumor sits in the OncoTree hierarchy at level 3 and deeper.
- **Molecular** (what mutations/alterations drive it): EGFR-mutant, HER2-amplified, MSI-high, etc. This determines OncoKB therapeutic levels.
A tumor can be histologically identical to another but molecularly different, requiring different treatment. Example: two lung adenocarcinomas (both LUAD) but one is EGFR-mutant (targeted therapy) and another is KRAS-mutant (different targeted therapy). **Always check both axes.**
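
One way to keep both axes together is to store them in a single record and derive the annotation queries from it. This is a minimal sketch: `TumorProfile` and `oncokb_queries` are hypothetical names, the generated dicts mirror the `gene`, `variant`, and `tumor_type` parameters of `OncoKB_annotate_variant`, and KRAS G12C is used only as an illustrative alteration.

```python
from dataclasses import dataclass, field


@dataclass
class TumorProfile:
    oncotree_code: str  # histological axis (validated OncoTree code)
    alterations: list[tuple[str, str]] = field(default_factory=list)  # molecular axis: (gene, variant)

    def oncokb_queries(self) -> list[dict]:
        """Build one annotation query per alteration, scoped to the tumor type."""
        return [
            {"gene": gene, "variant": variant, "tumor_type": self.oncotree_code}
            for gene, variant in self.alterations
        ]


# Same histology (LUAD), different molecular drivers.
egfr_tumor = TumorProfile("LUAD", [("EGFR", "L858R")])
kras_tumor = TumorProfile("LUAD", [("KRAS", "G12C")])

print(egfr_tumor.oncokb_queries())
print(kras_tumor.oncokb_queries())
```

The two profiles share a code but produce different queries, which is the point of checking both axes before reading off a therapeutic level.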
### Biomarker Interpretation Strategy
When interpreting cancer biomarkers, use OncoKB for actionability:
- **HER2**: Positive = IHC 3+ or FISH-amplified. Use `OncoKB_annotate_variant(gene="ERBB2", variant="Amplification", tumor_type="BRCA")` for therapeutic level
- **ER/PR**: Positive = h…