ClaudeWave
Skill · 1.4k repo stars · updated today

tooluniverse-model-organism-genetics

This Claude Code skill maps human genes to orthologs across six model organisms (mouse, zebrafish, fruit fly, worm, yeast, rat) and retrieves phenotype, expression, and functional data from their respective databases. Use it when you need to assess gene function conservation across species, identify the best animal model for studying a human gene or disease, or understand how genetic findings translate between model systems and human biology.

Install in Claude Code
git clone --depth 1 https://github.com/mims-harvard/ToolUniverse /tmp/tooluniverse-model-organism-genetics && cp -r /tmp/tooluniverse-model-organism-genetics/plugin/skills/tooluniverse-model-organism-genetics ~/.claude/skills/tooluniverse-model-organism-genetics
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

## COMPUTE, DON'T DESCRIBE
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.

# Model Organism Genetics Pipeline

Map human genes to model organism orthologs and retrieve phenotype, expression, and functional data across six species. Synthesize cross-species evidence to assess gene function conservation and identify the best animal models for studying human genes and diseases.

**Not for**: human variant interpretation (`tooluniverse-variant-analysis`), drug target validation (`tooluniverse-drug-target-validation`), human disease characterization (`tooluniverse-multiomic-disease-characterization`).

**LOOK UP, DON'T GUESS**: When asked about a species' taxonomy, ecology, or biology, search GBIF/NCBI Taxonomy first. For GBIF: use `GBIF_search_species(query="species name")`, then use the `nubKey` (not `key`) from the result to call `GBIF_get_species(speciesKey=nubKey)` for full taxonomy (kingdom, phylum, class, order, family). The `nubKey` is the GBIF backbone key; the `key` is dataset-specific and often lacks higher taxonomy.
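
The `nubKey` selection step can be sketched as a small helper. This is a minimal sketch, assuming the search tool returns a list of records shaped like GBIF species-search results (`key`, `nubKey`, `canonicalName`); the function name `pick_backbone_key` and the example key values are hypothetical placeholders, not real GBIF identifiers.

```python
from __future__ import annotations

from typing import Any, Optional


def pick_backbone_key(results: list[dict[str, Any]], name: str) -> Optional[int]:
    """Return the backbone key (nubKey) for the best match on `name`.

    Prefers an exact, case-insensitive canonicalName match; otherwise falls
    back to the first record that carries a nubKey. Returns None if no record
    has a nubKey at all.
    """
    wanted = name.strip().lower()
    # Dataset-specific records often lack nubKey, so drop them up front.
    with_nub = [r for r in results if r.get("nubKey") is not None]
    for record in with_nub:
        if str(record.get("canonicalName", "")).lower() == wanted:
            return record["nubKey"]
    return with_nub[0]["nubKey"] if with_nub else None


# Made-up records: the first is dataset-specific and has no nubKey.
example_results = [
    {"key": 9001, "canonicalName": "Danio rerio"},
    {"key": 9002, "nubKey": 42, "canonicalName": "Danio rerio"},
]
backbone_key = pick_backbone_key(example_results, "Danio rerio")
```

The returned value is what would be passed as `speciesKey` to `GBIF_get_species`; a `None` result signals that the search should be retried or the lookup moved to NCBI Taxonomy.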

---

## Reasoning Principles

### Ortholog Reasoning
Sequence conservation across species suggests functional conservation, but does not guarantee it. A highly conserved gene in mouse and human likely has the same function, yet regulatory differences (when and where a gene is expressed) can cause different phenotypes even from the same gene. Always check: is the protein domain conserved, or just raw sequence? Are there known regulatory differences? A 40% identity ortholog with a conserved catalytic domain can be more functionally equivalent than a 90% identity paralog in the same species.

Paralog contamination is a common pitfall. Gene families (e.g., FOXP1/2/3/4, HOX clusters) generate false ortholog hits. Distinguish true orthologs from paralogs by checking synteny (conserved gene neighborhood) and homology type: 1:1 = likely true ortholog; 1:many or many:many = likely paralog expansion. If the target species has a single gene where humans have multiple (e.g., one fly FoxP vs four human FOXPs), it is the co-ortholog of all human paralogs — note this explicitly.
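
The homology-type rule of thumb above can be sketched in code. This is an illustrative sketch, assuming each ortholog record carries an Ensembl Compara style `type` string (`ortholog_one2one`, `ortholog_one2many`, `ortholog_many2many`) and a `target_symbol` field; the field names, the helper names, and the example records are assumptions, not the tool's documented output.

```python
from __future__ import annotations

# Interpretation of each homology type, following the rule of thumb in the text.
HOMOLOGY_NOTES = {
    "ortholog_one2one": "1:1, likely true ortholog",
    "ortholog_one2many": "1:many, possible paralog expansion; check synteny",
    "ortholog_many2many": "many:many, possible paralog expansion; check synteny",
}


def annotate_homologies(homologies: list[dict[str, str]]) -> list[tuple[str, str]]:
    """Pair each target gene symbol with a short interpretation of its type."""
    annotated = []
    for record in homologies:
        note = HOMOLOGY_NOTES.get(
            record.get("type", ""), "unrecognized type; verify manually"
        )
        annotated.append((record["target_symbol"], note))
    return annotated


def needs_synteny_check(homologies: list[dict[str, str]]) -> bool:
    """True if any record is something other than a clean 1:1 ortholog."""
    return any(r.get("type") != "ortholog_one2one" for r in homologies)


# Placeholder records, not real query results.
example = [
    {"type": "ortholog_one2one", "target_symbol": "gene_a"},
    {"type": "ortholog_one2many", "target_symbol": "gene_b"},
]
notes = annotate_homologies(example)
flag = needs_synteny_check(example)
```

A `True` flag is a prompt to inspect the conserved gene neighborhood by hand; it does not by itself show that a hit is a paralog.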

### Model Organism Selection
Choose your model by the question:
- **Mouse**: mammalian physiology, drug testing, immune system, CNS disease — best when you need human-like biology
- **Fly**: genetic screens, signaling pathways (Notch, Wnt, Hh first characterized here), neural circuits, aging — best for rapid genome-wide genetics
- **Worm**: cell lineage, apoptosis, RNAi screens, aging — best when you need single-cell resolution and mapped connectome
- **Zebrafish**: development, organ formation, live imaging, cardiac biology — best when you need vertebrate biology with optical access
- **Yeast**: cell cycle, DNA repair, metabolism, protein trafficking, chromatin — best for fundamental cell biology
- **Frog (Xenopus)**: early development, cell signaling, oocyte biochemistry — note X. laevis is allotetraploid (two homeologs: .L and .S)

Fly, worm, and yeast lack adaptive immunity and many vertebrate-specific organs (yeast is a unicellular fungus, not an invertebrate animal). If the question involves those systems, these models will be largely uninformative.

### Phenotype Transfer Reasoning
A knockout phenotype in mouse does not automatically predict the human phenotype. Ask three questions before inferring cross-species relevance:
1. **Is the pathway conserved?** A mouse cardiac phenotype only predicts human cardiac disease if the same developmental pathway operates in both hearts.
2. **Are there compensating paralogs?** If the mouse has one gene but humans have three paralogs, a mouse knockout can be more severe than loss of a single human paralog, so it may overpredict the human phenotype. Conversely, if humans lost a paralog that mice retain, the retained mouse paralog can compensate and the mouse KO may underpredict the human phenotype.
3. **Is the gene dosage-sensitive?** Haploinsufficiency in mouse (heterozygous phenotype) is a stronger predictor of human dominant disease than phenotypes seen only in homozygous knockouts.

When phenotypes differ across species, consider regulatory divergence: the coding sequence may be conserved while the expression pattern has shifted. This can produce organisms with the "same gene" but different tissues of expression and therefore different phenotypes.
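
The three questions can be kept explicit in an analysis script as a checklist that returns caveats rather than a score. This is only a sketch; the `TransferEvidence` fields and the caveat wording are hypothetical, and the inputs would have to be filled in from the ortholog and phenotype lookups.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TransferEvidence:
    pathway_conserved: bool       # same pathway operates in model and human
    model_gene_copies: int        # gene plus its paralogs in the model organism
    human_gene_copies: int        # gene plus its paralogs in human
    heterozygous_phenotype: bool  # phenotype seen in heterozygous animals


def transfer_caveats(ev: TransferEvidence) -> list[str]:
    """List the reasons a model phenotype may not carry over to human."""
    caveats = []
    if not ev.pathway_conserved:
        caveats.append("pathway conservation not established")
    if ev.human_gene_copies > ev.model_gene_copies:
        caveats.append(
            "human has more paralogs: model knockout may be more severe "
            "than loss of a single human paralog"
        )
    elif ev.human_gene_copies < ev.model_gene_copies:
        caveats.append("paralog content differs: compensation may differ between species")
    if not ev.heterozygous_phenotype:
        caveats.append(
            "phenotype only in homozygotes: weaker predictor of dominant human disease"
        )
    return caveats


clean = transfer_caveats(TransferEvidence(True, 1, 1, True))
risky = transfer_caveats(TransferEvidence(False, 1, 3, False))
```

An empty list means none of the three checks raised a flag. It does not rule out regulatory divergence, which has to be assessed from expression data.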

---

## Pipeline

### Phase 0: Human Gene Disambiguation (ALWAYS FIRST)

1. `MyGene_query_genes(query="<gene>")` — get Ensembl ID, Entrez ID, UniProt, symbol (filter by `symbol` match; first hit may be a pseudogene)
2. `ensembl_lookup_gene(gene_id="<ensembl_id>", species="homo_sapiens")` — validate
3. If disease context: `HPO_search_terms(query="<disease>")` — get HPO terms for phenotype matching

Fallback if gene not found: `UniProt_search(query="<gene>", organism="9606")`

**Output**: canonical symbol, Ensembl ID (ENSG), Entrez ID, UniProt accession.
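
The symbol filter in step 1 can be sketched as follows. It assumes the tool returns MyGene.info style hits with `symbol`, `entrezgene`, and `ensembl` fields, where `ensembl` may be a dict or a list of dicts; the helper names and the example hits are placeholders, not real records.

```python
from __future__ import annotations

from typing import Any, Optional


def pick_symbol_match(hits: list[dict[str, Any]], symbol: str) -> Optional[dict[str, Any]]:
    """Return the first hit whose symbol equals `symbol` exactly, ignoring case."""
    wanted = symbol.strip().upper()
    for hit in hits:
        if str(hit.get("symbol", "")).upper() == wanted:
            return hit
    return None


def ensembl_gene_id(hit: dict[str, Any]) -> Optional[str]:
    """Extract the Ensembl gene ID, tolerating a dict or a list of dicts."""
    ensembl = hit.get("ensembl")
    if isinstance(ensembl, list):
        ensembl = ensembl[0] if ensembl else None
    return ensembl.get("gene") if isinstance(ensembl, dict) else None


# Placeholder hits: the top-ranked one is a pseudogene with a similar symbol.
example_hits = [
    {"symbol": "GENEXP1", "entrezgene": "000"},
    {"symbol": "GENEX", "entrezgene": "111", "ensembl": {"gene": "ENSG_PLACEHOLDER"}},
]
chosen = pick_symbol_match(example_hits, "genex")
gene_id = ensembl_gene_id(chosen) if chosen else None
```

If `pick_symbol_match` returns `None`, that is the cue to use the `UniProt_search` fallback instead of accepting the first hit.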

---

### Phase 1: Ortholog Mapping

**Primary**: `EnsemblCompara_get_orthologues(gene="<ENSG>", species="human", target_species="<species>")`

Accepted `target_species` values: `"mouse"`, `"zebrafish"`, `"drosophila_melanogaster"` (NOT "fruitfly" — returns HTTP 400), `"caenorhabditis_elegans"`, `"saccharomyces_cerevisiae"`, `"xenopus_tropicalis"`
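
Since a wrong species string fails with HTTP 400, it can help to normalize names before calling the tool. The accepted values below come from the list above; the alias keys and the function name `normalize_target_species` are assumptions added for illustration.

```python
from __future__ import annotations

# Common names mapped to the accepted target_species strings.
TARGET_SPECIES = {
    "mouse": "mouse",
    "zebrafish": "zebrafish",
    "fly": "drosophila_melanogaster",
    "fruitfly": "drosophila_melanogaster",
    "fruit fly": "drosophila_melanogaster",
    "worm": "caenorhabditis_elegans",
    "yeast": "saccharomyces_cerevisiae",
    "frog": "xenopus_tropicalis",
}


def normalize_target_species(name: str) -> str:
    """Map a common name to an accepted value, or raise ValueError."""
    key = name.strip().lower()
    if key in TARGET_SPECIES.values():
        return key  # already an accepted value
    if key in TARGET_SPECIES:
        return TARGET_SPECIES[key]
    raise ValueError(f"unsupported target species: {name!r}")


fly_value = normalize_target_species("Fruitfly")
```

Raising on unknown names keeps a typo from turning into an opaque HTTP error further down the pipeline.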

**Fallbacks** (if Ensembl Compara returns no results):
1. `PANTHER_ortholog(gene_id="<symbol>", organism=9606, target_organism=<taxon>)` — taxon IDs: mouse=10090, fly=7227, worm=6239, zebrafish=7955, yeast=559292, frog=8364
2. `NCBIDatasets_get_orthologs(gene_id="<entrez_id>")` — broad, all vertebrates
3. For fly: `FlyMine_search(query="<human_gene_symbol>")` — text search finds distant orthologs that automated tools miss; confirm with `FlyBase_get_gene_orthologs`
4. For worm: `WormBase_get_gene(gene_id="<gene_symbol>")` — gene record often contains ortholog info

**Cross-reference via Monarch**:
- `Monarch_search_gene(query="<gene_symbol>")`
setup-tooluniverse (Skill)

Install and configure ToolUniverse for any use case — MCP server (chat-based), CLI (command line with 9 subcommands), or Python SDK (Coding API with 3 calling patterns). Covers uv/uvx setup, MCP configuration for 12+ AI clients (Cursor, Claude Desktop, Windsurf, VS Code, Codex, Gemini CLI, Trae, Cline, etc.), full CLI reference (tu list/grep/find/info/run/test/status/build/serve), Coding API quickstart, agentic tools, code executor, API key walkthrough, skill installation, and upgrading. Use when user asks how to set up ToolUniverse, which access mode to use (MCP vs CLI vs SDK), configuring MCP servers, using the CLI, troubleshooting installation, upgrading, or mentions installing ToolUniverse or setting up scientific tools. Also triggers for "how do I use ToolUniverse", "what's the best way to access tools", "command line", "tu command", "coding API", "tu build".

tooluniverse-acmg-variant-classification (Skill)

Systematic ACMG/AMP germline variant classification with all 28 criteria (PVS1, PS1-4, PM1-6, PP1-5, BA1, BS1-4, BP1-7) for clinical significance. Produces 5-tier verdict (Pathogenic / Likely Pathogenic / VUS / Likely Benign / Benign) with cited evidence per criterion. Use for variant interpretation, VUS resolution, and pathogenicity assessment. Combines ClinVar, gnomAD, computational predictors, and gene-mechanism context.

tooluniverse-admet-prediction (Skill)

Comprehensive ADMET (Absorption, Distribution, Metabolism, Excretion, Toxicity) profiling for drug candidates. Integrates ADMET-AI predictions, SwissADME drug-likeness, PubChemTox experimental toxicity, ChEMBL clinical data, Lipinski rule-of-five, and CYP interaction data. Use for drug-likeness assessment, BBB penetration, bioavailability, hepatotoxicity prediction, ADME/PK profiling, or screening compound libraries before lab testing.

tooluniverse-adverse-event-detection (Skill)

Detect and analyze adverse drug event signals using FDA FAERS reports, drug labels, and disproportionality statistics (PRR, ROR, IC). Generates quantitative safety signal scores (0-100) with evidence grading. Use for post-market surveillance, pharmacovigilance, drug safety assessment, regulatory submissions, and detecting rare AE signals not visible in clinical trials.

tooluniverse-adverse-outcome-pathway (Skill)

Map environmental and industrial chemicals to adverse outcome pathways (AOPs) — molecular initiating event to organ-level toxicity. Uses AOPWiki, GHS classification, IARC carcinogen status, and LD50 data. Use for environmental/industrial chemical risk assessment, regulatory-grade hazard characterization, and AOP stressor mapping. Distinct from drug-safety analysis (use tooluniverse-pharmacovigilance for drugs).

tooluniverse-aging-senescence (Skill)

Aging biology, cellular senescence, and longevity research. Covers senescence markers (p16/CDKN2A, SASP, SA-beta-gal), aging hallmarks, senolytic drug discovery (dasatinib+quercetin, fisetin, navitoclax), epigenetic clocks, telomere biology, and longevity GWAS. Use for senescence-pathway analysis, age-related disease genetics, senolytic-target discovery, and centenarian-genetics queries. Distinguishes correlative vs causal evidence (knockout, intervention).

tooluniverse-antibody-engineering (Skill)

Therapeutic antibody engineering and optimization, lead-to-clinical-candidate. Covers sequence humanization (germline alignment, framework retention), affinity maturation, developability (aggregation, stability, PTMs), structure modeling (AlphaFold/PDB CDR analysis), immunogenicity prediction, and manufacturing feasibility. Use for biologic-drug optimization, mAb design review, biosimilar engineering, and clinical-precedent comparison.

tooluniverse-binder-discovery (Skill)

Discover novel small-molecule binders for protein targets using structure-based and ligand-based screening. Covers druggability assessment, known-ligand mining (ChEMBL, BindingDB), similarity expansion, ADMET filtering, and synthesis feasibility. Use for hit identification, virtual screening, target-to-compounds workflows, and lead-finding before commit-to-medchem.