
tooluniverse-metagenomics-analysis

# ClaudeWave: tooluniverse-metagenomics-analysis

This Claude Code skill performs integrated microbiome analysis by connecting MGnify, GTDB taxonomy, ENA sequencing databases, and literature sources to classify microbial taxa, assess genome quality via CheckM metrics, link microbiome composition to clinical phenotypes, and interpret findings through pathway analysis. Use it when analyzing amplicon or shotgun metagenomics studies to move from raw taxonomic data toward biological interpretation with rigorous quality control and literature context.

Repository: ToolUniverse

Install in Claude Code

git clone --depth 1 https://github.com/mims-harvard/ToolUniverse /tmp/tooluniverse-metagenomics-analysis && cp -r /tmp/tooluniverse-metagenomics-analysis/plugin/skills/tooluniverse-metagenomics-analysis ~/.claude/skills/tooluniverse-metagenomics-analysis

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Metagenomics & Microbiome Analysis

Integrated pipeline for exploring microbiome studies, classifying taxa, assessing genome quality, linking microbial composition to clinical phenotypes, and interpreting findings through pathway analysis and literature context.

**Guiding principles**:
1. **Study context first** -- understand biome, sequencing method, and metadata before diving into taxa
2. **Taxonomic consistency** -- GTDB taxonomy as reference standard; reconcile NCBI where needed
3. **Genome quality matters** -- CheckM completeness/contamination thresholds determine trustworthy MAGs
4. **Interpretation over enumeration** -- explain what taxa mean for the biological question
5. **English-first queries** -- use English terms in tool calls

## LOOK UP, DON'T GUESS
When uncertain about any scientific fact, SEARCH databases first rather than reasoning from memory.

---

## COMPUTE, DON'T DESCRIBE
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do; execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.

## Core Databases

| Database | Best For |
|----------|---------|
| **MGnify** | Processed metagenomics studies, taxonomic/functional results |
| **GTDB** | Standardized bacterial/archaeal taxonomy, species-level resolution |
| **GMrepo** | Gut species-to-human-health phenotype associations |
| **ENA** | Raw sequencing datasets and study metadata |
| **KEGG** | Pathway mapping for microbial functional annotations |
| **PubMed/EuropePMC** | Published microbiome-disease studies |
| **CTD** | Chemical-microbiome-disease relationships |

---

## Workflow

```
Phase 0: Parse query → organism, biome, phenotype, or accession
Phase 1: Study Discovery → MGnify_search_studies, ENAPortal_search_studies
Phase 2: Taxonomic Classification → GTDB_search_genomes, GTDB_get_species, GTDB_search_taxon
Phase 3: Genome Quality → MGnify_search_genomes, MGnify_get_genome (CheckM metrics)
Phase 4: Functional Annotation → MGnify GO terms + KEGG pathway mapping
Phase 5: Clinical Associations → GMrepo species-phenotype links
Phase 6: Literature → PubMed/EuropePMC + CTD gene-disease
Phase 7: Interpretation & Report Synthesis
```

---

## Key Phase Notes

**Phase 1**: ENA requires structured queries (e.g., `study_title="*IBD*"`), not free text. If ENA fails, fall back to MGnify.
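A minimal sketch of building the structured filter shown above, assuming ENA accepts a `field="*term*"` wildcard pattern as in the `study_title` example. The helper name `ena_wildcard_query` is hypothetical, and how the resulting string is passed to `ENAPortal_search_studies` is not shown in the source.

```python
def ena_wildcard_query(field: str, term: str) -> str:
    """Build a structured filter such as study_title="*IBD*".

    Hypothetical helper: only the study_title pattern appears in the
    skill text; other field names are an assumption.
    """
    # Remove double quotes so the term cannot break the quoted pattern.
    cleaned = term.strip().replace('"', "")
    if not cleaned:
        raise ValueError("search term must not be empty")
    return f'{field}="*{cleaned}*"'


if __name__ == "__main__":
    print(ena_wildcard_query("study_title", "IBD"))
```

Building the string in one place keeps free text from reaching ENA unquoted, which is the failure mode the note warns about.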

**Phase 2**: GTDB uses its own naming (e.g., `s__Bacteroides_A fragilis` vs NCBI `Bacteroides fragilis`). Always note discrepancies. Use `GTDB_search_taxon(operation="search_taxon", query=name)`.
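One way to surface such discrepancies is a small string heuristic, sketched below. It assumes GTDB names carry a rank prefix (`s__`, `g__`, ...) and an optional uppercase placeholder suffix (`_A`); both function names are hypothetical. Stripping the suffix only recovers a name to search NCBI with, since the suffix usually signals that GTDB split the original genus or species.

```python
import re

# Rank prefixes such as "s__" or "g__" (assumed GTDB convention).
_RANK_PREFIX = re.compile(r"^[dpcofgs]__")
# Placeholder suffixes such as "_A" in "Bacteroides_A".
_PLACEHOLDER_SUFFIX = re.compile(r"_[A-Z]+\b")


def gtdb_to_plain_name(gtdb_name: str) -> str:
    """Return a GTDB name without rank prefix or placeholder suffixes."""
    name = _RANK_PREFIX.sub("", gtdb_name.strip())
    return _PLACEHOLDER_SUFFIX.sub("", name)


def has_naming_discrepancy(gtdb_name: str, ncbi_name: str) -> bool:
    """True when the GTDB name (minus rank prefix) differs from the NCBI name."""
    return _RANK_PREFIX.sub("", gtdb_name.strip()) != ncbi_name.strip()


if __name__ == "__main__":
    gtdb = "s__Bacteroides_A fragilis"
    print(gtdb_to_plain_name(gtdb))
    print(has_naming_discrepancy(gtdb, "Bacteroides fragilis"))
```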

**Phase 3 - Quality tiers** (MIMAG):
- **High**: >= 90% complete, <= 5% contamination, 5S/16S/23S rRNA genes + >= 18 tRNAs
- **Medium**: >= 50% complete, <= 10% contamination
- **Low**: below medium -- flag but don't exclude
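The tiers above can be sketched as a small classifier. The function and parameter names are hypothetical; the inputs are assumed to be CheckM-style percentages plus an rRNA flag and tRNA count obtained separately, since the source does not say which fields `MGnify_get_genome` returns.

```python
def mimag_tier(completeness: float, contamination: float,
               has_rrna: bool = False, trna_count: int = 0) -> str:
    """Assign a quality tier using the thresholds listed above.

    completeness and contamination are percentages (0-100).
    """
    if (completeness >= 90 and contamination <= 5
            and has_rrna and trna_count >= 18):
        return "high"
    if completeness >= 50 and contamination <= 10:
        return "medium"
    # Low-quality genomes are flagged, not excluded.
    return "low"


if __name__ == "__main__":
    print(mimag_tier(95.2, 1.3, has_rrna=True, trna_count=20))
    print(mimag_tier(95.2, 1.3))  # no rRNA/tRNA evidence supplied
    print(mimag_tier(42.0, 3.0))
```

A genome that meets the completeness and contamination cutoffs but lacks rRNA or tRNA evidence falls to the medium tier in this sketch.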

**Phase 4 - Functional interpretation**: Don't just list GO terms. Connect to biology:

| Functional Category | Key KEGG Pathways | Significance |
|---|---|---|
| SCFA production | map00650, map00640 | Gut barrier, anti-inflammatory |
| LPS biosynthesis | map00540 | Pro-inflammatory, endotoxemia |
| Bile acid metabolism | map00120 | Fat absorption, FXR signaling |
| Tryptophan metabolism | map00380 | Serotonin, AhR, immune |
| Vitamin biosynthesis | map00730, map00740, map00760 | Host nutritional contribution |

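Use `kegg_search_pathway(keyword=...)` (NOT `query`). The `map` IDs in the table are generic reference pathways; when requesting a specific pathway, swap `map` for an organism or KO prefix (`hsa`, `eco`, `ko`), e.g. `ko00650`, NOT bare `map00650`.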
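A minimal sketch of the prefix swap, assuming KEGG pathway IDs are five digits with an optional `map` prefix. The helper `with_kegg_prefix` is hypothetical and not part of ToolUniverse.

```python
import re


def with_kegg_prefix(pathway_id: str, prefix: str = "ko") -> str:
    """Turn 'map00650' or '00650' into a prefixed ID such as 'ko00650'."""
    match = re.fullmatch(r"(?:map)?(\d{5})", pathway_id.strip())
    if match is None:
        raise ValueError(f"not a bare or map-prefixed pathway ID: {pathway_id!r}")
    return f"{prefix}{match.group(1)}"


if __name__ == "__main__":
    for bare_id in ("map00650", "map00640", "map00540"):
        print(bare_id, "->", with_kegg_prefix(bare_id))
    print(with_kegg_prefix("00120", prefix="hsa"))
```

IDs that already carry another prefix are rejected instead of rewritten, so a wrong organism code is not replaced silently.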

**Phase 5**: GMrepo uses MeSH terms: "Crohn Disease" not "IBD", "Colitis, Ulcerative" not "UC", "Colorectal Neoplasms" not "colorectal cancer". Try NCBI taxon IDs if species name fails.
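The term substitutions above can be kept in a lookup table. The sketch below contains only the three pairs given in the text; the names `MESH_SYNONYMS` and `to_mesh_term` are hypothetical, and unknown inputs are passed through unchanged on the assumption that they may already be MeSH terms.

```python
# Colloquial term -> MeSH heading, exactly as paired in the text above.
MESH_SYNONYMS = {
    "ibd": "Crohn Disease",
    "uc": "Colitis, Ulcerative",
    "colorectal cancer": "Colorectal Neoplasms",
}


def to_mesh_term(phenotype: str) -> str:
    """Map a colloquial phenotype to its MeSH heading when one is known."""
    cleaned = phenotype.strip()
    return MESH_SYNONYMS.get(cleaned.lower(), cleaned)


if __name__ == "__main__":
    for term in ("IBD", "UC", "colorectal cancer", "Obesity"):
        print(f"{term!r} -> {to_mesh_term(term)!r}")
```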

**Phase 6 - Evidence grading**:
- **Strong**: Meta-analysis or >5 studies, consistent direction
- **Moderate**: 2-5 studies consistent, or 1 large cohort
- **Preliminary**: Single study or conflicting
- **Mechanistic only**: In vitro/animal, no human epidemiology
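The grading rules above can be sketched as a function. This is one possible reading of them: the parameter names are hypothetical, what counts as a "large" cohort is left to the caller, and zero human studies is treated as "Mechanistic only" on the assumption that in vitro or animal evidence exists.

```python
def grade_evidence(n_human_studies: int, consistent: bool,
                   has_meta_analysis: bool = False,
                   has_large_cohort: bool = False) -> str:
    """Grade a microbe-phenotype association using the tiers above."""
    if n_human_studies == 0 and not has_meta_analysis:
        # Assumes the only support is in vitro or animal work.
        return "Mechanistic only"
    if has_meta_analysis or (n_human_studies > 5 and consistent):
        return "Strong"
    if (2 <= n_human_studies <= 5 and consistent) or (
            n_human_studies == 1 and has_large_cohort):
        return "Moderate"
    # Single small study, or studies that disagree on direction.
    return "Preliminary"


if __name__ == "__main__":
    print(grade_evidence(7, consistent=True))
    print(grade_evidence(3, consistent=True))
    print(grade_evidence(4, consistent=False))
    print(grade_evidence(0, consistent=True))
```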

**Phase 7 - Report**: Executive summary, study landscape, GTDB taxonomy, functional interpretation (not GO term lists), clinical relevance with evidence grades, mechanistic model, genome catalog with quality tiers, data gaps.

---

## Edge Cases & Fallbacks

- **Taxon not in GTDB**: Try partial search or fall back to MGnify (NCBI taxonomy)
- **No GMrepo data**: Normal for non-gut organisms; use literature
- **GMrepo 0 results**: Use formal MeSH terms or NCBI taxon IDs
- **No KEGG match**: Check MetaCyc or literature
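The fallbacks above share one pattern: try the preferred source, then move on when it errors or returns nothing. A minimal sketch of that pattern follows; `first_with_results` is a hypothetical helper, and the lambdas are stand-ins for real tool calls, not actual ToolUniverse signatures.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple

Source = Tuple[str, Callable[[], List[Any]]]


def first_with_results(sources: Iterable[Source]) -> Tuple[Optional[str], List[Any]]:
    """Return (label, results) from the first source that yields data."""
    for label, fetch in sources:
        try:
            results = fetch()
        except Exception:
            # A failing source is skipped and the next one is tried.
            continue
        if results:
            return label, results
    return None, []


if __name__ == "__main__":
    # Stand-in callables: an empty primary lookup and a fallback hit.
    label, hits = first_with_results([
        ("GTDB", lambda: []),
        ("MGnify (NCBI taxonomy)", lambda: ["placeholder record"]),
    ])
    print(label, hits)
```

Returning the label with the results lets a report state which database answered, which matters when the fallback uses a different taxonomy.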

## Limitations

- **GMrepo**: Gut-only
- **GTDB**: Bacteria/Archaea only
- **ENA**: Raw data only, strict query syntax
- **No sequence analysis**: Queries databases, not raw FASTQ/FASTA
