
tooluniverse-multi-omics-integration

This Claude Code skill orchestrates analysis across multiple biological data types (RNA, protein, epigenomics, genomics, metabolomics) by performing per-layer quality control, cross-omics correlation to identify concordant molecular signals, multi-omics clustering for patient stratification, and pathway-level integration. Use it for comprehensive systems biology studies, disease characterization requiring multiple molecular perspectives, and biomarker discovery where relationships between molecular layers provide mechanistic insights beyond single-omics analysis.

Repository: ToolUniverse

Install in Claude Code

git clone --depth 1 https://github.com/mims-harvard/ToolUniverse /tmp/tooluniverse-multi-omics-integration && cp -r /tmp/tooluniverse-multi-omics-integration/plugin/skills/tooluniverse-multi-omics-integration ~/.claude/skills/tooluniverse-multi-omics-integration

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Multi-Omics Integration

Coordinate and integrate multiple omics datasets for comprehensive systems biology analysis. Orchestrates specialized ToolUniverse skills to perform cross-omics correlation, multi-omics clustering, pathway-level integration, and unified interpretation.

---

## Domain Reasoning

Multi-omics integration asks whether different molecular layers tell a concordant story. If a gene is upregulated in RNA-seq AND its protein is elevated in proteomics, that is concordant evidence of true biological change. Discordance — high mRNA but low protein, or elevated protein without matching mRNA — may indicate post-transcriptional regulation (miRNA silencing, protein degradation, translational control) and is itself a meaningful finding worth reporting. Not every discordance is noise; some are the most interesting biology.
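
The concordance reasoning above can be sketched as a small per-gene classifier over log2 fold changes. This is an illustration only: the function name `classify_concordance`, the status labels, and the `threshold=1.0` cutoff are assumptions made for the example, not part of the skill.

```python
import numpy as np
import pandas as pd

def classify_concordance(rna_lfc, protein_lfc, threshold=1.0):
    """Label each shared gene by how its RNA and protein changes agree.

    rna_lfc, protein_lfc: pandas Series of log2 fold changes indexed by gene.
    threshold: hypothetical |log2FC| cutoff for calling a change.
    """
    genes = rna_lfc.index.intersection(protein_lfc.index)
    df = pd.DataFrame({"rna_lfc": rna_lfc.loc[genes],
                       "protein_lfc": protein_lfc.loc[genes]})
    # Direction of change per layer: -1, 0 (below threshold), or +1
    rna_dir = np.sign(df["rna_lfc"]).where(df["rna_lfc"].abs() >= threshold, 0)
    prot_dir = np.sign(df["protein_lfc"]).where(df["protein_lfc"].abs() >= threshold, 0)

    status = pd.Series("unchanged", index=df.index)
    status[(rna_dir != 0) & (rna_dir == prot_dir)] = "concordant"
    status[(rna_dir != 0) & (prot_dir == 0)] = "rna_only"      # possible post-transcriptional regulation
    status[(rna_dir == 0) & (prot_dir != 0)] = "protein_only"  # possible change in protein stability
    status[(rna_dir * prot_dir) < 0] = "opposite"
    df["status"] = status
    return df
```

Genes labelled `rna_only`, `protein_only`, or `opposite` are candidates for follow-up, not automatic noise.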

## LOOK UP, DON'T GUESS

- Expected RNA-protein correlation ranges: compute Spearman r from the actual data; the typical range (0.4-0.6) is a guide, not a guarantee.
- Pathway enrichment results: run `ReactomeAnalysis_pathway_enrichment` or gseapy on the actual gene lists; never list enriched pathways from memory.
- eQTL associations: query GTEx or eQTL databases for the specific variant and tissue; do not assume regulatory relationships.
- Methylation-expression directionality at specific loci: retrieve experimental data; promoter repression is the canonical model but exceptions exist.

---

## When to Use This Skill

- User has multiple omics datasets (RNA-seq + proteomics, methylation + expression, etc.)
- Cross-omics correlation queries (e.g., "How does methylation affect expression?")
- Multi-omics biomarker discovery or patient subtyping
- Systems biology questions requiring multiple molecular layers
- Precision medicine applications with multi-omics patient data

---

## Workflow Overview

```
Phase 1: Data Loading & QC
  Load each omics type, format-specific QC, normalize
  Supported: RNA-seq, proteomics, methylation, CNV/SNV, metabolomics

Phase 2: Sample Matching
  Harmonize sample IDs, find common samples, handle missing omics

Phase 3: Feature Mapping
  Map features to common gene-level identifiers
  CpG->gene (promoter), CNV->gene, metabolite->enzyme

Phase 4: Cross-Omics Correlation
  RNA vs Protein (translation efficiency)
  Methylation vs Expression (epigenetic regulation)
  CNV vs Expression (dosage effect)
  eQTL variants vs Expression (genetic regulation)

Phase 5: Multi-Omics Clustering
  MOFA+, NMF, SNF for patient subtyping

Phase 6: Pathway-Level Integration
  Aggregate omics evidence at pathway level
  Score pathway dysregulation with combined evidence

Phase 7: Biomarker Discovery
  Feature selection across omics, multi-omics classification

Phase 8: Integrated Report
  Summary, correlations, clusters, pathways, biomarkers
```
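
As one illustration of Phase 3, the CpG->gene (promoter) mapping can be sketched as a group-by over a probe annotation. The function name `aggregate_cpgs_to_genes`, the one-gene-per-probe assumption, and the choice of the mean as the summary are assumptions for this example; the annotation itself would come from an array manifest or similar source.

```python
import pandas as pd

def aggregate_cpgs_to_genes(beta, cpg_to_gene):
    """Collapse a CpG x sample beta matrix to gene x sample promoter methylation.

    beta: DataFrame, rows = CpG probe IDs, columns = samples.
    cpg_to_gene: Series mapping probe ID -> gene symbol, restricted to promoter
                 probes (hypothetical annotation; assumes one gene per probe).
    """
    # Keep only probes that are both measured and annotated
    shared = beta.index.intersection(cpg_to_gene.index)
    genes = cpg_to_gene.loc[shared]
    # Average all promoter probes belonging to the same gene
    return beta.loc[shared].groupby(genes.to_numpy()).mean()
```

Probes without an annotation are dropped, so the number of mapped probes is worth reporting alongside the result.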

See: phase_details.md for complete code and implementation details.

---

## Supported Data Types

| Omics | Formats | QC Focus |
|-------|---------|----------|
| Transcriptomics | CSV/TSV, HDF5, h5ad | Low-count filter, normalize (TPM/DESeq2), log-transform |
| Proteomics | MaxQuant, Spectronaut, DIA-NN | Missing value imputation, median/quantile normalization |
| Methylation | IDAT, beta matrices | Failed probes, batch correction, cross-reactive filter |
| Genomics | VCF, SEG (CNV) | Variant QC, CNV segmentation |
| Metabolomics | Peak tables | Missing values, normalization |
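
As a rough illustration of the transcriptomics row, a low-count filter followed by a log-transform might look like the sketch below. It uses counts-per-million as a simple stand-in for the TPM/DESeq2 normalization named in the table, and the function name and the `min_count` and `min_samples` defaults are assumptions for the example.

```python
import numpy as np
import pandas as pd

def filter_and_log_transform(counts, min_count=10, min_samples=2):
    """Basic RNA-seq QC on a genes x samples matrix of raw counts.

    min_count, min_samples: hypothetical thresholds; a gene is kept if it
    reaches min_count in at least min_samples samples.
    """
    keep = (counts >= min_count).sum(axis=1) >= min_samples
    filtered = counts.loc[keep]
    # Counts per million, using library sizes recomputed after filtering
    cpm = filtered / filtered.sum(axis=0) * 1e6
    # log2(CPM + 1) keeps zeros finite
    return np.log2(cpm + 1)
```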

---

## Core Operations

### Sample Matching

```python
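def match_samples_across_omics(omics_data_dict):
    """Match samples across omics datasets (each a features x samples DataFrame)."""
    if not omics_data_dict:
        raise ValueError("omics_data_dict is empty")
    # Sample IDs are the column labels of each dataset
    sample_ids = {k: set(df.columns) for k, df in omics_data_dict.items()}
    common_samples = sorted(set.intersection(*sample_ids.values()))
    # Subset every dataset to the shared samples, in one consistent order
    matched_data = {k: df[common_samples] for k, df in omics_data_dict.items()}
    return common_samples, matched_data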
```
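
Exact-match intersection only works once sample IDs follow one convention. The sketch below shows one way the "harmonize sample IDs" and "handle missing omics" steps from Phase 2 could be approached. The normalization rule (trim, upper-case, unify separators to `-`) and both function names are assumptions for illustration; real cohorts usually need a rule derived from their own naming scheme.

```python
import re
import pandas as pd

def normalize_sample_id(sample_id):
    """Hypothetical rule: trim whitespace, upper-case, unify separators to '-'."""
    return re.sub(r"[._\s]+", "-", str(sample_id).strip().upper())

def harmonize_columns(df):
    """Return a copy of a features x samples DataFrame with normalized sample IDs."""
    out = df.copy()
    out.columns = [normalize_sample_id(c) for c in out.columns]
    if out.columns.duplicated().any():
        raise ValueError("Sample IDs collide after normalization")
    return out

def sample_presence_table(omics):
    """Rows = samples, columns = omics layers, True where the sample was profiled."""
    all_ids = sorted(set().union(*(df.columns for df in omics.values())))
    return pd.DataFrame(
        {name: pd.Index(all_ids).isin(df.columns) for name, df in omics.items()},
        index=all_ids,
    )
```

Rows of the presence table that are `True` in every column are the fully matched samples; the remaining rows show which layer each partially profiled sample is missing.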

### Cross-Omics Correlation

```python
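import pandas as pd
from scipy.stats import spearmanr

# Expected directions (verify against the actual data):
#   RNA vs Protein: positive r ~ 0.4-0.6
#   Methylation vs Expression: negative r (promoter repression)
#   CNV vs Expression: positive r (dosage effect)

# rna and protein: genes x samples DataFrames with matched sample columns
common_genes = rna.index.intersection(protein.index)
records = []
for gene in common_genes:
    r, p = spearmanr(rna.loc[gene], protein.loc[gene])
    records.append({"gene": gene, "spearman_r": r, "p_value": p})
correlations = pd.DataFrame(records)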
```
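
Testing one correlation per gene yields thousands of p-values, so some multiple-testing adjustment is usually needed before calling individual genes significantly correlated. The sketch below implements the Benjamini-Hochberg procedure in plain NumPy; the function name is an assumption, and the skill itself does not prescribe a particular correction method.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by n / rank
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted
```

Applied to the per-gene p-values, the adjusted values can then be filtered at a chosen false discovery rate.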

### Pathway Integration

```python
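import numpy as np

# Score pathway dysregulation using combined evidence from all omics.
# Inputs are per-gene arrays aligned to the same pathway member genes.
per_gene_evidence = np.abs(rna_fc) + np.abs(protein_fc) + np.abs(meth_diff) + np.abs(cnv)
pathway_score = np.mean(per_gene_evidence)  # aggregate per gene, then per pathway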
```
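
To apply the same scoring across many pathways, the per-gene evidence can be computed once and then averaged within each gene set. The sketch below assumes a genes x layers table of effect sizes and a dictionary of gene sets; the function name `score_pathways` and both input layouts are assumptions for illustration. Because the layers are on different scales, standardizing each column first may be advisable.

```python
import pandas as pd

def score_pathways(gene_evidence, pathways):
    """Average combined per-gene evidence within each pathway.

    gene_evidence: DataFrame, rows = genes, columns = omics layers
                   (e.g. rna_fc, protein_fc, meth_diff, cnv).
    pathways: dict of pathway name -> list of gene symbols (hypothetical gene sets).
    """
    # Combined evidence per gene: sum of absolute effects across layers
    per_gene = gene_evidence.abs().sum(axis=1)
    rows = []
    for name, genes in pathways.items():
        members = per_gene.index.intersection(genes)
        score = per_gene.loc[members].mean() if len(members) else float("nan")
        rows.append({"pathway": name, "n_genes": len(members), "score": score})
    return pd.DataFrame(rows).set_index("pathway").sort_values("score", ascending=False)
```

Reporting `n_genes` next to each score makes it easier to spot pathways whose score rests on one or two measured genes.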

See: phase_details.md for full implementations of each operation.

---

## Multi-Omics Clustering Methods

| Method | Description | Best For |
|--------|-------------|----------|
| **MOFA+** | Latent factors explaining cross-omics variation | Identifying shared/omics-specific drivers |
| **Joint NMF** | Shared decomposition across omics | Patient subtype discovery |
| **SNF** | Similarity network fusion | Integrating heterogeneous data types |
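
To make the joint NMF row concrete, the sketch below factors several non-negative layers with a single shared sample-factor matrix, using standard multiplicative updates. It is a minimal NumPy illustration, not the implementation this skill uses: the function name, the per-layer Frobenius scaling, and the defaults are assumptions, and dedicated packages would normally be preferred for real analyses.

```python
import numpy as np

def joint_nmf(layers, n_factors=2, n_iter=500, seed=0, eps=1e-9):
    """Factor several non-negative samples x features matrices with one shared W.

    Each layer X_k is approximated as W @ H_k, so W (samples x factors) captures
    structure common to all layers. Returns (W, [H_1, H_2, ...]).
    """
    rng = np.random.default_rng(seed)
    # Scale each layer to unit Frobenius norm so no single layer dominates the fit
    scaled = [np.asarray(x, dtype=float) / np.linalg.norm(x) for x in layers]
    X = np.hstack(scaled)
    W = rng.random((X.shape[0], n_factors)) + eps
    H = rng.random((n_factors, X.shape[1])) + eps
    for _ in range(n_iter):
        # Multiplicative updates for the Frobenius-norm objective
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    # Split the stacked loadings back into one matrix per layer
    boundaries = np.cumsum([x.shape[1] for x in scaled])[:-1]
    return W, np.split(H, boundaries, axis=1)
```

A simple subtype call is `W.argmax(axis=1)`, which assigns each sample to its strongest factor. Inputs must be non-negative and share the same sample order.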

---

## ToolUniverse Skills Coordination

| Skill | Used For | Phase |
|-------|----------|-------|
| `tooluniverse-rnaseq-deseq2` | RNA-seq analysis | 1, 4 |
| `tooluniverse-epigenomics` | Methylation, ChIP-seq | 1, 4 |
| `tooluniverse-variant-analysis` | CNV/SNV processing | 1, 3, 4 |
| `tooluniverse-protein-interactions` | Protein network context | 6 |
| `tooluniverse-gene-enrichment` | Pathway enrichment | 6 |
| `tooluniverse-expression-data-retrieval` | Public data retrieval | 1 |
| `tooluniverse-target-research` | Gene/protein annotation | 3, 8 |

---

## Use Cases

### Cancer Multi-Omics
Integrate TCGA RNA-seq + proteomics + methylation + CNV to identify patient subtypes, cross-omics driver genes, and multi-omics biomarkers.

### eQTL + Expression + Methylation
Identify SNP -> methylation -> expression regulatory chains (mediation analysis).
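
A SNP -> methylation -> expression chain can be explored with a product-of-coefficients mediation estimate. The sketch below uses ordinary least squares via NumPy; the function names are assumptions, and it omits covariates and significance testing (for example, a bootstrap of the indirect effect), which a real analysis would need.

```python
import numpy as np

def ols_coefs(y, *predictors):
    """Least-squares fit of y on an intercept plus predictors; returns coefficients."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta  # [intercept, coef_1, coef_2, ...]

def simple_mediation(snp, methylation, expression):
    """Estimate how much of the SNP effect on expression runs through methylation."""
    a = ols_coefs(methylation, snp)[1]                       # SNP -> methylation
    _, direct, b = ols_coefs(expression, snp, methylation)   # direct effect; methylation -> expression
    total = ols_coefs(expression, snp)[1]                    # SNP -> expression, unadjusted
    return {"a": a, "b": b, "indirect": a * b, "direct": direct, "total": total}
```

In a linear model fitted this way, the total effect equals the direct effect plus the indirect effect `a * b`, which gives a quick sanity check on the output.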

### Drug Response Multi-Omics
Predict drug response using baseline multi-omics profiles; identify resistance/sensitivity pathways.

See: phase_details.md "Use Cases" for detailed step-by-step workflows.

---

## Quantified Minimums

| Component | Requirement |
|-
