tooluniverse-immune-repertoire-analysis
This skill analyzes T-cell receptor (TCR) and B-cell receptor (BCR) repertoire sequencing data through an 8-phase workflow including clonotype identification, V(D)J gene usage assessment, CDR3 sequence characterization, clonal expansion detection, and epitope specificity prediction. Use it to characterize adaptive immune responses, monitor immune reconstitution post-treatment, track antigen-specific clones during immunotherapy or vaccination, and identify clonal expansions indicative of infection or tumor-infiltrating lymphocyte activity.
git clone --depth 1 https://github.com/mims-harvard/ToolUniverse /tmp/tooluniverse-immune-repertoire-analysis && cp -r /tmp/tooluniverse-immune-repertoire-analysis/plugin/skills/tooluniverse-immune-repertoire-analysis ~/.claude/skills/tooluniverse-immune-repertoire-analysis
SKILL.md
# ToolUniverse Immune Repertoire Analysis

Comprehensive skill for analyzing T-cell receptor (TCR) and B-cell receptor (BCR) repertoire sequencing data to characterize adaptive immune responses, clonal expansion, and antigen specificity.

## Domain Reasoning

Repertoire diversity reflects immune history. High clonality (a few clones dominating) indicates antigen-driven expansion, as seen in active infection, tumor-infiltrating lymphocytes, or chronic stimulation. Low diversity points to immunodeficiency or treatment-induced lymphopenia. Always compare observed metrics against healthy donor reference distributions before drawing conclusions; a Shannon entropy of 7 is unremarkable in a healthy adult but alarming post-chemotherapy.

## LOOK UP DON'T GUESS

- Clonotype frequency thresholds, CDR3 length ranges, and convergence ratios: query IEDB or VDJdb; do not assume values from memory.
- Epitope specificities for expanded clones: search `iedb_search_tcell_assays` and `BVBRC_search_epitopes`; never infer antigen identity from CDR3 alone.
- V gene family usage biases in healthy donors: retrieve published reference data or query ImmPort; do not assume baseline distributions are uniform.
- Sequencing depth adequacy: compute rarefaction curves from the actual data; do not guess whether depth is sufficient.

---

## Overview

Adaptive immune receptor repertoire sequencing (AIRR-seq) enables comprehensive profiling of T-cell and B-cell populations through high-throughput sequencing of TCR and BCR variable regions.
This skill provides an 8-phase workflow for:

- Clonotype identification and tracking
- Diversity and clonality assessment
- V(D)J gene usage analysis
- CDR3 sequence characterization
- Clonal expansion and convergence detection
- Epitope specificity prediction
- Integration with single-cell phenotyping
- Longitudinal repertoire tracking

---

## Core Workflow

### Phase 1: Data Import & Clonotype Definition

Load AIRR-seq data from common formats (MiXCR, ImmunoSEQ, AIRR standard, 10x Genomics VDJ). Standardize columns to: `cloneId`, `count`, `frequency`, `cdr3aa`, `cdr3nt`, `v_gene`, `j_gene`, `chain`.

Define clonotypes using one of three methods:

- **cdr3aa**: Amino acid CDR3 sequence only
- **cdr3nt**: Nucleotide CDR3 sequence
- **vj_cdr3**: V gene + J gene + CDR3aa (most common, recommended)

Aggregate by clonotype, sort by count, assign ranks.

### Phase 2: Diversity & Clonality Analysis

Calculate diversity metrics for the repertoire:

- **Shannon entropy**: Overall diversity (higher = more diverse)
- **Simpson index**: Probability that two randomly drawn sequences belong to the same clonotype
- **Inverse Simpson**: Effective number of clonotypes
- **Gini coefficient**: Inequality in clonotype distribution
- **Clonality**: 1 - Pielou's evenness (higher = more clonal)
- **Richness**: Number of unique clonotypes

Generate rarefaction curves to assess whether sequencing depth is sufficient.

### Phase 3: V(D)J Gene Usage Analysis

Analyze V and J gene usage patterns weighted by clonotype count:

- V gene family usage frequencies
- J gene family usage frequencies
- V-J pairing frequencies
- Statistical testing for biased usage (chi-square test vs.
uniform expectation)

### Phase 4: CDR3 Sequence Analysis

Characterize CDR3 sequences:

- **Length distribution**: Typical TCR CDR3 = 12-18 aa; BCR CDR3 = 10-20 aa
- **Amino acid composition**: Weighted by clonotype frequency
- Flag unusual length distributions (may indicate PCR bias)

### Phase 5: Clonal Expansion Detection

Identify expanded clonotypes above a frequency threshold (default: 95th percentile). Track clonotypes longitudinally across multiple timepoints to measure persistence, mean/max frequency, and fold changes.

### Phase 6: Convergence & Public Clonotypes

- **Convergent recombination**: Same CDR3 amino acid from different nucleotide sequences (evidence of antigen-driven selection)
- **Public clonotypes**: Shared across multiple samples/individuals (may indicate common antigen responses)

### Phase 7: Epitope Prediction & Specificity

Query epitope databases for known TCR-epitope associations:

- **IEDB** (`iedb_search_tcell_assays`): Search T-cell assay records by sequence or MHC class; use `iedb_search_epitopes` with `sequence_contains` for motif search
- **BVBRC** (`BVBRC_search_epitopes`): Best for organism-based epitope discovery (e.g., `taxon_id="2697049"` for SARS-CoV-2); returns epitope sequences with T-cell/B-cell assay counts
- **VDJdb** (manual): https://vdjdb.cdr3.net/search
- **PubMed literature** (`PubMed_search_articles`): Search for CDR3 + epitope/antigen/specificity
- **IEDB detail tools**: `iedb_get_epitope_antigens` (link epitope→antigen), `iedb_get_epitope_mhc` (MHC restriction)

### Phase 8: Integration with Single-Cell Data

Link TCR/BCR clonotypes to cell phenotypes from paired single-cell RNA-seq:

- Map clonotypes to cell barcodes
- Identify expanded clonotype phenotypes on UMAP
- Analyze clonotype-cluster associations (cross-tabulation)
- Find cluster-specific clonotypes (>80% cells in one cluster)
- Differential gene expression: expanded vs.
non-expanded cells

---

## ToolUniverse Tool Integration

**Key Tools Used**:

- `iedb_search_tcell_assays` - T-cell assay records (sequence, MHC class filters)
- `iedb_search_bcell` - B-cell assay records
- `iedb_search_epitopes` - Epitope motif search via `sequence_contains`
- `BVBRC_search_epitopes` - Organism-based epitope discovery (best for pathogen-specific queries)
- `NCBI_SRA_search_runs` - Find public TCR/BCR-seq datasets (use strategy="AMPLICON")
- `ImmPort_search_studies` - NIAID immunology studies (vaccine trials, flow cytometry)
- `PubMed_search_articles` - Literature on TCR/BCR specificity
- `UniProt_get_entry_by_accession` - Antigen protein information

**Integration with Other Skills**:

- `tooluniverse-single-cell` - Single-cell transcriptomics
- `tooluniverse-rnaseq-deseq2` - Bulk RNA-seq analysis
- `tooluniverse-variant-analysis` - Somatic hypermutation analysis (BCR)

---

## Q
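
As a rough illustration of the Phase 1, Phase 2, and Phase 6 steps described in the Core Workflow, the following is a minimal standard-library sketch, not the skill's actual implementation. The function names (`aggregate_clonotypes`, `diversity_metrics`, `convergent_cdr3s`) and the four toy rows are hypothetical, and the sketch assumes Shannon entropy is computed with the natural logarithm, which the workflow text does not specify.

```python
import math
from collections import defaultdict

# Toy input rows using the standardized column names from Phase 1.
# The sequences and counts are made up for illustration only.
rows = [
    {"v_gene": "TRBV7-9", "j_gene": "TRBJ2-1", "cdr3aa": "CASSF",
     "cdr3nt": "TGTGCCAGCAGCTTC", "count": 50},
    {"v_gene": "TRBV7-9", "j_gene": "TRBJ2-1", "cdr3aa": "CASSF",
     "cdr3nt": "TGCGCCAGCAGTTTT", "count": 30},
    {"v_gene": "TRBV5-1", "j_gene": "TRBJ1-1", "cdr3aa": "CASSL",
     "cdr3nt": "TGTGCCAGCAGCCTG", "count": 15},
    {"v_gene": "TRBV20-1", "j_gene": "TRBJ2-7", "cdr3aa": "CASSY",
     "cdr3nt": "TGTGCCAGCAGCTAC", "count": 5},
]

def aggregate_clonotypes(rows):
    """Phase 1: sum counts per (v_gene, j_gene, cdr3aa), the vj_cdr3 definition."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["v_gene"], row["j_gene"], row["cdr3aa"])] += row["count"]
    # Descending sort, so list position + 1 is the clonotype rank.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

def diversity_metrics(counts):
    """Phase 2: diversity and clonality metrics from positive clonotype counts."""
    total = sum(counts)
    freqs = [c / total for c in counts]
    richness = len(freqs)
    shannon = -sum(p * math.log(p) for p in freqs)  # assumption: natural log
    simpson = sum(p * p for p in freqs)
    # Pielou's evenness is undefined for one clonotype; treat that case as fully clonal.
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    # Gini coefficient over frequencies sorted ascending (frequencies sum to 1).
    ascending = sorted(freqs)
    gini = sum((2 * i - richness - 1) * f
               for i, f in enumerate(ascending, start=1)) / richness
    return {
        "richness": richness,
        "shannon": shannon,
        "simpson": simpson,
        "inverse_simpson": 1.0 / simpson,
        "gini": gini,
        "clonality": 1.0 - evenness,
    }

def convergent_cdr3s(rows):
    """Phase 6: CDR3 amino acid sequences encoded by more than one nucleotide sequence."""
    variants = defaultdict(set)
    for row in rows:
        variants[row["cdr3aa"]].add(row["cdr3nt"])
    return {aa: len(nts) for aa, nts in variants.items() if len(nts) > 1}

clonotypes = aggregate_clonotypes(rows)
metrics = diversity_metrics([count for _, count in clonotypes])
convergent = convergent_cdr3s(rows)

for rank, (key, count) in enumerate(clonotypes, start=1):
    print(rank, key, count)
print(metrics)
print(convergent)
```

In this toy repertoire the two `CASSF` rows collapse into a single clonotype under the `vj_cdr3` definition while still being reported as convergent, because they differ at the nucleotide level. Per the guidance above, the resulting metric values only become interpretable once compared against healthy donor reference distributions.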
Install and configure ToolUniverse for any use case — MCP server (chat-based), CLI (command line with 9 subcommands), or Python SDK (Coding API with 3 calling patterns). Covers uv/uvx setup, MCP configuration for 12+ AI clients (Cursor, Claude Desktop, Windsurf, VS Code, Codex, Gemini CLI, Trae, Cline, etc.), full CLI reference (tu list/grep/find/info/run/test/status/build/serve), Coding API quickstart, agentic tools, code executor, API key walkthrough, skill installation, and upgrading. Use when user asks how to set up ToolUniverse, which access mode to use (MCP vs CLI vs SDK), configuring MCP servers, using the CLI, troubleshooting installation, upgrading, or mentions installing ToolUniverse or setting up scientific tools. Also triggers for "how do I use ToolUniverse", "what's the best way to access tools", "command line", "tu command", "coding API", "tu build".
Systematic ACMG/AMP germline variant classification with all 28 criteria (PVS1, PS1-4, PM1-6, PP1-5, BA1, BS1-4, BP1-7) for clinical significance. Produces 5-tier verdict (Pathogenic / Likely Pathogenic / VUS / Likely Benign / Benign) with cited evidence per criterion. Use for variant interpretation, VUS resolution, and pathogenicity assessment. Combines ClinVar, gnomAD, computational predictors, and gene-mechanism context.
Comprehensive ADMET (Absorption, Distribution, Metabolism, Excretion, Toxicity) profiling for drug candidates. Integrates ADMET-AI predictions, SwissADME drug-likeness, PubChemTox experimental toxicity, ChEMBL clinical data, Lipinski rule-of-five, and CYP interaction data. Use for drug-likeness assessment, BBB penetration, bioavailability, hepatotoxicity prediction, ADME/PK profiling, or screening compound libraries before lab testing.
Detect and analyze adverse drug event signals using FDA FAERS reports, drug labels, and disproportionality statistics (PRR, ROR, IC). Generates quantitative safety signal scores (0-100) with evidence grading. Use for post-market surveillance, pharmacovigilance, drug safety assessment, regulatory submissions, and detecting rare AE signals not visible in clinical trials.
Map environmental and industrial chemicals to adverse outcome pathways (AOPs) — molecular initiating event to organ-level toxicity. Uses AOPWiki, GHS classification, IARC carcinogen status, and LD50 data. Use for environmental/industrial chemical risk assessment, regulatory-grade hazard characterization, and AOP stressor mapping. Distinct from drug-safety analysis (use tooluniverse-pharmacovigilance for drugs).
Aging biology, cellular senescence, and longevity research. Covers senescence markers (p16/CDKN2A, SASP, SA-beta-gal), aging hallmarks, senolytic drug discovery (dasatinib+quercetin, fisetin, navitoclax), epigenetic clocks, telomere biology, and longevity GWAS. Use for senescence-pathway analysis, age-related disease genetics, senolytic-target discovery, and centenarian-genetics queries. Distinguishes correlative vs causal evidence (knockout, intervention).
Therapeutic antibody engineering and optimization, lead-to-clinical-candidate. Covers sequence humanization (germline alignment, framework retention), affinity maturation, developability (aggregation, stability, PTMs), structure modeling (AlphaFold/PDB CDR analysis), immunogenicity prediction, and manufacturing feasibility. Use for biologic-drug optimization, mAb design review, biosimilar engineering, and clinical-precedent comparison.
Discover novel small-molecule binders for protein targets using structure-based and ligand-based screening. Covers druggability assessment, known-ligand mining (ChEMBL, BindingDB), similarity expansion, ADMET filtering, and synthesis feasibility. Use for hit identification, virtual screening, target-to-compounds workflows, and lead-finding before commit-to-medchem.