ClaudeWave
Skill · 1.4k repo stars · updated today

tooluniverse-crispr-screen-analysis

This skill analyzes CRISPR-Cas9 genetic screen data through a comprehensive workflow that processes sgRNA counts, performs quality control, calculates MAGeCK-like gene-level scores, detects essential and synthetic-lethal genes, and identifies enriched pathways. Use it to interpret genome-wide essentiality screens, discover synthetic-lethality interactions, analyze dropout or positive-selection screens, prioritize therapeutic targets, and validate resistance mechanisms in your CRISPR screen experiments.

Install in Claude Code
git clone --depth 1 https://github.com/mims-harvard/ToolUniverse /tmp/tooluniverse-crispr-screen-analysis && cp -r /tmp/tooluniverse-crispr-screen-analysis/plugin/skills/tooluniverse-crispr-screen-analysis ~/.claude/skills/tooluniverse-crispr-screen-analysis
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# ToolUniverse CRISPR Screen Analysis

## RULE ZERO — Check for pre-computed results FIRST

Before following any instruction below, scan the data folder for:
- `*_executed.ipynb` → read with `tu run read_executed_notebook '{"data_folder":"<path>","search":"<keyword>"}'` and cite its cell outputs as the authoritative answer
- Pre-computed result files (CSV/TSV with names like `*results*`, `*deseq*`, `*enrich*`, `*stats*`, `*_simplified.csv`) → read directly and report the requested value
- Canonical analysis scripts (`analysis.R`, `run_*.py`, `find_*.R`, `*.Rmd`) → execute as-is and read the output

Only follow this skill's re-analysis recipe below if **none** of the above exist. Re-running from raw data can produce numbers that differ from the published answer, and it is much slower (often 5-10× as many turns).
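The folder scan above can be automated. The sketch below uses only the standard library; the filename patterns are copied from the checklist, while the function names `classify` and `find_precomputed` and the CSV/TSV suffix check for result tables are illustrative assumptions, not part of ToolUniverse.

```python
from fnmatch import fnmatchcase
from pathlib import Path, PurePath

# Filename patterns taken from the RULE ZERO checklist.
PATTERNS = {
    "executed_notebook": ["*_executed.ipynb"],
    "precomputed_results": ["*results*", "*deseq*", "*enrich*", "*stats*", "*_simplified.csv"],
    "analysis_script": ["analysis.R", "run_*.py", "find_*.R", "*.Rmd"],
}
RESULT_SUFFIXES = {".csv", ".tsv"}  # assumption: pre-computed results are tables


def classify(paths):
    """Sort file paths into the three RULE ZERO categories (first matching category wins)."""
    found = {kind: [] for kind in PATTERNS}
    for path in paths:
        name = PurePath(path).name
        for kind, patterns in PATTERNS.items():
            if not any(fnmatchcase(name, pattern) for pattern in patterns):
                continue
            if kind == "precomputed_results" and PurePath(name).suffix.lower() not in RESULT_SUFFIXES:
                continue  # e.g. stats.png matches "*stats*" but is not a result table
            found[kind].append(path)
            break
    return found


def find_precomputed(data_folder):
    """Recursively list every regular file under data_folder and classify it."""
    files = sorted(str(p) for p in Path(data_folder).rglob("*") if p.is_file())
    return classify(files)
```

If all three lists come back empty, fall through to the re-analysis recipe; otherwise read the notebook, table, or script output first.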

---

Comprehensive skill for analyzing CRISPR-Cas9 genetic screens to identify essential genes, synthetic lethal interactions, and therapeutic targets through robust statistical analysis and pathway enrichment.

## Overview

CRISPR screens enable genome-wide functional genomics by systematically perturbing genes and measuring fitness effects. This skill provides an 8-phase workflow for:
- Processing sgRNA count matrices
- Quality control and normalization
- Gene-level essentiality scoring (MAGeCK-like and BAGEL-like approaches)
- Synthetic lethality detection
- Pathway enrichment analysis
- Drug target prioritization with DepMap integration
- Integration with expression and mutation data

---

## Core Workflow

### Phase 1: Data Import & sgRNA Count Processing

Load the sgRNA count matrix (MAGeCK format or a generic TSV). Expected columns: `sgRNA`, `Gene`, plus one column per sample. Create an experimental design table linking samples to conditions (baseline/treatment) with replicate assignments.
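The Quick Start relies on helpers named `load_sgrna_counts` and `create_design_matrix`. Below is a hypothetical stand-in for each, written with only the standard library; the skill's real implementations may return pandas objects or accept more options. The dict-of-dicts layout (`sgRNA -> {sample: count}`) is an assumption chosen to keep the sketch dependency-free.

```python
import csv
from collections import Counter


def load_sgrna_counts(source):
    """Parse a tab-separated count table with columns sgRNA, Gene, <samples...>.

    `source` is a file path or any iterable of text lines (e.g. an open file).
    Returns (counts, meta): counts maps sgRNA -> {sample: int}; meta maps sgRNA -> gene.
    """
    if isinstance(source, str):
        with open(source, newline="") as handle:
            return load_sgrna_counts(list(handle))
    reader = csv.DictReader(source, delimiter="\t")
    samples = [col for col in reader.fieldnames if col not in ("sgRNA", "Gene")]
    counts, meta = {}, {}
    for row in reader:
        guide = row["sgRNA"]
        meta[guide] = row["Gene"]
        counts[guide] = {sample: int(row[sample]) for sample in samples}
    return counts, meta


def create_design_matrix(samples, conditions):
    """Pair each sample with its condition and a 1-based replicate number within that condition."""
    if len(samples) != len(conditions):
        raise ValueError("samples and conditions must have the same length")
    seen = Counter()
    design = []
    for sample, condition in zip(samples, conditions):
        seen[condition] += 1
        design.append({"sample": sample, "condition": condition, "replicate": seen[condition]})
    return design
```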

### Phase 2: Quality Control & Filtering

Assess sgRNA distribution quality:
- **Library sizes** per sample (total reads)
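- **Zero-count sgRNAs**: Number of sgRNAs with zero reads in each sample
- **Low-count filtering**: Remove sgRNAs with fewer than 30 reads in more than N-2 samples, where N is the number of samples (default)
- **Gini coefficient**: Measure how unevenly reads are spread across sgRNAs in each sample (higher = more skewed)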
- Report filtering recommendations
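The QC metrics above reduce to a few short functions. This sketch assumes counts are held as `sgRNA -> {sample: count}` dicts; the function names are illustrative, and the Gini coefficient uses the standard closed form on ascending-sorted values.

```python
def library_sizes(counts):
    """Total reads per sample; counts maps sgRNA -> {sample: raw count}."""
    totals = {}
    for per_sample in counts.values():
        for sample, n in per_sample.items():
            totals[sample] = totals.get(sample, 0) + n
    return totals


def zero_count_sgrnas(counts):
    """Number of sgRNAs with zero reads in each sample."""
    zeros = {}
    for per_sample in counts.values():
        for sample, n in per_sample.items():
            zeros[sample] = zeros.get(sample, 0) + (1 if n == 0 else 0)
    return zeros


def gini(values):
    """Gini coefficient of non-negative counts: 0 = perfectly even, near 1 = a few sgRNAs dominate."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Closed form on ascending data: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def drop_low_count_sgrnas(counts, min_reads=30, max_low_samples=None):
    """Remove sgRNAs with < min_reads in more than max_low_samples samples (default N - 2)."""
    kept = {}
    for guide, per_sample in counts.items():
        limit = len(per_sample) - 2 if max_low_samples is None else max_low_samples
        n_low = sum(1 for n in per_sample.values() if n < min_reads)
        if n_low <= limit:
            kept[guide] = per_sample
    return kept
```

Per-sample Gini values follow from `{s: gini([ps[s] for ps in counts.values()]) for s in library_sizes(counts)}`.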

### Phase 3: Normalization

Normalize sgRNA counts to account for library size differences:
- **Median ratio** (DESeq2-like): Compute a geometric-mean reference for each sgRNA across samples, then set each sample's size factor to the median of its count-to-reference ratios
- **Total count** (CPM-like): Divide by library size in millions

Calculate log2 fold changes (LFC) between treatment and control conditions, adding a pseudocount so that zero counts do not produce infinite values.
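A minimal sketch of the median-ratio normalization and LFC step, assuming `sgRNA -> {sample: count}` dicts. As in DESeq2, sgRNAs with a zero in any sample are skipped when estimating size factors. The pseudocount of 0.5 and the per-condition averaging of normalized counts are assumptions; the total-count (CPM) alternative is not shown.

```python
import math
from statistics import mean, median


def median_ratio_size_factors(counts):
    """DESeq2-style size factors. counts maps sgRNA -> {sample: raw count}."""
    samples = list(next(iter(counts.values())))
    ratios = {sample: [] for sample in samples}
    for per_sample in counts.values():
        values = [per_sample[s] for s in samples]
        if min(values) <= 0:
            continue  # geometric mean is undefined with zeros, so skip this sgRNA
        geo_mean = math.exp(mean(math.log(v) for v in values))
        for sample in samples:
            ratios[sample].append(per_sample[sample] / geo_mean)
    return {sample: median(r) for sample, r in ratios.items()}


def apply_size_factors(counts, size_factors):
    """Divide every count by its sample's size factor."""
    return {
        guide: {s: n / size_factors[s] for s, n in per_sample.items()}
        for guide, per_sample in counts.items()
    }


def log2_fold_changes(norm, control, treatment, pseudocount=0.5):
    """Per-sgRNA log2((mean treatment + pc) / (mean control + pc)) on normalized counts."""
    lfc = {}
    for guide, per_sample in norm.items():
        ctrl = mean(per_sample[s] for s in control)
        trt = mean(per_sample[s] for s in treatment)
        lfc[guide] = math.log2((trt + pseudocount) / (ctrl + pseudocount))
    return lfc
```

Because the size factor is a median, a handful of strongly depleted sgRNAs (the signal of interest in a dropout screen) does not drag the normalization the way a total-count scaling can.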

### Phase 4: Gene-Level Scoring

Two scoring approaches:
- **MAGeCK-like (RRA)**: Rank all sgRNAs by LFC and compute the mean rank per gene (a simplified stand-in for MAGeCK's robust rank aggregation). Lower mean rank = more essential. Also reports the sgRNA count and mean LFC per gene.
- **BAGEL-like (Bayes Factor)**: Use reference essential/non-essential gene sets to estimate LFC distributions. Calculate likelihood ratio (Bayes Factor) for each gene. Higher BF = more likely essential.
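Both scoring approaches can be sketched in a few lines. This is illustrative only: the mean-rank score is the simplification described above, not MAGeCK's full RRA statistic, and the Bayes factor fits a normal distribution to each reference set as a stand-in for the density estimates BAGEL itself uses. Function names and the input layout (`sgRNA -> LFC`, `sgRNA -> gene`) are assumptions.

```python
import math
from collections import defaultdict
from statistics import NormalDist, mean


def mean_rank_scores(lfc, sgrna_to_gene):
    """Simplified MAGeCK-like score: mean rank of each gene's sgRNAs (rank 1 = most depleted)."""
    ordered = sorted(lfc, key=lfc.get)
    rank = {guide: i for i, guide in enumerate(ordered, start=1)}
    guides_by_gene = defaultdict(list)
    for guide in lfc:
        guides_by_gene[sgrna_to_gene[guide]].append(guide)
    rows = [
        {
            "gene": gene,
            "mean_rank": mean(rank[g] for g in guides),
            "n_sgrnas": len(guides),
            "mean_lfc": mean(lfc[g] for g in guides),
        }
        for gene, guides in guides_by_gene.items()
    ]
    return sorted(rows, key=lambda row: row["mean_rank"])  # lower mean rank = more essential


def _log_pdf(dist, x):
    """Natural-log normal density, computed directly to avoid underflow in dist.pdf."""
    z = (x - dist.mean) / dist.stdev
    return -0.5 * z * z - math.log(dist.stdev * math.sqrt(2 * math.pi))


def bayes_factors(lfc, sgrna_to_gene, essential_ref, nonessential_ref):
    """BAGEL-like log2 Bayes factor per gene from Gaussian fits to the reference sets."""
    ess = NormalDist.from_samples([v for g, v in lfc.items() if sgrna_to_gene[g] in essential_ref])
    non = NormalDist.from_samples([v for g, v in lfc.items() if sgrna_to_gene[g] in nonessential_ref])
    log2_bf = defaultdict(float)
    for guide, value in lfc.items():
        # Sum per-sgRNA log-likelihood ratios: higher = more likely essential.
        log2_bf[sgrna_to_gene[guide]] += (_log_pdf(ess, value) - _log_pdf(non, value)) / math.log(2)
    return dict(log2_bf)
```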

### Phase 5: Synthetic Lethality Detection

Compare essentiality scores between wildtype and mutant cell lines:
- Merge gene scores, calculate delta LFC and delta rank
- Keep genes that are essential in the mutant (LFC below a chosen threshold) but not in the wildtype (LFC > -0.5) and that show a large rank change
- Sort by differential essentiality
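The three steps above can be sketched as a single filter. The wildtype cutoff of -0.5 comes from the text; the mutant LFC threshold (-1.0) and minimum rank gain (100) are placeholder defaults chosen for illustration, as is the input layout `gene -> {"mean_lfc", "rank"}` with rank 1 = most essential.

```python
def synthetic_lethal_candidates(wt_scores, mut_scores, mut_lfc_max=-1.0, wt_lfc_min=-0.5, min_rank_gain=100):
    """Genes essential in the mutant line but not in the wildtype line.

    wt_scores / mut_scores map gene -> {"mean_lfc": float, "rank": int}.
    mut_lfc_max and min_rank_gain are assumed defaults; tune them per screen.
    """
    hits = []
    for gene in sorted(wt_scores.keys() & mut_scores.keys()):  # merge on shared genes
        wt, mut = wt_scores[gene], mut_scores[gene]
        delta_lfc = mut["mean_lfc"] - wt["mean_lfc"]  # negative = stronger dropout in mutant
        delta_rank = wt["rank"] - mut["rank"]  # positive = moved toward rank 1 in mutant
        if mut["mean_lfc"] < mut_lfc_max and wt["mean_lfc"] > wt_lfc_min and delta_rank >= min_rank_gain:
            hits.append({"gene": gene, "delta_lfc": delta_lfc, "delta_rank": delta_rank})
    return sorted(hits, key=lambda hit: hit["delta_lfc"])  # most differential first
```

Requiring both an absolute condition (mutant LFC, wildtype LFC) and a relative one (rank gain) guards against calling pan-essential genes, which drop out in both lines, as synthetic-lethal hits.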

Cross-check candidates against known dependencies in DepMap and the literature (PubMed search via `PubMed_search_articles`).

### Phase 6: Pathway Enrichment Analysis

Submit top essential genes to Enrichr for pathway enrichment:
- KEGG pathways
- GO Biological Process
- Retrieve enriched terms with p-values and gene lists
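A sketch of preparing the Enrichr submission. The parameter names `gene_list` and `libs` (both arrays) come from the tool list in this skill; the cutoff of 200 genes and the two Enrichr library names are assumptions, so check which libraries the tool accepts. The commented usage assumes ToolUniverse's `run` method takes a `{"name", "arguments"}` dict.

```python
def top_essential_genes(mean_lfc_by_gene, n=200):
    """Return the n genes with the most negative mean LFC (strongest dropout)."""
    ranked = sorted(mean_lfc_by_gene, key=mean_lfc_by_gene.get)
    return ranked[:n]


def enrichr_query(genes, libs=("KEGG_2021_Human", "GO_Biological_Process_2023")):
    """Build a call for enrichr_gene_enrichment_analysis; gene_list and libs are arrays."""
    return {
        "name": "enrichr_gene_enrichment_analysis",
        "arguments": {"gene_list": list(genes), "libs": list(libs)},
    }


# Usage sketch (needs the tooluniverse package and network access):
# from tooluniverse import ToolUniverse
# tu = ToolUniverse()
# tu.load_tools()
# result = tu.run(enrichr_query(top_essential_genes(mean_lfc_by_gene)))
```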

### Phase 7: Drug Target Prioritization

Composite scoring combining:
- **Essentiality** (50% weight): Normalized mean LFC from CRISPR screen
- **Expression** (30% weight): Log2 fold change from RNA-seq (if available)
- **Druggability** (20% weight): Number of drug interactions from DGIdb

Query DGIdb for each candidate gene to find existing drugs, interaction types, and sources.
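The 50/30/20 composite can be sketched as a weighted sum of rescaled components. The weights come from the text; everything else is an assumption: min-max scaling to [0, 1], negating LFC so stronger dropout scores higher, treating higher RNA-seq LFC as more attractive, and scoring genes with no RNA-seq value or no DGIdb interactions as 0 on that component.

```python
def min_max(values):
    """Rescale a {gene: value} dict to [0, 1]; constant inputs map to 0."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {gene: (v - lo) / span if span else 0.0 for gene, v in values.items()}


def prioritize_targets(mean_lfc, expression_lfc=None, n_interactions=None, weights=(0.5, 0.3, 0.2)):
    """Weighted composite of essentiality, expression and druggability (higher = better target)."""
    expression_lfc = expression_lfc or {}
    n_interactions = n_interactions or {}
    genes = list(mean_lfc)
    essentiality = min_max({g: -mean_lfc[g] for g in genes})  # more negative LFC = more essential
    expression = min_max({g: expression_lfc.get(g, 0.0) for g in genes})  # missing RNA-seq -> 0
    druggability = min_max({g: float(n_interactions.get(g, 0)) for g in genes})
    w_ess, w_expr, w_drug = weights
    scores = {
        g: w_ess * essentiality[g] + w_expr * expression[g] + w_drug * druggability[g]
        for g in genes
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Here `n_interactions` would hold the number of DGIdb interactions counted per gene; how to extract that count depends on the DGIdb tool's response format.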

### Phase 8: Report Generation

Generate markdown report with:
- Summary statistics (total genes, essential genes, non-essential genes)
- Top 20 essential genes table (rank, gene, mean LFC, sgRNAs, score)
- Pathway enrichment results (top 10 terms per database)
- Drug target candidates (rank, gene, essentiality, expression FC, druggability, priority score)
- Methods section
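The summary and top-20 table sections can be rendered as plain markdown. In this sketch the row fields (`gene`, `mean_lfc`, `n_sgrnas`, `score`) mirror the table columns listed above, rows are assumed to be pre-sorted best-first, and "non-essential" is taken to mean every gene not called essential; the pathway, drug-target, and methods sections are left out.

```python
def essential_genes_table(rows, top_n=20):
    """Markdown table of the top essential genes; rows are sorted best-first."""
    lines = [
        "| Rank | Gene | Mean LFC | sgRNAs | Score |",
        "|---:|---|---:|---:|---:|",
    ]
    for rank, row in enumerate(rows[:top_n], start=1):
        lines.append(
            f"| {rank} | {row['gene']} | {row['mean_lfc']:.2f} | {row['n_sgrnas']} | {row['score']:.3g} |"
        )
    return "\n".join(lines)


def summary_and_top_genes(rows, n_essential, top_n=20):
    """Summary statistics followed by the top-N table."""
    return "\n".join([
        "## Summary",
        f"- Total genes: {len(rows)}",
        f"- Essential genes: {n_essential}",
        f"- Non-essential genes: {len(rows) - n_essential}",  # assumption: all others
        "",
        f"## Top {top_n} Essential Genes",
        essential_genes_table(rows, top_n),
    ])
```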

---

## ToolUniverse Tool Integration

**Key Tools Used**:
- `PubMed_search_articles` - Literature search for gene essentiality and drug resistance
- `ReactomeAnalysis_pathway_enrichment` - Pathway enrichment (param: `identifiers` newline-separated, `page_size`)
- `enrichr_gene_enrichment_analysis` - Enrichr enrichment (param: `gene_list` array, `libs` array)
- `DGIdb_get_drug_gene_interactions` - Drug-gene interactions (param: `genes` as array)
- `DGIdb_get_gene_druggability` - Druggability categories
- `STRING_get_network` - Protein interaction networks
- `kegg_search_pathway` - Pathway search by keyword
- `kegg_get_pathway_info` - Pathway details by ID
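The parameter notes above matter because the tools disagree on how a gene list is passed: Reactome takes one newline-separated string, while DGIdb takes an array. The helpers below only build the call payloads; `page_size=20` is an assumed default, and the commented `tu.run` usage assumes ToolUniverse accepts a `{"name", "arguments"}` dict.

```python
def reactome_query(genes, page_size=20):
    """ReactomeAnalysis_pathway_enrichment expects `identifiers` as one newline-separated string."""
    return {
        "name": "ReactomeAnalysis_pathway_enrichment",
        "arguments": {"identifiers": "\n".join(genes), "page_size": page_size},
    }


def dgidb_query(genes):
    """DGIdb_get_drug_gene_interactions expects `genes` as an array, not a string."""
    return {"name": "DGIdb_get_drug_gene_interactions", "arguments": {"genes": list(genes)}}


# Usage sketch (needs the tooluniverse package and network access):
# from tooluniverse import ToolUniverse
# tu = ToolUniverse()
# tu.load_tools()
# pathways = tu.run(reactome_query(["KRAS", "MYC"]))
# drugs = tu.run(dgidb_query(["KRAS", "MYC"]))
```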

**Cancer Context** (essential for drug resistance screens):
- `civic_search_evidence_items` - Clinical evidence for drug resistance/sensitivity
- `COSMIC_get_mutations_by_gene` - Somatic mutation landscape
- `cBioPortal_get_mutations` - Mutations in specific cancer cohorts
- `ChEMBL_search_targets` - Structural druggability assessment

**Expression & Variant Integration**:
- `GEO_search_rnaseq_datasets` / `geo_search_datasets` - Expression datasets
- `ClinVar_search_variants` - Known pathogenic variants
- `gnomad_get_gene_constraints` - Gene constraint metrics (pLI, oe_lof)
- `UniProt_get_function_by_accession` - Protein function for hit validation

---

## Quick Start

```python
import pandas as pd
from tooluniverse import ToolUniverse

# 1. Load data
counts, meta = load_sgrna_counts("sgrna_counts.txt")
design = create_design_matrix(['T0_1', 'T0_2', 'T14_1', 'T14_2'],
                               ['baseline', 'baseline', 'treatment', 'treatment'])

# 2. Process
filtered_counts, filtered_mapping = filter_low_count_sgrna
```