tooluniverse-image-analysis

This Claude Code skill analyzes quantitative microscopy data such as colony morphometry, cell counts, and fluorescence intensity measurements from imaging software outputs like CellProfiler and ImageJ. Use it to perform statistical comparisons (ANOVA, t-tests, Dunnett's test), calculate dose-response curves, and compute image-derived measurement statistics using pandas, numpy, scipy, and scikit-image on tabular imaging data.

Repository: ToolUniverse

Install in Claude Code

git clone --depth 1 https://github.com/mims-harvard/ToolUniverse /tmp/tooluniverse-image-analysis && cp -r /tmp/tooluniverse-image-analysis/plugin/skills/tooluniverse-image-analysis ~/.claude/skills/tooluniverse-image-analysis

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Microscopy Image Analysis and Quantitative Imaging Data

## RULE ZERO — Check for pre-computed results FIRST

Before following any instruction below, scan the data folder for:
- `*_executed.ipynb` → read with `tu run read_executed_notebook '{"data_folder":"<path>","search":"<keyword>"}'` and cite its cell outputs as the authoritative answer
- Pre-computed result files (CSV/TSV with names like `*results*`, `*deseq*`, `*enrich*`, `*stats*`, `*_simplified.csv`) → read directly and report the requested value
- Canonical analysis scripts (`analysis.R`, `run_*.py`, `find_*.R`, `*.Rmd`) → execute as-is and read the output

Only follow this skill's re-analysis recipe below if **none** of the above exist. Re-running from raw data can produce numbers that differ from the published answer, and it is much slower (often 5-10× the turn count).
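
The scan described above can be sketched as follows. This is a minimal illustration that assumes the patterns in the three bullets are matched against file names only; the function name `find_precomputed` and the dictionary keys it returns are hypothetical and not part of ToolUniverse.

```python
from fnmatch import fnmatchcase
from pathlib import Path

# File-name patterns copied from the three bullets above.
NOTEBOOK_PATTERNS = ["*_executed.ipynb"]
RESULT_PATTERNS = ["*results*", "*deseq*", "*enrich*", "*stats*", "*_simplified.csv"]
SCRIPT_PATTERNS = ["analysis.R", "run_*.py", "find_*.R", "*.Rmd"]


def find_precomputed(data_folder):
    """Group files under data_folder by the kind of pre-computed artifact they resemble."""
    found = {"notebooks": [], "results": [], "scripts": []}
    for path in sorted(Path(data_folder).rglob("*")):
        if not path.is_file():
            continue
        name = path.name
        if any(fnmatchcase(name, p) for p in NOTEBOOK_PATTERNS):
            found["notebooks"].append(path)
        elif path.suffix.lower() in {".csv", ".tsv"} and any(
            fnmatchcase(name.lower(), p) for p in RESULT_PATTERNS
        ):
            # Result files are restricted to CSV/TSV, as in the second bullet.
            found["results"].append(path)
        elif any(fnmatchcase(name, p) for p in SCRIPT_PATTERNS):
            found["scripts"].append(path)
    return found
```

If every list in the returned dictionary is empty, the re-analysis recipe applies; otherwise the matching files are the first thing to read or execute.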

---

## CRITICAL — "Relative proportion of A to B" defaults to PERCENTAGE

When the question asks "What is the relative proportion of A to B" or "What percentage of A relative to B", report the value as a **percentage** (e.g., `29` for ratio 0.29), NOT a decimal ratio. Ground-truth (GT) answers for biology assays use whole-number percentage ranges like `(25,30)`, not `(0.25,0.30)`. Multiply your computed ratio by 100 before reporting:

```python
ratio = mean_A / mean_B           # e.g., 0.29
percentage = ratio * 100          # e.g., 29
print(f"{percentage:.1f}%")       # "29.0%"  ← THIS is the answer
```

Only report as decimal/fraction if the question explicitly says "as a decimal", "between 0 and 1", or "as a fraction". Common error: reporting `0.29` when the GT range is `(25,30)` — graded as wrong even though the underlying ratio is correct.

---

Production-ready skill for analyzing microscopy-derived measurement data using pandas, numpy, scipy, statsmodels, and scikit-image.

## LOOK UP, DON'T GUESS
When uncertain about any scientific fact, SEARCH databases first rather than reasoning from memory.

---

## When to Use

- Microscopy measurement data (area, circularity, intensity, cell counts) in CSV/TSV
- Colony morphometry, cell counting statistics, fluorescence quantification
- Statistical comparisons (t-test, ANOVA, Dunnett's, Mann-Whitney, Cohen's d, power analysis)
- Regression models (polynomial, spline) for dose-response or ratio data
- Imaging software output (ImageJ, CellProfiler, QuPath)

**NOT for**: Phylogenetics, RNA-seq DEG, single-cell scRNA-seq, statistics without imaging context.

---

## Core Principles

1. **Data-first** - Load and inspect all CSV/TSV before analysis
2. **Question-driven** - Parse the exact statistic requested
3. **Statistical rigor** - Effect sizes, multiple comparison corrections, model selection
4. **Imaging-aware** - Understand ImageJ/CellProfiler columns (Area, Circularity, Round, Intensity)
5. **Precision** - Match expected answer format (integer, range, decimal places)

---

## Required Packages

```python
import pandas as pd, numpy as np
from scipy import stats
from scipy.interpolate import BSpline, make_interp_spline
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.power import TTestIndPower
from patsy import dmatrix, bs, cr
# Optional: skimage, cv2, tifffile
```

---

## Workflow Decision Tree

```
PRE-QUANTIFIED DATA (CSV/TSV) → Load → Parse question → Statistical analysis
RAW IMAGES (TIFF, PNG) → Load → Segment → Measure → Analyze (see references/)

Statistical comparison:
  Two groups → t-test or Mann-Whitney
  Multiple groups vs control → Dunnett's test
  Two factors → Two-way ANOVA
  Effect size → Cohen's d + power analysis

Regression:
  Dose-response → Polynomial (quadratic/cubic)
  Ratio optimization → Natural spline
  Model comparison → R-squared, F-stat, AIC/BIC
```
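
As an illustration of the "Two groups" branch, the sketch below picks between a t-test and Mann-Whitney. The selection rule (a Shapiro-Wilk normality check on each group) and the function name `compare_two_groups` are assumptions made for this example; the skill itself does not prescribe how to choose.

```python
import numpy as np
from scipy import stats


def compare_two_groups(a, b, alpha=0.05):
    """Return (test_name, statistic, p_value) for two independent samples.

    Assumption: use Welch's t-test when both groups pass Shapiro-Wilk,
    otherwise fall back to the Mann-Whitney U test.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    looks_normal = all(stats.shapiro(x)[1] > alpha for x in (a, b))
    if looks_normal:
        stat, p = stats.ttest_ind(a, b, equal_var=False)
        return "welch_t", float(stat), float(p)
    stat, p = stats.mannwhitneyu(a, b, alternative="two-sided")
    return "mann_whitney", float(stat), float(p)
```

Whichever test is chosen, report its name alongside the p-value so the result can be checked against the question's wording.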

---

## Analysis Workflow

### Phase 0: Question Parsing and Data Discovery

```python
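import glob, os
import pandas as pd

# Collect every CSV/TSV under the working directory (Core Principle 1: inspect all files)
paths = sorted(glob.glob(os.path.join(".", "**", "*.[ct]sv"), recursive=True))
tables = {p: pd.read_csv(p, sep=None, engine="python") for p in paths}  # sep=None sniffs comma vs tab
for path, df in tables.items():
    print(f"{path}: shape={df.shape}, columns={list(df.columns)}")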
```

Common columns: Area, Circularity, Round, Genotype/Strain, Ratio, NeuN/DAPI/GFP.

### Phase 1-3: Grouped Stats → Statistical Testing → Regression

See **references/statistical_analysis.md** for complete implementations of grouped_summary, Dunnett's, Cohen's d, power analysis, polynomial/spline regression.
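
For the regression phase, a polynomial model comparison might look like the sketch below. It is not the implementation in the reference file: the helper `fit_polynomial` is hypothetical, the dose-response values are synthetic, and the AIC is the Gaussian least-squares form (up to an additive constant), which is only meaningful for comparing models fitted to the same data.

```python
import numpy as np


def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit with R-squared and a least-squares AIC."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coefs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coefs, x)
    rss = float(np.sum(residuals ** 2))
    tss = float(np.sum((y - y.mean()) ** 2))
    n, k = len(x), degree + 1  # k = number of fitted coefficients
    return {
        "degree": degree,
        "coefs": coefs,
        "r2": 1.0 - rss / tss,
        "aic": n * np.log(rss / n) + 2 * k,
    }


# Synthetic dose-response data, invented purely for illustration.
rng = np.random.default_rng(1)
dose = np.linspace(0, 10, 25)
response = 2.0 + 3.0 * dose - 0.25 * dose ** 2 + rng.normal(0, 0.5, dose.size)

fits = {d: fit_polynomial(dose, response, d) for d in (2, 3)}
best = min(fits.values(), key=lambda f: f["aic"])  # lower AIC is preferred
print(best["degree"], round(best["r2"], 3))
```

The cubic always has an R-squared at least as high as the quadratic on the same data, which is why the comparison uses a criterion that penalizes the extra coefficient.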

---

## Common Patterns

| Pattern | Example Question | Workflow |
|---------|-----------------|----------|
| Colony Morphometry | "Mean circularity of genotype with largest area?" | Group by Genotype → max mean Area → report Circularity |
| Cell Counting | "Cohen's d for NeuN counts?" | Filter → split by Condition → pooled SD → Cohen's d |
| Multi-Group Comparison | "How many ratios equivalent to control?" | Dunnett's for Area AND Circularity → count non-significant in BOTH |
| Regression | "Peak frequency from natural spline?" | Ratio→frequency → spline(df=4) → grid search peak → CI |

---

## Raw Image Processing

```python
from scripts.segment_cells import count_cells_in_image
result = count_cells_in_image(image_path="cells.tif", channel=0, min_area=50)
```

Segmentation: Nuclei → Otsu+watershed; Colonies → Otsu; Phase contrast → adaptive threshold.
See **references/segmentation.md**, **references/cell_counting.md**, **references/image_processing.md**.

---

## R-to-Python Equivalents

- R Dunnett (`multcomp::glht`) → `scipy.stats.dunnett()` (scipy >= 1.11)
- R natural spline (`ns(x, df=4)`) → `patsy.cr(x, knots=...)` with explicit quantile knots
- R `t.test()` → `scipy.stats.ttest_ind(a, b, equal_var=False)` (R defaults to Welch's test; SciPy defaults to equal variances)
- R `aov()` → `statsmodels.formula.api.ols()` + `sm.stats.anova_lm()`

## Answer Formatting

- "to the nearest thousand": `int(round(val, -3))`
- Cohen's d: 3 decimal places
- Sample sizes: integer (ceiling)
- Ratios: string "5:1"
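
The four rules above can be collected into small helpers. The function names are hypothetical; note that Python's `round` rounds exact halves to the nearest even value, so a value such as 12500 rounds to 12000.

```python
import math


def nearest_thousand(val):
    """'To the nearest thousand' as an integer."""
    return int(round(val, -3))


def format_cohens_d(d):
    """Cohen's d with 3 decimal places."""
    return f"{d:.3f}"


def required_sample_size(n):
    """Sample sizes are rounded up to the next whole subject."""
    return math.ceil(n)


def ratio_string(a, b):
    """Ratios are reported as a string such as '5:1'."""
    return f"{a}:{b}"


print(nearest_thousand(12345.6))   # 12000
print(format_cohens_d(0.5))        # 0.500
print(required_sample_size(63.2))  # 64
print(ratio_string(5, 1))          # 5:1
```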

### "Relative proportion of A to B" — default to PERCENTAGE

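Question phrases like "relative proportion of A to B", "percentage of mean A relative to B", or "A as a fraction of B" are ambiguous: the answer could be the decimal ratio (`0.29`) or the percentage (`29`). Default to the percentage unless the question explicitly asks for a decimal or a fraction.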
