ClaudeWave
Skill · 1.4k repo stars · updated today

tooluniverse-diagnostic-test-evaluation

This skill evaluates diagnostic test and biomarker performance across three scenarios: calculating sensitivity, specificity, PPV, NPV, and likelihood ratios from a 2x2 contingency table at a fixed threshold; generating ROC curves, AUC, and Youden-optimal cutoffs for continuous biomarkers; and computing post-test disease probability using Bayes' theorem. Use it when validating a test against a gold standard or deciding whether a biomarker adequately discriminates disease from health.

Install in Claude Code
git clone --depth 1 https://github.com/mims-harvard/ToolUniverse /tmp/tooluniverse-diagnostic-test-evaluation && cp -r /tmp/tooluniverse-diagnostic-test-evaluation/plugin/skills/tooluniverse-diagnostic-test-evaluation ~/.claude/skills/tooluniverse-diagnostic-test-evaluation
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# Diagnostic Test / Biomarker Accuracy Evaluation

Judge how well a test or biomarker discriminates disease — at a fixed cutoff (2×2) or across all cutoffs (ROC) — and turn a result into a probability of disease.

## Which case are you in?

| You have… | Go to |
|---|---|
| A 2×2 table (TP/FP/TN/FN) at a fixed cutoff | **Step 1** (`Epidemiology_diagnostic`) |
| A **continuous** biomarker score + true labels | **Step 2** (ROC / AUC / Youden, Python) |
| A test's sens/spec + a patient's pre-test probability | **Step 3** (`Epidemiology_bayesian`) |

## Step 1 — Fixed-cutoff metrics from a 2×2 table

```bash
tu run Epidemiology_diagnostic '{"operation":"diagnostic","tp":90,"fp":10,"tn":180,"fn":20}'
```

Returns `sensitivity`, `specificity`, `PPV`, `NPV`, `accuracy`, `LR_pos`, `LR_neg`, and the sample `prevalence`.

| Metric | Question it answers | Depends on prevalence? |
|---|---|---|
| **Sensitivity** = TP/(TP+FN) | Of those WITH disease, what fraction test positive? | No |
| **Specificity** = TN/(TN+FP) | Of those WITHOUT disease, what fraction test negative? | No |
| **PPV** = TP/(TP+FP) | If positive, what's the chance of disease? | **Yes — strongly** |
| **NPV** = TN/(TN+FN) | If negative, what's the chance of being disease-free? | **Yes** |
| **LR+** = sens/(1−spec) | By what factor does a positive result multiply the odds of disease? | No |
| **LR−** = (1−sens)/spec | By what factor does a negative result multiply (lower) the odds of disease? | No |
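
To make the table's formulas concrete, here is a minimal Python sketch that computes them from raw counts. `two_by_two_metrics` is a hypothetical helper written for illustration; it is not the implementation behind `Epidemiology_diagnostic`, whose output keys and edge-case handling may differ.

```python
def two_by_two_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Fixed-cutoff accuracy metrics from a 2x2 table (hypothetical helper)."""
    sens = tp / (tp + fn)            # true-positive rate
    spec = tn / (tn + fp)            # true-negative rate
    total = tp + fp + tn + fn
    return {
        "sensitivity": sens,
        "specificity": spec,
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "accuracy": (tp + tn) / total,
        # Guard the likelihood ratios against division by zero
        "LR_pos": sens / (1 - spec) if spec < 1 else float("inf"),
        "LR_neg": (1 - sens) / spec if spec > 0 else float("inf"),
        "prevalence": (tp + fn) / total,  # prevalence in THIS sample only
    }


m = two_by_two_metrics(tp=90, fp=10, tn=180, fn=20)
for name, value in m.items():
    print(f"{name:12s} {value:.3f}")
```

For the counts in the command above, this gives sensitivity 0.818, specificity 0.947, PPV and NPV both 0.900, and LR+ of about 15.5.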

> **The PPV/NPV trap.** Sensitivity and specificity are properties of the *test*; **PPV and NPV also depend on the disease prevalence in the tested population.** A test with excellent sens/spec can still have a poor PPV in a low-prevalence (screening) setting. Never quote PPV/NPV straight from a case-control design, because its prevalence (often 50/50) is fixed by how cases and controls were sampled. Instead, compute them for the real-world prevalence with `Epidemiology_bayesian` (Step 3). Report **sensitivity, specificity, and likelihood ratios** as the prevalence-independent summary.
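
A short sketch of the trap: holding sensitivity and specificity fixed and varying only the prevalence shows how far PPV moves. The function name `ppv_npv` is hypothetical; the arithmetic is the standard Bayes expression for expected true/false positives and negatives per unit of population.

```python
def ppv_npv(sens: float, spec: float, prevalence: float) -> tuple[float, float]:
    """PPV and NPV implied by fixed sens/spec at a given prevalence (Bayes)."""
    p = prevalence
    tp = sens * p                 # expected true-positive fraction
    fp = (1 - spec) * (1 - p)     # expected false-positive fraction
    tn = spec * (1 - p)           # expected true-negative fraction
    fn = (1 - sens) * p           # expected false-negative fraction
    return tp / (tp + fp), tn / (tn + fn)


# Same test (sens 0.90, spec 0.95) in settings of decreasing prevalence
for prev in (0.50, 0.10, 0.01):
    ppv, npv = ppv_npv(0.90, 0.95, prev)
    print(f"prevalence {prev:.2f}: PPV {ppv:.3f}  NPV {npv:.3f}")
```

With these inputs the PPV falls from about 0.95 at 50% prevalence to about 0.67 at 10% and about 0.15 at 1%, while NPV rises toward 1.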

## Step 2 — ROC / AUC / optimal cutoff for a continuous biomarker

When the test is a continuous score, evaluate across **all** thresholds:

```bash
python skills/tooluniverse-diagnostic-test-evaluation/scripts/roc_analysis.py --input scores.csv
# scores.csv columns: label (1=disease, 0=healthy), score (continuous biomarker)
```

It reports the AUC (with a bootstrap 95% CI), the **Youden-optimal cutoff** (the threshold that maximizes J = sensitivity + specificity − 1) with its sensitivity and specificity, and a text ROC curve.
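
The internals of `roc_analysis.py` are not shown here, so the following is an independent, stdlib-only sketch of the same ideas: sweep every cutoff, integrate the ROC curve by the trapezoidal rule, and pick the cutoff with the largest Youden's J. Assumptions: higher scores indicate disease, a subject is test-positive when score ≥ cutoff, and both classes are present. The function names and the toy data are hypothetical, not real measurements.

```python
def roc_points(labels, scores):
    """Return (cutoff, sensitivity, specificity) for every distinct score.

    Assumption: higher scores mean "more likely diseased", and a subject
    is called test-positive when score >= cutoff.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    points = []
    for t in sorted(set(scores), reverse=True):   # strictest cutoff first
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        points.append((t, tp / n_pos, tn / n_neg))
    return points


def auc_trapezoid(labels, scores):
    """Area under the ROC curve (TPR vs FPR) by the trapezoidal rule."""
    curve = [(0.0, 0.0)] + [(1 - sp, se) for _, se, sp in roc_points(labels, scores)]
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(curve, curve[1:]))


def youden_cutoff(labels, scores):
    """Cutoff maximizing Youden's J = sensitivity + specificity - 1."""
    return max(roc_points(labels, scores), key=lambda p: p[1] + p[2] - 1)


labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]   # toy data, illustration only
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
cut, se, sp = youden_cutoff(labels, scores)
print(f"AUC {auc_trapezoid(labels, scores):.2f}; "
      f"Youden cutoff {cut} (sens {se:.2f}, spec {sp:.2f})")
```

On this toy data J = 0.8 is reached at two cutoffs (0.4 and 0.6); `max` keeps the first one it meets, the stricter 0.6. Ties like this are common on small samples and are one more reason to let clinical costs, not J alone, settle the threshold.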

| AUC | Discrimination |
|---|---|
| 0.5 | no better than chance |
| 0.5–0.7 | poor |
| 0.7–0.8 | acceptable |
| 0.8–0.9 | excellent |
| ≥ 0.9 | outstanding |
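
The AUC bands are easier to read with the probabilistic interpretation: the AUC equals the probability that a randomly chosen diseased subject scores higher than a randomly chosen healthy one (ties count one half), so 0.5 is a coin flip. The sketch below computes it that way and adds a percentile bootstrap CI, which is one common way to obtain the 95% interval mentioned above; whether `roc_analysis.py` uses this exact variant is an assumption. Function names and toy data are hypothetical.

```python
import random


def auc_mann_whitney(labels, scores):
    """AUC = P(diseased score > healthy score), ties counted as 1/2.

    Compares every diseased/healthy pair, so it is quadratic in sample size;
    fine for small illustrative datasets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC, resampling subjects with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = rng.choices(range(n), k=n)
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:                 # AUC needs both classes present
            stats.append(auc_mann_whitney(ys, [scores[i] for i in idx]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]


labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]   # toy data, illustration only
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
print(auc_mann_whitney(labels, scores))   # 0.96 (24 of 25 pairs ordered correctly)
print(bootstrap_auc_ci(labels, scores))
```

With only ten subjects the bootstrap interval is wide, which is itself a useful warning against over-reading a single AUC from a small study.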

- The **Youden** cutoff weights sensitivity and specificity equally; if false negatives and false positives have different costs, pick the threshold from the clinical tradeoff, not Youden.
- Once you choose a cutoff, build its 2×2 and run Step 1 for the fixed-cutoff metrics at that operating point.
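
A minimal sketch of both bullets: choose the cutoff that minimizes a total misclassification cost `cost_fn * FN + cost_fp * FP`, then build the 2×2 at that cutoff and format it as the Step 1 payload. The cost values, helper names, and toy data are hypothetical; set the costs from the clinical context. With equal costs this minimizes the raw error count, which differs from Youden whenever the classes are unbalanced.

```python
import json


def confusion_at_cutoff(labels, scores, cutoff):
    """2x2 counts (tp, fp, tn, fn) when 'positive' means score >= cutoff."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= cutoff
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 0 and predicted_positive:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def min_cost_cutoff(labels, scores, cost_fn, cost_fp):
    """Candidate cutoff with the lowest total cost cost_fn*FN + cost_fp*FP."""
    def total_cost(t):
        _, fp, _, fn = confusion_at_cutoff(labels, scores, t)
        return cost_fn * fn + cost_fp * fp
    return min(sorted(set(scores)), key=total_cost)


labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]   # toy data, illustration only
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
# Hypothetical costs: a missed case is judged 5x worse than a false alarm
cut = min_cost_cutoff(labels, scores, cost_fn=5.0, cost_fp=1.0)
tp, fp, tn, fn = confusion_at_cutoff(labels, scores, cut)
payload = {"operation": "diagnostic", "tp": tp, "fp": fp, "tn": tn, "fn": fn}
print(f"cutoff {cut}: tu run Epidemiology_diagnostic '{json.dumps(payload)}'")
```

With a 5:1 cost ratio the search settles on 0.4 for this toy data, accepting one false positive to avoid any missed case.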

## Step 3 — Post-test probability (Bayes)

Turn a result into the probability of disease for a given pre-test probability/prevalence:

```bash
tu run Epidemiology_bayesian '{"operation":"bayesian","prevalence":0.10,
  "sensitivity":0.90,"specificity":0.95,"test_result":"positive"}'
```

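Returns `pre_test_odds`, the `LR`, and `post_test_probability`. This is how you get the *real-world* PPV: plug in the true prevalence. For example, a test with 90% sensitivity and 95% specificity at 10% prevalence has pre-test odds of 0.10/0.90 = 1:9 and LR+ = 0.90/0.05 = 18, so the post-test odds are 2:1 and the post-positive probability is only about 67%, far below what the 90%/95% figures suggest.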
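
The odds form of Bayes' theorem behind this step fits in a few lines. This is a sketch that mirrors the parameter names of the `Epidemiology_bayesian` payload for readability; it is not that tool's implementation, and it assumes 0 < prevalence < 1 and specificity < 1 for a positive result.

```python
def post_test_probability(prevalence, sensitivity, specificity, test_result):
    """Post-test probability of disease: pre-test odds x LR, converted back."""
    if test_result == "positive":
        lr = sensitivity / (1 - specificity)        # LR+
    elif test_result == "negative":
        lr = (1 - sensitivity) / specificity        # LR-
    else:
        raise ValueError("test_result must be 'positive' or 'negative'")
    pre_odds = prevalence / (1 - prevalence)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


print(post_test_probability(0.10, 0.90, 0.95, "positive"))
print(post_test_probability(0.10, 0.90, 0.95, "negative"))
```

The first call gives about 0.667 (the worked example above); the second shows that a negative result drops the probability of disease to about 0.012.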

## Gotchas (state these)

- **PPV/NPV without a stated prevalence are meaningless** — always give the prevalence they assume.
- **AUC ignores the operating point.** A high AUC doesn't tell you the test is useful at the threshold you'll actually use — report sens/spec at the chosen cutoff too.
- **Class imbalance.** With very few positives, ROC/AUC can look good while PPV is poor; consider a precision-recall curve and always report PPV at the real prevalence.
- **Spectrum bias.** Sens/spec measured on clearly-sick vs clearly-healthy subjects overestimate real-world performance on borderline cases.
- **Cutoff chosen on the same data it is evaluated on.** Sens/spec reported at such a cutoff are optimistic; validate the threshold on a held-out set.
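
A sketch of the held-out check from the last bullet: tune the Youden cutoff on one half of the data, then report sensitivity and specificity at that fixed cutoff on the other half. The data are synthetic (healthy scores drawn from N(0, 1), diseased from N(1, 1)) and the helper names are hypothetical; a real study would split by patient and might prefer cross-validation.

```python
import random


def sens_spec(labels, scores, cutoff):
    """Sensitivity and specificity when 'positive' means score >= cutoff."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    return (sum(s >= cutoff for s in pos) / len(pos),
            sum(s < cutoff for s in neg) / len(neg))


def youden_cutoff(labels, scores):
    """Cutoff (among observed scores) maximizing sens + spec - 1."""
    return max(sorted(set(scores)),
               key=lambda t: sum(sens_spec(labels, scores, t)) - 1)


rng = random.Random(42)
# Synthetic toy data, NOT real measurements
data = ([(1, rng.gauss(1.0, 1.0)) for _ in range(200)]
        + [(0, rng.gauss(0.0, 1.0)) for _ in range(200)])
rng.shuffle(data)
train, test = data[:200], data[200:]
tr_y, tr_s = zip(*train)
te_y, te_s = zip(*test)

cut = youden_cutoff(tr_y, tr_s)          # threshold tuned on training rows only
apparent = sens_spec(tr_y, tr_s, cut)    # optimistic: same rows chose the cutoff
held_out = sens_spec(te_y, te_s, cut)    # honest estimate at that fixed cutoff
print(f"cutoff {cut:.2f}  train sens/spec {apparent[0]:.2f}/{apparent[1]:.2f}  "
      f"test sens/spec {held_out[0]:.2f}/{held_out[1]:.2f}")
```

Because the training cutoff is by construction the best one for those rows, its apparent J cannot be beaten on them; the held-out pair is the number to report.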

## Honest limitations

- These are discrimination/accuracy metrics, not calibration — a well-discriminating model can still output poorly-calibrated probabilities.
- A single AUC is not a comparison. To compare two tests measured on the same patients, use a paired AUC test such as DeLong's; that is beyond the basic script here.

## Related skills
- `tooluniverse-statistical-modeling` — logistic regression that produces the score, ORs.
- `tooluniverse-epidemiological-analysis` — population-level risk, screening program metrics.
- `tooluniverse-meta-analysis` — pool diagnostic accuracy across studies.

setup-tooluniverse (Skill)

Install and configure ToolUniverse for any use case — MCP server (chat-based), CLI (command line with 9 subcommands), or Python SDK (Coding API with 3 calling patterns). Covers uv/uvx setup, MCP configuration for 12+ AI clients (Cursor, Claude Desktop, Windsurf, VS Code, Codex, Gemini CLI, Trae, Cline, etc.), full CLI reference (tu list/grep/find/info/run/test/status/build/serve), Coding API quickstart, agentic tools, code executor, API key walkthrough, skill installation, and upgrading. Use when user asks how to set up ToolUniverse, which access mode to use (MCP vs CLI vs SDK), configuring MCP servers, using the CLI, troubleshooting installation, upgrading, or mentions installing ToolUniverse or setting up scientific tools. Also triggers for "how do I use ToolUniverse", "what's the best way to access tools", "command line", "tu command", "coding API", "tu build".

tooluniverse-acmg-variant-classification (Skill)

Systematic ACMG/AMP germline variant classification with all 28 criteria (PVS1, PS1-4, PM1-6, PP1-5, BA1, BS1-4, BP1-7) for clinical significance. Produces 5-tier verdict (Pathogenic / Likely Pathogenic / VUS / Likely Benign / Benign) with cited evidence per criterion. Use for variant interpretation, VUS resolution, and pathogenicity assessment. Combines ClinVar, gnomAD, computational predictors, and gene-mechanism context.

tooluniverse-admet-prediction (Skill)

Comprehensive ADMET (Absorption, Distribution, Metabolism, Excretion, Toxicity) profiling for drug candidates. Integrates ADMET-AI predictions, SwissADME drug-likeness, PubChemTox experimental toxicity, ChEMBL clinical data, Lipinski rule-of-five, and CYP interaction data. Use for drug-likeness assessment, BBB penetration, bioavailability, hepatotoxicity prediction, ADME/PK profiling, or screening compound libraries before lab testing.

tooluniverse-adverse-event-detection (Skill)

Detect and analyze adverse drug event signals using FDA FAERS reports, drug labels, and disproportionality statistics (PRR, ROR, IC). Generates quantitative safety signal scores (0-100) with evidence grading. Use for post-market surveillance, pharmacovigilance, drug safety assessment, regulatory submissions, and detecting rare AE signals not visible in clinical trials.

tooluniverse-adverse-outcome-pathway (Skill)

Map environmental and industrial chemicals to adverse outcome pathways (AOPs) — molecular initiating event to organ-level toxicity. Uses AOPWiki, GHS classification, IARC carcinogen status, and LD50 data. Use for environmental/industrial chemical risk assessment, regulatory-grade hazard characterization, and AOP stressor mapping. Distinct from drug-safety analysis (use tooluniverse-pharmacovigilance for drugs).

tooluniverse-aging-senescence (Skill)

Aging biology, cellular senescence, and longevity research. Covers senescence markers (p16/CDKN2A, SASP, SA-beta-gal), aging hallmarks, senolytic drug discovery (dasatinib+quercetin, fisetin, navitoclax), epigenetic clocks, telomere biology, and longevity GWAS. Use for senescence-pathway analysis, age-related disease genetics, senolytic-target discovery, and centenarian-genetics queries. Distinguishes correlative vs causal evidence (knockout, intervention).

tooluniverse-antibody-engineering (Skill)

Therapeutic antibody engineering and optimization, lead-to-clinical-candidate. Covers sequence humanization (germline alignment, framework retention), affinity maturation, developability (aggregation, stability, PTMs), structure modeling (AlphaFold/PDB CDR analysis), immunogenicity prediction, and manufacturing feasibility. Use for biologic-drug optimization, mAb design review, biosimilar engineering, and clinical-precedent comparison.

tooluniverse-binder-discovery (Skill)

Discover novel small-molecule binders for protein targets using structure-based and ligand-based screening. Covers druggability assessment, known-ligand mining (ChEMBL, BindingDB), similarity expansion, ADMET filtering, and synthesis feasibility. Use for hit identification, virtual screening, target-to-compounds workflows, and lead-finding before commit-to-medchem.