tooluniverse-clinical-data-integration
Clinical Data Integration for Drug Safety combines FDA labeling, FAERS adverse event reports, statistical disproportionality measures, pharmacogenomic biomarkers, clinical trial findings, and published literature into a unified safety assessment. Use this skill for regulatory drug safety reviews, comprehensive pharmacovigilance reports comparing label-documented versus real-world adverse events, and clinical decision support requiring multi-source safety evidence integration.
git clone --depth 1 https://github.com/mims-harvard/ToolUniverse /tmp/tooluniverse-clinical-data-integration && cp -r /tmp/tooluniverse-clinical-data-integration/plugin/skills/tooluniverse-clinical-data-integration ~/.claude/skills/tooluniverse-clinical-data-integration
# Clinical Data Integration for Drug Safety
End-to-end drug safety review pipeline that integrates FDA label information, FAERS spontaneous reports, disproportionality signal detection, pharmacogenomic biomarkers, clinical trial data, and published literature. Designed for regulatory assessments, pharmacovigilance, and clinical decision support.
**Guiding principles**:
1. **Label is ground truth** -- FDA-approved labeling is the authoritative starting point for known safety information
2. **Signals need context** -- a FAERS signal without label or literature corroboration is hypothesis-generating, not confirmatory
3. **Disproportionality is not causation** -- PRR/ROR measure reporting patterns, not causal relationships
4. **Pharmacogenomics narrows risk** -- PGx biomarkers can identify which patients face elevated risk
5. **Progressive reporting** -- create the report file early; update section by section
6. **English-first queries** -- use English drug names in all tool calls; respond in the user's language
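To make principle 3 concrete, here is a minimal sketch of how PRR and ROR are commonly computed from a 2x2 reporting table. The function name, the cell labels `a`-`d`, and the example counts are illustrative assumptions, not part of this skill or of any ToolUniverse tool. The numbers are made up and are not real FAERS data.

```python
import math
from typing import NamedTuple


class Disproportionality(NamedTuple):
    prr: float
    ror: float
    ror_ci_low: float
    ror_ci_high: float


def disproportionality(a: int, b: int, c: int, d: int) -> Disproportionality:
    """Compute PRR and ROR from a 2x2 reporting table.

    a: reports with the drug and the event of interest
    b: reports with the drug and any other event
    c: reports with other drugs and the event of interest
    d: reports with other drugs and any other event
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive in this sketch")
    # Proportional reporting ratio: event share for the drug vs. all other drugs.
    prr = (a / (a + b)) / (c / (c + d))
    # Reporting odds ratio: cross-product of the table.
    ror = (a * d) / (b * c)
    # Standard error of ln(ROR), used for an approximate 95% interval.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(ror) - 1.96 * se)
    high = math.exp(math.log(ror) + 1.96 * se)
    return Disproportionality(prr, ror, low, high)


# Made-up counts for illustration only.
result = disproportionality(a=20, b=980, c=100, d=98900)
print(result)
```

Both measures describe how often an event is reported with a drug relative to other drugs. A high value flags a reporting pattern worth checking against the label and the literature; it does not establish that the drug caused the event.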
Clinical data integration starts with data harmonization. Different hospitals may code the same diagnosis in different systems (for example, ICD-10 vs SNOMED), so verify the coding system before merging datasets. Missing data is informative: a missing lab value may mean the test was never ordered (for example, because the patient was stable), not that the result was normal.
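The two checks in this paragraph can be sketched in a few lines. The record layout, field names, and helper functions below are hypothetical, chosen only for illustration; the diagnosis codes and lab values are placeholders rather than real patient data.

```python
# Hypothetical record layout: each record names the coding system it uses.
site_a = [{"patient": "A1", "system": "ICD-10", "code": "I10", "potassium": 4.1}]
site_b = [{"patient": "B1", "system": "SNOMED", "code": "38341003", "potassium": None}]


def coding_systems(records: list[dict]) -> set[str]:
    """Collect the coding systems that appear in one dataset."""
    return {r["system"] for r in records}


def safe_to_merge(*datasets: list[dict]) -> bool:
    """True only when all datasets share exactly one coding system."""
    systems: set[str] = set()
    for dataset in datasets:
        systems |= coding_systems(dataset)
    return len(systems) == 1


def with_missing_flag(records: list[dict], field: str) -> list[dict]:
    """Keep missing values as None and record the missingness explicitly."""
    return [{**r, f"{field}_missing": r.get(field) is None} for r in records]


print(safe_to_merge(site_a, site_b))           # mixed systems: map codes first
print(with_missing_flag(site_b, "potassium"))  # missingness kept as its own field
```

Keeping an explicit `*_missing` flag, instead of imputing a normal value, preserves the information that the test was never ordered.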
## LOOK UP, DON'T GUESS
When uncertain about any scientific fact, SEARCH databases first rather than reasoning from memory. A database-verified answer is always more reliable than a guess.
**Differentiation**: This skill emphasizes *regulatory-grade data integration* across the full drug lifecycle. For focused FAERS signal detection with quantitative scoring, see `tooluniverse-adverse-event-detection`. For general pharmacovigilance workflows, see `tooluniverse-pharmacovigilance`.
---
## COMPUTE, DON'T DESCRIBE
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
## When to Use
Typical triggers:
- "Give me a full safety review for [drug]"
- "What does the FDA label say about [drug] and [event]?"
- "Are there FAERS signals for [drug]?"
- "What pharmacogenomic biomarkers exist for [drug]?"
- "Find clinical trials studying [drug] safety"
- "Post-market surveillance summary for [drug]"
- "Compare safety profiles of [drug A] and [drug B]"
---
## Core Data Sources
| Source | Type | Best For |
|--------|------|----------|
| **FDA Labels (DailyMed)** | Regulatory | Approved safety information, boxed warnings, drug interactions |
| **FAERS** | Spontaneous reports | Post-market adverse event signals, demographic patterns |
| **CPIC** | Guidelines | Pharmacogenomic dosing recommendations |
| **FDA PGx Biomarkers** | Regulatory | Approved pharmacogenomic labeling |
| **ClinicalTrials.gov** | Trial registry | Ongoing/completed safety trials |
| **PubMed** | Literature | Published safety studies, case reports |
---
## Workflow Overview
```
Phase 0: Drug Identity & Context
  Resolve drug name, get class, mechanism, indications
    |
Phase 1: FDA Label Extraction
  Boxed warnings, contraindications, adverse reactions, interactions
    |
Phase 2: FAERS Signal Detection
  Top adverse events, disproportionality (PRR/ROR), demographics
    |
Phase 3: Pharmacogenomics
  CPIC guidelines, FDA PGx biomarkers, genotype-specific risks
    |
Phase 4: Clinical Trials
  Safety-focused trials, risk evaluation programs
    |
Phase 5: Literature Evidence
  PubMed safety studies, case reports, meta-analyses
    |
Phase 6: Integrated Safety Report
  Synthesize all sources into a cohesive safety profile
```
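One possible way to follow the "progressive reporting" principle across these phases is to create the report file up front with a pending section per phase, then fill sections in as each phase completes. This is a sketch under assumptions: the file name, the `_Pending._` placeholder, and both helper functions are hypothetical and not prescribed by the skill.

```python
import tempfile
from pathlib import Path

PHASES = [
    "Phase 0: Drug Identity & Context",
    "Phase 1: FDA Label Extraction",
    "Phase 2: FAERS Signal Detection",
    "Phase 3: Pharmacogenomics",
    "Phase 4: Clinical Trials",
    "Phase 5: Literature Evidence",
    "Phase 6: Integrated Safety Report",
]
PENDING = "_Pending._"  # hypothetical placeholder text


def create_report(path: Path, drug: str) -> None:
    """Write the report skeleton with one pending section per phase."""
    lines = [f"# Safety Report: {drug}", ""]
    for phase in PHASES:
        lines += [f"## {phase}", PENDING, ""]
    path.write_text("\n".join(lines), encoding="utf-8")


def update_section(path: Path, phase: str, body: str) -> None:
    """Replace the pending placeholder of one phase with its findings."""
    text = path.read_text(encoding="utf-8")
    placeholder = f"## {phase}\n{PENDING}"
    if placeholder not in text:
        raise ValueError(f"no pending section for {phase!r}")
    updated = text.replace(placeholder, f"## {phase}\n{body}", 1)
    path.write_text(updated, encoding="utf-8")


# Demo in a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    report = Path(tmp) / "safety_report.md"
    create_report(report, "warfarin")
    update_section(report, PHASES[0], "(findings go here)")
    final_text = report.read_text(encoding="utf-8")

print(final_text)
```

Because the skeleton exists from the start, a run that stops partway still leaves a readable report showing which phases are pending.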
---
## Phase Details
### Phase 0: Drug Identity & Context
**Objective**: Unambiguously identify the drug and establish baseline context.
**Tools**:
- `DailyMed_search_spls` -- search Structured Product Labels
  - Input: `query` (drug name)
  - Output: SPL list with set IDs, titles, labeler names
- `OpenFDA_get_approval_history` -- get approval dates and supplements
  - Input: `drug_name` (generic or brand name)
  - Output: approval dates, application numbers, supplement history
**Workflow**:
1. Search DailyMed to confirm the drug name and identify the correct SPL
2. Get approval history to establish how long the drug has been marketed
3. Note the therapeutic class, mechanism of action, and approved indications
4. Record brand names vs generic name for consistent FAERS queries
**Tip**: FAERS uses `medicinalproduct` which can be brand or generic. Try both forms in Phase 2.
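A small sketch of the "try both forms" tip: build a de-duplicated list of names to query, then combine the per-name event counts. Both helpers are hypothetical, and the counts are made up for illustration. The FAERS query itself is left out because the exact tool call is not specified here.

```python
from collections import Counter


def name_variants(generic: str, brands: list[str]) -> list[str]:
    """Names to try in FAERS queries: generic first, duplicates dropped."""
    seen: set[str] = set()
    variants: list[str] = []
    for name in [generic, *brands]:
        cleaned = name.strip()
        key = cleaned.casefold()  # compare case-insensitively
        if cleaned and key not in seen:
            seen.add(key)
            variants.append(cleaned)
    return variants


def merge_event_counts(per_name_counts: list[dict[str, int]]) -> Counter:
    """Sum event counts returned for each name variant."""
    total: Counter = Counter()
    for counts in per_name_counts:
        total.update(counts)
    return total


variants = name_variants("warfarin", ["Coumadin", "Jantoven", "WARFARIN"])
print(variants)

# Made-up counts standing in for two per-name query results.
merged = merge_event_counts([{"Haemorrhage": 3}, {"Haemorrhage": 2, "Nausea": 1}])
print(merged)
```

Summed counts should be read as an upper bound: a single report can match more than one name variant, so de-duplicate by report identifier when the data source exposes one.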
### Phase 1: FDA Label Extraction
**Objective**: Extract all safety-relevant sections from the FDA-approved label.
**Tools**:
- `FDA_get_boxed_warning_info_by_drug_name` -- boxed (black box) warnings
  - Input: `drug_name`
  - Output: warning text, or `{error: {code: "NOT_FOUND"}}` if none exists (normal)
- `FDA_get_warnings_and_cautions_by_drug_name` -- warnings and precautions section
  - Input: `drug_name`
  - Output: full warnings text
- `DailyMed_parse_adverse_reactions` -- adverse reactions from label
  - Input: `setid` (NOT `set_id`; from Phase 0 DailyMed search)
  - Output: parsed adverse reaction tables and text
- `DailyMed_parse_drug_interactions` -- drug interaction section
  - Input: `setid` (NOT `set_id`)
  - Output: parsed interaction data
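The argument names above are easy to get wrong, so the sketch below spells out the four payloads. The tool names and parameter keys come from the list above. Everything else is an assumption: `run_tool` is a hypothetical adapter that takes a tool name and an arguments dict, and how it is wired to ToolUniverse is left to the caller. A recording stub stands in for real calls here.

```python
from typing import Any, Callable

# Hypothetical adapter signature: (tool_name, arguments) -> response.
ToolRunner = Callable[[str, dict], Any]


def extract_label_sections(run_tool: ToolRunner, drug_name: str, setid: str) -> dict:
    """Call the four Phase 1 tools with the parameter names listed above."""
    return {
        "boxed_warning": run_tool(
            "FDA_get_boxed_warning_info_by_drug_name", {"drug_name": drug_name}
        ),
        "warnings": run_tool(
            "FDA_get_warnings_and_cautions_by_drug_name", {"drug_name": drug_name}
        ),
        # DailyMed parsers take `setid`, not `set_id`.
        "adverse_reactions": run_tool(
            "DailyMed_parse_adverse_reactions", {"setid": setid}
        ),
        "drug_interactions": run_tool(
            "DailyMed_parse_drug_interactions", {"setid": setid}
        ),
    }


# Stub runner that records calls instead of contacting any service.
calls: list[tuple[str, dict]] = []


def recording_runner(name: str, arguments: dict) -> dict:
    calls.append((name, arguments))
    return {"tool": name}


sections = extract_label_sections(recording_runner, "warfarin", "example-setid")
for name, arguments in calls:
    print(name, arguments)
```

Passing the runner in as a parameter keeps the payload logic testable without network access; a real adapter can be swapped in later.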
**Workflow**:
1. Check for boxed warnings first -- these represent the most serious safety concerns
2. Extract warnings and precautions
3. Parse adverse reactions (both clinical trial rates and post-marketing reports)
4. Extract drug interactions
5. A `NOT_FOUND` response for boxed warnings is normal and means no boxed warning exists
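Step 5 can be captured in a small helper that treats `NOT_FOUND` as a valid "no boxed warning" result and surfaces any other error. The error shape follows the tool description above. The shape of a successful response is an assumption, so the helper passes it through unchanged. The function name is hypothetical.

```python
from typing import Any


def boxed_warning_status(response: Any) -> tuple[bool, Any]:
    """Return (has_boxed_warning, detail) for a boxed-warning lookup."""
    error = response.get("error") if isinstance(response, dict) else None
    if isinstance(error, dict):
        if error.get("code") == "NOT_FOUND":
            # Normal outcome: the label carries no boxed warning.
            return False, "No boxed warning on the label"
        # Any other error code is a real failure and should not be hidden.
        raise RuntimeError(f"boxed warning lookup failed: {error}")
    # Assumed: anything without an error key is the warning content itself.
    return True, response


print(boxed_warning_status({"error": {"code": "NOT_FOUND"}}))
```

Separating "not found" from other errors keeps a transient failure from being reported as the absence of a boxed warning.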
**Label section priority**: Boxed Warning > Contraindications > Warnings/Precautions > Adverse Reactions > Drug Interactions
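The priority order above can be applied mechanically when assembling the label section of a report. A minimal sketch, assuming each finding is a dict with a `section` key; that record shape and the placeholder notes are illustrative only.

```python
SECTION_PRIORITY = [
    "Boxed Warning",
    "Contraindications",
    "Warnings/Precautions",
    "Adverse Reactions",
    "Drug Interactions",
]
RANK = {name: index for index, name in enumerate(SECTION_PRIORITY)}


def order_findings(findings: list[dict]) -> list[dict]:
    """Sort findings by label section priority; unknown sections go last."""
    return sorted(findings, key=lambda f: RANK.get(f["section"], len(RANK)))


# Placeholder findings for illustration only.
findings = [
    {"section": "Drug Interactions", "note": "placeholder"},
    {"section": "Adverse Reactions", "note": "placeholder"},
    {"section": "Boxed Warning", "note": "placeholder"},
]
for finding in order_findings(findings):
    print(finding["section"])
```

`sorted` is stable, so findings within the same section keep their original order.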
### Phase 2: FAERS Signal Detection
**Objective**: Identify post-market adverse event signals in FAERS: top adverse events, disproportionality (PRR/ROR), and demographic patterns.