mouse-phenome-database
Retrieve mouse phenotype data from the Jackson Laboratory Mouse Phenome Database (MPD) via its REST API. Browse 520+ projects, look up per-project measure metadata, pull strain-level means (raw or LS-mean adjusted) and per-animal values, find measures by MP/VT ontology terms, and resolve strain nomenclature or gene coordinates. Use for QTL support, cross-strain comparison, mouse model selection, and ontology-driven phenotype discovery. Use monarch-database for disease-gene-phenotype knowledge graphs; ensembl-database for mouse genome annotations.
git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/mouse-phenome-database && cp -r /tmp/mouse-phenome-database/skills/genomics-bioinformatics/databases/mouse-phenome-database ~/.claude/skills/mouse-phenome-database
# mouse-phenome-database
## Overview
The Mouse Phenome Database (MPD), maintained at the Jackson Laboratory, catalogs standardized phenotype measurements across inbred, recombinant inbred (e.g., BXD), and Collaborative Cross / Diversity Outbred mouse panels. It aggregates 520+ projects spanning metabolic, cardiovascular, behavioral, hematological, and immunological traits. The REST API at `https://phenome.jax.org/api` is free, requires no authentication, and is documented at <https://phenome.jax.org/about/api>. MPD measurement IDs (`measnum`) are project-scoped 5-digit integers — there is no global "measnum 10001 = body weight" mapping; valid measnums must be discovered per project via the `measureinfo` endpoint.
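Because measnums are project-scoped, a small lookup over a project's measure records is usually the first step. The sketch below assumes records shaped like the `measures_info` entries used in the Quick Start (keys `measnum` and `varname`); `find_measnum` is a hypothetical helper, and the sample records are placeholders rather than real MPD output.

```python
def find_measnum(measures, varname):
    """Return the measnum whose varname matches (case-insensitive), or None."""
    wanted = varname.lower()
    for m in measures:
        if str(m.get("varname", "")).lower() == wanted:
            return m["measnum"]
    return None

# Placeholder records for illustration only; in practice pass the list
# returned under "measures_info" by /pheno/measureinfo/{projsym}.
sample_measures = [
    {"measnum": 15101, "varname": "HR"},
    {"measnum": 99999, "varname": "PLACEHOLDER"},  # made-up entry
]

hr_measnum = find_measnum(sample_measures, "hr")
missing = find_measnum(sample_measures, "not_a_measure")
print(hr_measnum, missing)
```

Returning `None` for an unknown varname makes it easy to fail early instead of requesting strain means for a measnum that belongs to a different project.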
## When to Use
- Selecting inbred strains with extreme phenotypes (highest/lowest fasted glucose, body weight, heart rate, etc.) as experimental models
- Pulling individual-animal data from BXD / CC / DO panels for QTL mapping with R/qtl2 or similar tools
- Comparing strain means and variance across metabolic, behavioral, or cardiovascular measures for genetic background studies
- Finding MPD projects that measure a trait of interest using ontology terms (MP, VT, MA) or free-text descriptions
- Validating mouse strain nomenclature (canonical JAX names ↔ stock numbers ↔ MGI IDs) before submitting orders or analyses
- Looking up coordinates and annotations for mouse genes in the MPD/MGI cross-reference
- Use `monarch-database` instead for disease-gene-phenotype knowledge graphs (HPO ↔ MP ↔ disease)
- Use `ensembl-database` instead for transcript-level mouse gene annotations and variant consequence prediction
## Prerequisites
- **Python packages**: `requests`, `pandas`, `matplotlib`
- **Data requirements**: a project symbol (e.g., `Jaxwest1`, `Auwerx1`) or a measnum (e.g., `15101`); strain names follow JAX canonical nomenclature (e.g., `C57BL/6J`, `DBA/2J`)
- **Environment**: internet connection; no API key required
- **Rate limits**: no published hard limit; keep bursts under ~5 requests/second and add `time.sleep(0.3)` between requests in loops
```bash
pip install requests pandas matplotlib
```
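The pacing advice above can be wrapped in a small helper so every loop follows it. This is a minimal sketch, not part of any MPD client: `throttled_map` is a hypothetical name, and the `sleep` and `clock` parameters exist only so the pacing can be tested without waiting. The demo builds URLs instead of sending requests.

```python
import time

def throttled_map(fetch, items, min_interval=0.3, sleep=time.sleep, clock=time.monotonic):
    """Call fetch(item) for each item, starting calls at least min_interval seconds apart."""
    results = []
    last_start = None
    for item in items:
        if last_start is not None:
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        results.append(fetch(item))
    return results

MPD = "https://phenome.jax.org/api"

# Demo without network access: replace the lambda with a function that
# calls requests.get(...) when running against the live API.
urls = throttled_map(
    lambda projsym: f"{MPD}/pheno/measureinfo/{projsym}",
    ["Jaxwest1", "Auwerx1"],
    min_interval=0.01,
)
print(urls)
```

With the default `min_interval=0.3` the loop stays near 3 requests per second, comfortably under the ~5 requests/second guideline.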
## Quick Start
```python
import requests
MPD = "https://phenome.jax.org/api"
# 1) Pick a project (Jaxwest1 — cardiovascular phenotyping on inbred panel)
r = requests.get(f"{MPD}/projects/Jaxwest1/strains", timeout=30)
strains = r.json()["strains"]
print(f"Jaxwest1: {len(strains)} strains tested")
# 2) Discover its measures
r = requests.get(f"{MPD}/pheno/measureinfo/Jaxwest1", timeout=30)
measures = r.json()["measures_info"]
print(f"Jaxwest1 measures: {len(measures)}; first: measnum={measures[0]['measnum']} "
      f"varname={measures[0]['varname']} ({measures[0]['descrip']}, {measures[0]['units']})")
# 3) Pull strain means for heart rate (varname=HR, measnum=15101)
r = requests.get(f"{MPD}/pheno/strainmeans/15101", timeout=30)
sm = r.json()["strainmeans"]
print(f"\nHeart rate strain means: {len(sm)} rows (one per strain × sex)")
top = sorted(sm, key=lambda x: x["mean"], reverse=True)[:5]
for s in top:
    print(f" {s['strain']:<20} sex={s['sex']} mean={s['mean']:.0f} {s.get('varname','')} n={s['nmice']}")
```
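Since strain means come back as one row per strain × sex, ranking all rows together mixes the sexes. The sketch below ranks within each sex instead. It assumes rows with the `strain`, `sex`, and `mean` keys used in the Quick Start; `extremes_by_sex` is a hypothetical helper, the sample rows are synthetic, and skipping rows without a numeric mean is a defensive choice rather than documented API behavior.

```python
from collections import defaultdict

def extremes_by_sex(rows, n=3):
    """Return the n lowest and n highest strain means for each sex."""
    by_sex = defaultdict(list)
    for row in rows:
        # Skip rows whose mean is missing or non-numeric.
        if isinstance(row.get("mean"), (int, float)):
            by_sex[row["sex"]].append(row)
    result = {}
    for sex, group in by_sex.items():
        ranked = sorted(group, key=lambda r: r["mean"])
        result[sex] = {"lowest": ranked[:n], "highest": ranked[-n:][::-1]}
    return result

# Synthetic rows for illustration; not real MPD values.
sample_rows = [
    {"strain": "StrainA", "sex": "f", "mean": 500.0},
    {"strain": "StrainB", "sex": "f", "mean": 600.0},
    {"strain": "StrainC", "sex": "f", "mean": 700.0},
    {"strain": "StrainA", "sex": "m", "mean": 550.0},
    {"strain": "StrainB", "sex": "m", "mean": None},
]

extremes = extremes_by_sex(sample_rows, n=1)
for sex, picks in extremes.items():
    print(sex, picks["lowest"][0]["strain"], picks["highest"][0]["strain"])
```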
## Core API
### Module 1: Browse Projects — `/projects`
Lists all MPD projects with full metadata. Filter via `investigator`, `projsym`, `projid`, `mpdsector`, `largecollab`, `panelsym`. Use `/project_filters/{filtername}` to see the allowed values of `mpdsector`, `largecollab`, or `panelsym` before filtering.
```python
import requests, pandas as pd
MPD = "https://phenome.jax.org/api"
# List allowed panel symbols (e.g., BXD, CC, DO)
filters = requests.get(f"{MPD}/project_filters/panelsym", timeout=30).json()
print(f"Available panels ({filters['count']}):", [t['term'] for t in filters['terms']][:10])
# All projects in the BXD recombinant inbred panel
r = requests.get(f"{MPD}/projects", params={"panelsym": "BXD"}, timeout=30)
projects = r.json()["projects"]
print(f"BXD projects: {len(projects)}")
df = pd.DataFrame([{
    "projsym": p["projsym"],
    "pi": p.get("pistring", "")[:40],
    "nstrains": p.get("nstrains"),
    "ages": p.get("ages"),
    "sector": p.get("mpdsector"),
    "title": (p.get("title") or "")[:60],
} for p in projects])
print(df.head(10).to_string(index=False))
```
```python
# Filter by MPD sector — komp, pheno, qtla, snp, onestrain, phenoarchive
r = requests.get(f"{MPD}/projects", params={"mpdsector": "qtla"}, timeout=30)
qtl_projects = r.json()["projects"]
print(f"QTL-archive projects: {len(qtl_projects)}")
for p in qtl_projects[:5]:
    print(f" {p['projsym']:<15} panel={p.get('panelsym') or '--':<6} nstrains={str(p.get('nstrains') or '--'):>4} {(p.get('title') or '')[:55]}")
```
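A mistyped filter name or value is easier to catch before the request is sent. The sketch below checks filter names against the six listed for `/projects` and, optionally, checks a value against the terms from a `/project_filters/{filtername}` payload. Both helpers are hypothetical, the payload shape (`count`, `terms[].term`) is assumed from the example above, and the sample terms are the panel examples from the text rather than a real response.

```python
# Filter names listed for /projects in this section.
ALLOWED_FILTERS = {"investigator", "projsym", "projid", "mpdsector", "largecollab", "panelsym"}

def allowed_terms(filters_json):
    """Extract the set of allowed values from a project_filters payload."""
    return {t["term"] for t in filters_json.get("terms", [])}

def build_project_params(allowed=None, **filters):
    """Return a params dict for /projects, rejecting unknown names or values.

    `allowed` optionally maps a filter name to its set of permitted values.
    Filters passed as None are dropped.
    """
    unknown = set(filters) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"Unknown filter(s): {sorted(unknown)}")
    params = {k: v for k, v in filters.items() if v is not None}
    for name, value in params.items():
        if allowed and name in allowed and value not in allowed[name]:
            raise ValueError(f"{value!r} is not an allowed value for {name}")
    return params

# Sample payload for illustration only; fetch the real one from
# /project_filters/panelsym before filtering.
sample_panels = {"count": 3, "terms": [{"term": "BXD"}, {"term": "CC"}, {"term": "DO"}]}
panel_terms = allowed_terms(sample_panels)
params = build_project_params(allowed={"panelsym": panel_terms}, panelsym="BXD", mpdsector=None)
print(params)
```

The resulting dict can be passed directly as `params=` to `requests.get(f"{MPD}/projects", ...)`.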
### Module 2: Project Detail — `/projects/{projsym}/...`
Each project has sub-resources for its dataset (CSV of every animal × every measure), the strain panel it tested, the publications it produced, and (for QTL projects) the genetic markers used.
```python
import requests, io, pandas as pd
MPD = "https://phenome.jax.org/api"
# Full per-animal dataset as CSV (default). Use json=yes for JSON.
r = requests.get(f"{MPD}/projects/Jaxwest1/dataset", timeout=60)
df = pd.read_csv(io.StringIO(r.text))
print(f"Jaxwest1 dataset: {df.shape[0]} animals × {df.shape[1]} columns")
print(df.columns[:12].tolist())
print(df[["strain", "sex", "animal_id", "HR", "QRS", "bw"]].head(5).to_string(index=False))
```
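Once the per-animal table is loaded, strain-level means and spread can be recomputed locally with a pandas group-by. The sketch below assumes the `strain` and `sex` columns and a measure column such as `HR`, as in the example above; `strain_summary` is a hypothetical helper and the data frame is synthetic, not real MPD data.

```python
import pandas as pd

def strain_summary(df, measure, group_cols=("strain", "sex")):
    """Summarize one measure per group: animal count, mean, and standard deviation."""
    summary = (
        df.dropna(subset=[measure])          # ignore animals without a value
          .groupby(list(group_cols))[measure]
          .agg(n="count", mean="mean", sd="std")
          .reset_index()
    )
    # Highest mean first, which is convenient when looking for extreme strains.
    return summary.sort_values("mean", ascending=False).reset_index(drop=True)

# Synthetic per-animal rows for illustration only.
animals = pd.DataFrame({
    "strain": ["StrainA", "StrainA", "StrainB", "StrainB", "StrainB"],
    "sex": ["f", "f", "f", "f", "f"],
    "HR": [600.0, 620.0, 500.0, 520.0, None],
})

summary = strain_summary(animals, "HR")
print(summary.to_string(index=False))
```

These are plain unadjusted means; they are not a substitute for the LS-mean adjusted values that MPD can return.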
```python
# Strains tested in a project + publication list
strains = requests.get(f"{MPD}/projects/Jaxwest1/strains", timeout=30).json()
print(f"Jaxwest1 strains ({strains['count']}):")
for s in strains["strains"][:5]:
    print(f" {s['strainname']:<20} stock={s['stocknum']} vendor={s['vendor']}")
pubs = requests.get(f"{MPD}/projects/Jaxwest1/publications", timeout=30).json()
print(f"\nPublications: {pubs['count']}")
```
### Module 3: Measure Discovery — `/pheno/measureinfo/{selector}`
This is the canonical way to discover valid `measnum` values. The selector can be a project symbol such as `Jaxwest1`.