Skip to main content
ClaudeWave
Skill199 estrellas del repoactualizado 16d ago

monarch-database

Monarch Initiative knowledge graph REST API for disease-gene-phenotype associations and cross-species orthology. MONDO disease-to-gene/phenotype, HP phenotype profiles, cross-species comparisons. Use for rare disease gene prioritization and phenotype-based candidate ranking. For GWAS use gwas-database; for clinical pathogenicity use clinvar-database.

Instalar en Claude Code
Copiar
git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/monarch-database && cp -r /tmp/monarch-database/skills/genomics-bioinformatics/databases/monarch-database ~/.claude/skills/monarch-database
Después abre una sesión nueva de Claude Code; el skill carga automáticamente.

SKILL.md

# monarch-database

## Overview

The Monarch Initiative integrates disease-phenotype-gene relationships from 30+ biomedical databases (OMIM, Orphanet, ClinVar, MGI, ZFIN, Reactome) into a unified knowledge graph. The REST API at `https://api.monarchinitiative.org/v3/api` provides access to associations between genes, diseases, and phenotypes using MONDO disease IDs, Human Phenotype Ontology (HPO) terms, and standard gene identifiers. No authentication is required; the service is free for academic use.

## When to Use

- Mapping a disease (MONDO ID) to all associated causal genes and their evidence sources
- Retrieving phenotype profiles (HP terms) for a disease to build phenotypic similarity models
- Ranking candidate genes by phenotypic similarity to a patient's HPO symptom list
- Querying cross-species gene-phenotype associations (mouse, zebrafish, fly) for model organism comparisons
- Exploring rare disease gene-phenotype networks for diagnostic candidate generation
- Resolving entity metadata (gene symbol, disease name, phenotype label) from a MONDO/HP/HGNC ID
- Use `opentargets-database` instead when you need drug-target evidence scores or tractability data alongside disease associations
- Use `clinvar-database` when you need clinical pathogenicity classifications with submitter review status

## Prerequisites

- **Python packages**: `requests`, `pandas`, `matplotlib`
- **Data requirements**: MONDO IDs (e.g., `MONDO:0007374`), HP term IDs (e.g., `HP:0001250`), or gene symbols/HGNC IDs
- **Environment**: internet connection; no API key required
- **Rate limits**: no published rate limit; use `time.sleep(0.3)` between batch requests; avoid bursts over 10 requests/second

```bash
pip install requests pandas matplotlib
```

## Quick Start

```python
import requests

MONARCH_API = "https://api.monarchinitiative.org/v3/api"

def monarch_get(endpoint: str, params: dict = None) -> dict:
    """GET request to Monarch API; raises on HTTP errors."""
    r = requests.get(f"{MONARCH_API}{endpoint}", params=params, timeout=30)
    r.raise_for_status()
    return r.json()

# Get all genes associated with Marfan syndrome (MONDO:0007374)
result = monarch_get("/association/all", params={
    "subject": "MONDO:0007374",
    "category": "biolink:GeneToDiseaseAssociation",
    "limit": 10
})
print(f"Total gene associations: {result['total']}")
for item in result["items"][:5]:
    obj = item.get("object", {})
    print(f"  Gene: {obj.get('label', 'N/A')}  ({obj.get('id', 'N/A')})")
# Total gene associations: 3
# Gene: FBN1  (HGNC:3603)
```

## Core API

### Query 1: Disease-Gene Associations

Retrieve all genes associated with a disease by MONDO ID. Returns causal gene records with evidence metadata.

```python
import requests
import pandas as pd
import time

MONARCH_API = "https://api.monarchinitiative.org/v3/api"

def monarch_get(endpoint, params=None):
    r = requests.get(f"{MONARCH_API}{endpoint}", params=params, timeout=30)
    r.raise_for_status()
    return r.json()

def get_disease_genes(mondo_id: str, limit: int = 200) -> pd.DataFrame:
    """Return DataFrame of genes associated with a disease."""
    result = monarch_get("/association/all", params={
        "subject": mondo_id,
        "category": "biolink:CausalGeneToDiseaseAssociation",
        "limit": limit
    })
    rows = []
    for item in result.get("items", []):
        obj = item.get("object", {})
        rows.append({
            "gene_id": obj.get("id"),
            "gene_symbol": obj.get("label"),
            "taxon": obj.get("taxon", {}).get("label") if obj.get("taxon") else None,
            "relation": item.get("predicate"),
            "evidence_count": len(item.get("evidence", [])),
        })
    return pd.DataFrame(rows)

# Cystic fibrosis (MONDO:0009861)
df = get_disease_genes("MONDO:0009861")
print(f"Genes for cystic fibrosis: {len(df)}")
print(df[["gene_symbol", "gene_id", "relation"]].to_string(index=False))
# Genes for cystic fibrosis: 1
# gene_symbol  gene_id    relation
#        CFTR  HGNC:1884  biolink:causes
```

### Query 2: Disease-Phenotype Associations

Retrieve HPO phenotype terms linked to a disease. Useful for building phenotype profiles and similarity scoring.

```python
def get_disease_phenotypes(mondo_id: str, limit: int = 200) -> pd.DataFrame:
    """Return DataFrame of phenotypes (HP terms) for a disease."""
    result = monarch_get("/association/all", params={
        "subject": mondo_id,
        "category": "biolink:DiseaseToPhenotypicFeatureAssociation",
        "limit": limit
    })
    rows = []
    for item in result.get("items", []):
        obj = item.get("object", {})
        rows.append({
            "hp_id": obj.get("id"),
            "phenotype": obj.get("label"),
            "frequency": item.get("frequency", {}).get("label") if item.get("frequency") else None,
            "onset": item.get("onset", {}).get("label") if item.get("onset") else None,
        })
    return pd.DataFrame(rows)

# Marfan syndrome (MONDO:0007374)
df = get_disease_phenotypes("MONDO:0007374", limit=50)
print(f"Phenotypes for Marfan syndrome: {len(df)}")
print(df[["phenotype", "hp_id", "frequency"]].head(8).to_string(index=False))
# Phenotypes for Marfan syndrome: 26
# phenotype                      hp_id        frequency
# Aortic root aneurysm        HP:0002616    HP:0040281  ...
```

### Query 3: Entity Lookup

Retrieve metadata for any Monarch entity (gene, disease, phenotype) by its identifier.

```python
def get_entity(entity_id: str) -> dict:
    """Retrieve metadata for a gene, disease, or phenotype by its ID."""
    result = monarch_get(f"/entity/{entity_id}")
    return result

# Look up HP:0001250 (Seizure)
hp = get_entity("HP:0001250")
print(f"Name: {hp.get('name')}")
print(f"ID: {hp.get('id')}")
print(f"Description: {hp.get('description', '')[:120]}")
print(f"Synonyms: {[s.get('val') for s in hp.get('synonyms', [])[:3]]}")
# Name: Seizure
# ID: HP:0001250
# Description: A seizure is an intermittent abnor
sciagent-skill-creatorSkill

|

opentrons-integrationSkill

Opentrons Protocol API v2 for OT-2/Flex: Python protocols for pipetting, serial dilutions, PCR, plate replication; control thermocycler, heater-shaker, magnetic, temperature modules. Use pylabrobot for multi-vendor.

plotly-interactive-visualizationSkill

Interactive visualization with Plotly. 40+ chart types (scatter, line, heatmap, 3D, geographic) with hover, zoom, pan. Two APIs: Plotly Express (DataFrame) and Graph Objects (fine control). For static publication figures use matplotlib; for statistical grammar use seaborn.

seaborn-statistical-visualizationSkill

Statistical visualization on matplotlib + pandas. Distributions (histplot, kdeplot, violin, box), relational (scatter, line), categorical, regression, correlation heatmaps. Auto aggregation/CIs. Use plotly for interactive; matplotlib for low-level.

single-cell-annotationSkill

Best practices for single-cell RNA-seq cell type annotation including marker-based, reference-based, and automated classification approaches.

pymc-bayesian-modelingSkill

Bayesian modeling with PyMC 5: priors, likelihood, NUTS/ADVI sampling, diagnostics (R-hat, ESS), LOO/WAIC comparison, prediction. Hierarchical, logistic, GP variants; predictive checks.

scikit-survival-analysisSkill

Time-to-event modeling with scikit-survival: Cox PH (elastic net), Random Survival Forests, Boosting, SVMs for censored data. C-index, Brier, time-dependent AUC; Kaplan-Meier, Nelson-Aalen, competing risks. Pipeline/GridSearchCV compatible. Use statsmodels for frequentist, pymc for Bayesian, lifelines for parametric.

statistical-analysisSkill

>-