gwas-database

The gwas-database skill provides REST API access to the NHGRI-EBI GWAS Catalog, a curated database of genome-wide association studies mapping SNP-trait relationships. Use it to query published GWAS associations by variant, trait, gene, or study; retrieve summary statistics for polygenic risk score construction; analyze pleiotropy patterns; and explore the genetic architecture of complex diseases without requiring authentication.

Repository: SciAgent-Skills

Install in Claude Code

git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/gwas-database && cp -r /tmp/gwas-database/skills/genomics-bioinformatics/databases/gwas-database ~/.claude/skills/gwas-database

Then open a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# GWAS Catalog Database — SNP-Trait Association Queries

## Overview

The NHGRI-EBI GWAS Catalog is a curated collection of published genome-wide association studies, mapping SNP-trait associations with genomic context. The REST API provides programmatic access to studies, associations, variants, traits, genes, and summary statistics. All responses are HAL+JSON with embedded `_links` for pagination.

## When to Use

- Finding genetic variants associated with a disease or trait (e.g., "which SNPs are linked to type 2 diabetes?")
- Retrieving genome-wide significant associations for a specific variant (rs ID)
- Exploring the genetic architecture of complex traits (number of loci, effect sizes)
- Checking variant pleiotropy (how many traits a single SNP affects)
- Downloading summary statistics for meta-analysis or polygenic risk score construction
- Identifying published GWAS studies by disease, gene, or PubMed ID
- Cross-referencing EFO trait ontology terms with GWAS evidence
- Building candidate gene lists from GWAS association regions
- For **drug target validation from GWAS hits**, use `opentargets-database` instead
- For **variant functional annotation** (consequence prediction, regulatory impact), use Ensembl VEP via `gget`

## Prerequisites

```bash
pip install requests matplotlib numpy
```

**API access**:
- **No authentication** required; the API is fully open access
- **Rate limits**: no official limit, but add `time.sleep(0.2)` between requests to be courteous
- **Base URL**: `https://www.ebi.ac.uk/gwas/rest/api`
- **Response format**: HAL+JSON with `_embedded` data and `_links` for pagination
- **Pagination**: default 20 results per page; max 500 via `size` parameter
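
The pagination notes above can be sketched as a generator that keeps requesting pages until no further page is advertised. This is a minimal sketch rather than part of the skill: `iter_embedded`, `fetch_json`, and the `max_pages` cap are hypothetical names, and the code assumes each page except the last carries a `_links.next.href` URL, which the source does not confirm.

```python
import time


def fetch_json(url, params=None):
    """Default fetcher: one GET parsed as JSON, then a 0.2 s courtesy pause."""
    import requests  # imported here so the paging logic can be exercised offline

    resp = requests.get(url, params=params or {}, timeout=30)
    resp.raise_for_status()
    time.sleep(0.2)
    return resp.json()


def iter_embedded(url, key, params=None, fetch=fetch_json, max_pages=10):
    """Yield the records under `_embedded[key]`, one page after another.

    Assumption: a `_links.next.href` entry points at the following page.
    Iteration stops when that link is absent or after `max_pages` pages.
    """
    for _ in range(max_pages):
        page = fetch(url, params)
        yield from page.get("_embedded", {}).get(key, [])
        next_href = page.get("_links", {}).get("next", {}).get("href")
        if not next_href:
            break
        # The next link is a full URL, so the original params are dropped.
        url, params = next_href, None


# Example (needs network access):
# base = "https://www.ebi.ac.uk/gwas/rest/api"
# for study in iter_embedded(f"{base}/studies/search/findByDiseaseTrait",
#                            "studies", {"diseaseTrait": "diabetes", "size": 100}):
#     print(study["accessionId"])
```

Passing the fetcher as a parameter keeps the paging logic separate from the HTTP call, so it can be checked against canned pages before touching the live API.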

## Quick Start

```python
import requests
import time

BASE = "https://www.ebi.ac.uk/gwas/rest/api"

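def gwas_get(endpoint, params=None):
    """GET one GWAS Catalog endpoint and return the parsed JSON.

    Sleeps 0.2 s after each call as a courtesy rate limit. Returns a single
    page only; pass `page` and `size` in `params` to move through results.
    """
    url = f"{BASE}/{endpoint}"
    resp = requests.get(url, params=params or {}, timeout=30)
    resp.raise_for_status()
    time.sleep(0.2)
    return resp.json()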

# Find studies for a trait keyword. Study records have no top-level `title`
# — the publication title lives at `publicationInfo.title`; the trait label
# lives at `diseaseTrait.trait`.
data = gwas_get("studies/search/findByDiseaseTrait", {"diseaseTrait": "diabetes"})
studies = data.get("_embedded", {}).get("studies", [])
print(f"Found {len(studies)} studies on the first page for 'diabetes'")
for s in studies[:3]:
    title = (s.get("publicationInfo") or {}).get("title", "N/A")
    trait = (s.get("diseaseTrait") or {}).get("trait", "N/A")
    print(f"  {s['accessionId']} | {trait[:40]:<40} | {title[:60]}")
```

## Core API

### Module 1: Study Search

Search GWAS studies by disease trait keyword or PubMed ID.

```python
# Search studies by disease trait
data = gwas_get("studies/search/findByDiseaseTrait", {"diseaseTrait": "breast cancer"})
studies = data["_embedded"]["studies"]
for s in studies[:5]:
    pi = s.get("publicationInfo") or {}
    print(f"  {s['accessionId']} | PMID:{pi.get('pubmedId','N/A')} | {pi.get('title','')[:60]}")

time.sleep(0.2)

# Search by PubMed ID. NOTE: the older `findByPubmedId` 404s on /studies/;
# the working endpoint is `findByPublicationIdPubmedId`.
data = gwas_get("studies/search/findByPublicationIdPubmedId", {"pubmedId": "25673413"})
studies = data["_embedded"]["studies"]
print(f"Studies from PMID 25673413: {len(studies)}")
for s in studies:
    trait = (s.get("diseaseTrait") or {}).get("trait", "N/A")
    print(f"  {s['accessionId']}: {trait}")
```
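
When study results feed a table or a report, it can help to flatten each record first. The sketch below uses only the fields already shown in this module (`accessionId`, `publicationInfo.pubmedId`, `publicationInfo.title`, `diseaseTrait.trait`); the helper names `study_row` and `group_by_pmid` are hypothetical, not part of the API or the skill.

```python
def study_row(study):
    """Flatten one study record into a plain dict of display fields.

    Missing or null sub-objects fall back to "N/A" instead of raising.
    """
    pub = study.get("publicationInfo") or {}
    trait = study.get("diseaseTrait") or {}
    return {
        "accession": study.get("accessionId") or "N/A",
        "pmid": pub.get("pubmedId") or "N/A",
        "title": pub.get("title") or "N/A",
        "trait": trait.get("trait") or "N/A",
    }


def group_by_pmid(studies):
    """Group flattened rows by PubMed ID, preserving input order within a group."""
    grouped = {}
    for study in studies:
        row = study_row(study)
        grouped.setdefault(row["pmid"], []).append(row)
    return grouped


# Example, given `studies` from either search above:
# for pmid, rows in group_by_pmid(studies).items():
#     print(pmid, [r["accession"] for r in rows])
```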

### Module 2: Association Queries

Retrieve SNP-trait associations filtered by trait (EFO term), variant, or p-value.

```python
# Associations by EFO trait. The old path `efoTraits/{shortForm}/associations`
# also works *if* you have the current shortForm — but trait shortForms have
# been re-mapped to MONDO (e.g. EFO_0000249 → MONDO_0004975). The most reliable
# path is `associations/search/findByEfoTrait?efoTrait=<canonical trait name>`.
data = gwas_get("associations/search/findByEfoTrait",
                {"efoTrait": "type 2 diabetes mellitus", "size": 50})
assocs = data["_embedded"]["associations"]
print(f"Associations for 'type 2 diabetes mellitus': {len(assocs)}")

for a in assocs[:5]:
    pval = a.get("pvalue", None)
    genes = []
    for locus in a.get("loci", []) or []:
        for gene in locus.get("authorReportedGenes", []) or []:
            genes.append(gene.get("geneName", ""))
    loci = a.get("loci") or [{}]
    # Guard against an empty or null `snps` list before indexing into it.
    snps = [(r.get("snps") or [{}])[0].get("rsId", "N/A")
            for r in (loci[0].get("strongestRiskAlleles") or [])]
    print(f"  rs={snps} | p={pval} | genes={genes}")
```
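
The module description mentions filtering by p-value, and the example above reads a `pvalue` field from each record. A client-side filter can be sketched as follows. The helper name `significant` is hypothetical, the 5e-8 default is the conventional genome-wide significance threshold rather than anything the API enforces, and whether the API offers a server-side p-value filter is not stated in the source.

```python
GENOME_WIDE_P = 5e-8  # conventional genome-wide significance threshold


def significant(assocs, threshold=GENOME_WIDE_P):
    """Return associations whose `pvalue` is at or below `threshold`.

    Results are sorted by ascending p-value. Records with a missing or
    unparseable p-value are skipped rather than treated as significant.
    """
    kept = []
    for assoc in assocs:
        try:
            p = float(assoc.get("pvalue"))
        except (TypeError, ValueError):
            continue
        if p <= threshold:
            kept.append((p, assoc))
    kept.sort(key=lambda pair: pair[0])
    return [assoc for _, assoc in kept]


# Example, given `assocs` from the query above:
# for a in significant(assocs)[:10]:
#     print(a.get("pvalue"))
```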

```python
# Associations for a specific variant. NOTE: association records do not embed
# `efoTraits` inline — they expose them via the `_links.efoTraits.href`
# HAL link. Follow the link (cached if needed) to resolve trait names.
data = gwas_get("singleNucleotidePolymorphisms/rs7903146/associations", {"size": 5})
assocs = data["_embedded"]["associations"]
print(f"Associations for rs7903146 (first page): {len(assocs)}")

def association_traits(assoc):
    """Resolve efoTraits via the HAL link on an association record."""
    href = (assoc.get("_links") or {}).get("efoTraits", {}).get("href")
    if not href:
        return []
    r = requests.get(href, timeout=15)
    if not r.ok:
        return []
    return [t.get("trait") for t in r.json().get("_embedded", {}).get("efoTraits", [])]

for a in assocs[:5]:
    traits = association_traits(a)
    print(f"  p={a.get('pvalue')} | OR={a.get('orPerCopyNum', 'N/A')} | traits={traits}")
    time.sleep(0.1)
```
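
The comment above suggests caching the trait lookups. One way to do that is to memoize by link URL, so an association seen again (for example across overlapping queries or repeated runs in one session) does not trigger a second request. This is a sketch under assumptions: `make_cached_resolver` is a hypothetical helper, and the cache lives in memory only for the lifetime of the resolver.

```python
def make_cached_resolver(fetch):
    """Wrap `fetch(href) -> dict` so each distinct href is requested at most once.

    Returns a function that maps an association record to its trait names,
    following the `_links.efoTraits.href` link described above.
    """
    cache = {}

    def resolve(assoc):
        href = ((assoc.get("_links") or {}).get("efoTraits") or {}).get("href")
        if not href:
            return []
        if href not in cache:
            payload = fetch(href)
            cache[href] = [t.get("trait") for t in
                           payload.get("_embedded", {}).get("efoTraits", [])]
        return list(cache[href])  # copy so callers cannot mutate the cache

    return resolve


# Example with a real fetcher (needs network access and `requests`):
# import requests
# resolve = make_cached_resolver(lambda href: requests.get(href, timeout=15).json())
# traits = resolve(assocs[0])
```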

### Module 3: Variant Lookup

Query variant details by rsID, chromosomal region, or cytogenetic band.

```python
# Lookup single variant
data = gwas_get("singleNucleotidePolymorphisms/rs7903146")
loc = (data.get("locations") or [{}])[0]  # tolerate an empty or missing list
print(f"rs7903146: chr{loc.get('chromosomeName', '?')}:{loc.get('chromosomePosition', '?')}")
print(f"  Functional class: {data.get('f
```
