
remap-database

The remap-database skill queries ReMap 2022, a curated database of 165 million transcription factor ChIP-seq peaks across 1,210 TFs and four genomes, via its public REST API. Use it to retrieve TF peaks overlapping genomic regions, identify binding sites near genes, filter peaks by regulatory annotation, download BED files for offline analysis, and map the transcriptional regulatory landscape of a locus. Pair with jaspar-database for motif models and encode-database for ENCODE-specific accessibility tracks.

Source repository: SciAgent-Skills

Install in Claude Code

git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/remap-database && cp -r /tmp/remap-database/skills/genomics-bioinformatics/databases/remap-database ~/.claude/skills/remap-database

Then open a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# ReMap Database

## Overview

ReMap 2022 is an integrative database of transcription factor (TF), cofactor, and chromatin regulator binding sites derived from uniformly reprocessed ChIP-seq experiments. The 2022 release catalogs 165 million non-redundant peaks from 8,113 ChIP-seq datasets covering 1,210 TFs across human (hg38/hg19), mouse (mm10), Drosophila, and Arabidopsis genomes. All peaks are called with a consistent pipeline from public GEO/ArrayExpress experiments. Access is via the ReMap 2022 REST API at `https://remap2022.univ-amu.fr/api/` and bulk BED file downloads; no authentication required.

## When to Use

- Finding all TFs with ChIP-seq peaks overlapping a genomic region of interest (e.g., a GWAS SNP locus or candidate enhancer)
- Retrieving TF peaks near a gene's transcription start site to map its proximal regulatory landscape
- Listing all TFs available in ReMap for human or mouse with their peak and dataset counts
- Filtering ChIP-seq peaks by regulatory biotype annotation (promoter, enhancer, exon, intron, intergenic) for a TF in a specific cell line
- Downloading a BED file of all binding peaks for a TF across all cell types for offline analysis
- Identifying co-binding TFs at a locus by querying all overlapping peaks and grouping by TF name
- Use `jaspar-database` instead when you need PWM/PFM sequence models of TF binding specificity rather than ChIP-seq peak locations
- For ENCODE-specific regulatory tracks and accessibility data use `encode-database`; ReMap aggregates TF binding peaks from many sources including ENCODE
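
The co-binding use case in the list above can be sketched offline. This is a minimal illustration, assuming each peak record carries a `name` field in the `TF:experiment_id:cell_type` form that the parsing code in this skill relies on. The `cobinding_table` helper and the sample records are hypothetical and are not real ReMap output.

```python
import pandas as pd

def cobinding_table(peaks):
    """Summarize peaks per TF: peak count and number of distinct cell types."""
    rows = []
    for p in peaks:
        # Pad so a short or missing name still yields three fields
        parts = (p.get("name", "").split(":") + ["", "", ""])[:3]
        rows.append({"tf_name": parts[0],
                     "experiment_id": parts[1],
                     "cell_type": parts[2]})
    df = pd.DataFrame(rows, columns=["tf_name", "experiment_id", "cell_type"])
    return (df.groupby("tf_name")
              .agg(n_peaks=("experiment_id", "size"),
                   n_cell_types=("cell_type", "nunique"))
              .sort_values("n_peaks", ascending=False))

# Hypothetical records, made up for illustration only
sample_peaks = [
    {"name": "TP53:GSE0001:HCT-116"},
    {"name": "TP53:GSE0002:MCF-7"},
    {"name": "CTCF:GSE0003:HCT-116"},
]
summary = cobinding_table(sample_peaks)
print(summary)
```

TFs that appear together in this table have at least one peak overlapping the queried window; whether they bind cooperatively is a separate question that peak overlap alone does not answer.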

## Prerequisites

- **Python packages**: `requests`, `pandas`, `matplotlib`
- **Data requirements**: genomic coordinates (GRCh38/hg38 or hg19), gene names, or TF names
- **Environment**: internet connection; no API key required
- **Rate limits**: no official published limits; use `time.sleep(0.5)` between batch requests to avoid server overload
- **Note**: The ReMap API is a research API; endpoint availability may vary. All examples include a BED download fallback.

```bash
pip install requests pandas matplotlib
```
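
Since no official rate limit is published, batch jobs can pace themselves. The following is a minimal sketch of the `time.sleep(0.5)` advice above. `query_regions_politely` and its `fetch` parameter are hypothetical names; `fetch` stands in for any function that queries a single region, and the demonstration uses a stand-in that makes no network calls.

```python
import time

def query_regions_politely(regions, fetch, delay=0.5):
    """Call fetch(chrom, start, end) for each region, pausing between calls.

    Failed requests are recorded as None so one bad region does not
    abort the whole batch.
    """
    results = {}
    for i, (chrom, start, end) in enumerate(regions):
        if i > 0:
            time.sleep(delay)  # pause between requests, not before the first
        try:
            results[(chrom, start, end)] = fetch(chrom, start, end)
        except Exception as exc:  # e.g. an HTTP error raised by a real client
            print(f"{chrom}:{start}-{end} failed: {exc}")
            results[(chrom, start, end)] = None
    return results

# Demonstration with a stand-in fetch function (no network access)
def fake_fetch(chrom, start, end):
    return [{"chr": chrom, "start": start, "end": end}]

regions = [("chr17", 7_670_000, 7_690_000), ("chr8", 127_730_000, 127_750_000)]
batch = query_regions_politely(regions, fake_fetch, delay=0.01)
print({key: len(value) for key, value in batch.items()})
```

Passing the fetch function as an argument keeps the pacing logic independent of the endpoint, so the same loop can wrap an API call or a local BED lookup.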

## Quick Start

```python
import requests

REMAP_API = "https://remap2022.univ-amu.fr/api/v1"

# Query TF peaks overlapping a genomic region
r = requests.get(f"{REMAP_API}/peaks/overlap/", params={
    "chr": "chr17",
    "start": 7_670_000,
    "end": 7_690_000,
    "assembly": "hg38"
}, timeout=30)
r.raise_for_status()
peaks = r.json()
print(f"Peaks overlapping TP53 locus: {len(peaks)}")
tfs = set(p.get("name", "").split(":")[0] for p in peaks)
print(f"Unique TFs: {len(tfs)}")
print(f"TF names (first 10): {sorted(tfs)[:10]}")
```

## Core API

### Query 1: Region Overlap

Find all TF ChIP-seq peaks overlapping a specified genomic window. Returns peak records including TF name, cell type, coordinates, and score.

```python
import requests
import pandas as pd

REMAP_API = "https://remap2022.univ-amu.fr/api/v1"

def query_region(chrom, start, end, assembly="hg38", timeout=30):
    """Return all ReMap peaks overlapping [chrom:start-end]."""
    r = requests.get(f"{REMAP_API}/peaks/overlap/", params={
        "chr": chrom, "start": start, "end": end, "assembly": assembly
    }, timeout=timeout)
    r.raise_for_status()
    return r.json()

# Query a 20 kb window on chr17 around TP53
peaks = query_region("chr17", 7_670_000, 7_690_000, assembly="hg38")
print(f"Total peaks: {len(peaks)}")

# Parse name field: format is "TF:experiment_id:cell_type"
rows = []
for p in peaks:
    parts = p.get("name", "::").split(":")
    tf   = parts[0] if len(parts) > 0 else ""
    exp  = parts[1] if len(parts) > 1 else ""
    cell = parts[2] if len(parts) > 2 else ""
    rows.append({
        "chr": p.get("chr", p.get("chrom", "")),
        "start": p.get("start", 0),
        "end": p.get("end", 0),
        "tf_name": tf,
        "experiment_id": exp,
        "cell_type": cell,
        "score": p.get("score", 0),
    })

df = pd.DataFrame(rows)
print(f"\nUnique TFs: {df['tf_name'].nunique()}")
print(f"Top TFs by peak count:\n{df['tf_name'].value_counts().head(10).to_string()}")
```

```python
# Fallback: if API is unavailable, use a locally downloaded BED file
# Download from: https://remap2022.univ-amu.fr/download_page
# e.g., remap2022_all_macs2_hg38_v1_0.bed.gz

import pandas as pd

def query_region_from_bed(bed_file, chrom, start, end):
    """Filter a ReMap BED file for overlapping peaks."""
    cols = ["chr", "start", "end", "name", "score", "strand",
            "thick_start", "thick_end", "color"]
    df = pd.read_csv(bed_file, sep="\t", header=None, names=cols,
                     compression="infer")
    mask = (df["chr"] == chrom) & (df["end"] > start) & (df["start"] < end)
    return df[mask].reset_index(drop=True)

# Usage (requires downloaded BED):
# df = query_region_from_bed("remap2022_all_macs2_hg38_v1_0.bed.gz",
#                             "chr17", 7_670_000, 7_690_000)
```
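
A genome-wide ReMap BED file holds many millions of rows, so loading it whole with `pd.read_csv` may exhaust memory on a typical workstation. One possible variant, sketched here, streams the file with the `chunksize` parameter of pandas and keeps only the overlapping rows. The function name `query_region_from_bed_chunked`, the default chunk size, and the demo rows are illustrative assumptions; the nine-column layout is the one used in the fallback above.

```python
import os
import tempfile
import pandas as pd

BED_COLS = ["chr", "start", "end", "name", "score", "strand",
            "thick_start", "thick_end", "color"]

def query_region_from_bed_chunked(bed_file, chrom, start, end,
                                  chunksize=1_000_000):
    """Stream a ReMap BED file and return rows overlapping [start, end)."""
    hits = []
    reader = pd.read_csv(bed_file, sep="\t", header=None, names=BED_COLS,
                         compression="infer", chunksize=chunksize)
    for chunk in reader:
        # Same half-open overlap test as the in-memory version
        mask = ((chunk["chr"] == chrom)
                & (chunk["end"] > start)
                & (chunk["start"] < end))
        if mask.any():
            hits.append(chunk[mask])
    if not hits:
        return pd.DataFrame(columns=BED_COLS)
    return pd.concat(hits, ignore_index=True)

# Demo on a tiny made-up BED file (hypothetical rows, not ReMap data)
demo_rows = [
    "chr17\t7668000\t7669000\tTF_A:EXP1:cellX\t100\t.\t7668400\t7668401\t0,0,0",
    "chr17\t7675000\t7675300\tTF_B:EXP2:cellY\t200\t.\t7675100\t7675101\t0,0,0",
    "chr17\t7689900\t7690200\tTF_C:EXP3:cellX\t150\t.\t7690000\t7690001\t0,0,0",
    "chr1\t7675000\t7675300\tTF_B:EXP4:cellY\t120\t.\t7675100\t7675101\t0,0,0",
]
with tempfile.NamedTemporaryFile("w", suffix=".bed", delete=False) as fh:
    fh.write("\n".join(demo_rows) + "\n")
    demo_path = fh.name

demo_hits = query_region_from_bed_chunked(demo_path, "chr17",
                                          7_670_000, 7_690_000, chunksize=2)
os.remove(demo_path)
print(demo_hits[["chr", "start", "end", "name"]])
```

Streaming trades speed for memory: every query still scans the whole file. For repeated lookups, an indexed format is likely a better fit than repeated scans.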

### Query 2: Gene-Centric Query

Retrieve all TF ChIP-seq peaks near a gene's TSS, providing a promoter-proximal regulatory landscape for the gene.

```python
import requests
import pandas as pd

REMAP_API = "https://remap2022.univ-amu.fr/api/v1"

def query_gene_peaks(gene_name, assembly="hg38", timeout=30):
    """Return all ReMap peaks near a gene TSS."""
    r = requests.get(f"{REMAP_API}/peaks/gene/", params={
        "gene": gene_name, "assembly": assembly
    }, timeout=timeout)
    r.raise_for_status()
    return r.json()

peaks = query_gene_peaks("MYC", assembly="hg38")
print(f"Peaks near MYC TSS: {len(peaks)}")

rows = []
for p in peaks:
    parts = p.get("name", "::").split(":")
    rows.append({
        "tf_name": parts[0] if parts else "",
        "cell_type": parts[2] if len(parts) > 2 else "",
        "chr": p.get("chr", p.get("chrom", "")),
        "start": p.get("start", 0),
        "end": p.get("end", 0),
        "score": p.get("score", 0),
        "biotype": p.get("biotype", ""),
    })

df = pd.DataFrame(rows)
print(f"\nTFs near MYC TSS ({df['tf_name'].nunique()} unique)")
```
