remap-database
Query ReMap 2022 TF ChIP-seq peak database via REST API and BED downloads. Retrieve TF peaks overlapping a region (chr:start-end), peaks near a gene, TFs by species, peaks filtered by biotype (promoter, enhancer), and BED files for a TF-cell type pair. Use for TF co-occupancy, regulatory annotation, and TF binding atlases. Use jaspar-database for PWM motifs; encode-database for ENCODE tracks.
git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/remap-database && cp -r /tmp/remap-database/skills/genomics-bioinformatics/databases/remap-database ~/.claude/skills/remap-database
# ReMap Database
## Overview
ReMap 2022 is an integrative database of transcription factor (TF), cofactor, and chromatin regulator binding sites derived from uniformly reprocessed ChIP-seq experiments. The 2022 release catalogs 165 million non-redundant peaks from 8,113 ChIP-seq datasets covering 1,210 TFs across human (hg38/hg19), mouse (mm10), Drosophila, and Arabidopsis genomes. All peaks are called with a consistent pipeline from public GEO/ArrayExpress experiments. Access is via the ReMap 2022 REST API at `https://remap2022.univ-amu.fr/api/` and bulk BED file downloads; no authentication required.
## When to Use
- Finding all TFs with ChIP-seq peaks overlapping a genomic region of interest (e.g., a GWAS SNP locus or candidate enhancer)
- Retrieving TF peaks near a gene's transcription start site to map its proximal regulatory landscape
- Listing all TFs available in ReMap for human or mouse with their peak and dataset counts
- Filtering ChIP-seq peaks by regulatory biotype annotation (promoter, enhancer, exon, intron, intergenic) for a TF in a specific cell line
- Downloading a BED file of all binding peaks for a TF across all cell types for offline analysis
- Identifying co-binding TFs at a locus by querying all overlapping peaks and grouping by TF name
- Use `jaspar-database` instead when you need PWM/PFM sequence models of TF binding specificity rather than ChIP-seq peak locations
- For ENCODE-specific regulatory tracks and accessibility data use `encode-database`; ReMap aggregates TF binding peaks from many sources including ENCODE
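
The co-binding use case in the list above (group overlapping peaks by TF name) can be sketched offline as follows. This is a minimal illustration, assuming each peak record carries a `name` field laid out as `TF:experiment_id:cell_type`, as described later in this document; the helper `cobinding_tfs` and the sample records are hypothetical and are not real ReMap output.

```python
from collections import defaultdict

def cobinding_tfs(peaks):
    """Group peak records by TF name; return {tf_name: set of cell types}."""
    by_tf = defaultdict(set)
    for p in peaks:
        # Assumed name layout: "TF:experiment_id:cell_type"
        parts = p.get("name", "").split(":")
        tf = parts[0]
        if not tf:
            continue  # skip records with no usable name
        cell = parts[2] if len(parts) > 2 else ""
        by_tf[tf].add(cell)
    return dict(by_tf)

# Made-up records for illustration only (not real ReMap data)
sample_peaks = [
    {"chr": "chr17", "start": 7_675_000, "end": 7_675_400, "name": "TF_A:EXP001:cellX"},
    {"chr": "chr17", "start": 7_675_100, "end": 7_675_500, "name": "TF_A:EXP002:cellY"},
    {"chr": "chr17", "start": 7_675_200, "end": 7_675_600, "name": "TF_B:EXP003:cellX"},
]

groups = cobinding_tfs(sample_peaks)
# Rank TFs by the number of distinct cell types in which they bind the locus
ranked = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
for tf, cells in ranked:
    print(tf, sorted(cells))
```

Ranking by distinct cell types rather than raw peak count is one possible choice; it avoids over-weighting TFs that were simply profiled in many replicate experiments.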
## Prerequisites
- **Python packages**: `requests`, `pandas`, `matplotlib`
- **Data requirements**: genomic coordinates (GRCh38/hg38 or hg19), gene names, or TF names
- **Environment**: internet connection; no API key required
- **Rate limits**: no official published limits; use `time.sleep(0.5)` between batch requests to avoid server overload
- **Note**: The ReMap API is a research API; endpoint availability may vary. All examples include a BED download fallback.
```bash
pip install requests pandas matplotlib
```
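
The rate-limit advice above (pause between batch requests) could be wrapped in a small helper like the one below. It is a sketch under stated assumptions: `run_batch` and `fake_fetch` are hypothetical names, and the stand-in fetcher makes no network call; in practice it would be replaced by a function that calls the ReMap API.

```python
import time

def run_batch(queries, fetch, delay=0.5):
    """Call fetch(query) for each query, pausing `delay` seconds between calls.

    Returns (results, failures): successful payloads and error messages,
    both keyed by query, so one failed request does not abort the batch.
    """
    results, failures = {}, {}
    for i, q in enumerate(queries):
        if i > 0:
            time.sleep(delay)  # be polite to the server between requests
        try:
            results[q] = fetch(q)
        except Exception as exc:  # e.g. network error, HTTP error, bad JSON
            failures[q] = str(exc)
    return results, failures

# Stand-in fetcher for illustration; it simulates one failing query
def fake_fetch(gene):
    if gene == "BAD":
        raise ValueError("simulated failure")
    return [f"{gene}_peak"]

results, failures = run_batch(["MYC", "BAD", "TP53"], fake_fetch, delay=0)
print(results)
print(failures)
```

Collecting failures separately makes it easy to retry only the failed queries later, or to fall back to a local BED file for them.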
## Quick Start
```python
import requests
REMAP_API = "https://remap2022.univ-amu.fr/api/v1"
# Query TF peaks overlapping a genomic region
r = requests.get(f"{REMAP_API}/peaks/overlap/", params={
    "chr": "chr17",
    "start": 7_670_000,
    "end": 7_690_000,
    "assembly": "hg38"
}, timeout=30)
r.raise_for_status()
peaks = r.json()
print(f"Peaks overlapping TP53 locus: {len(peaks)}")
tfs = set(p.get("name", "").split(":")[0] for p in peaks)
print(f"Unique TFs: {len(tfs)}")
print(f"TF names (first 10): {sorted(tfs)[:10]}")
```
## Core API
### Query 1: Region Overlap
Find all TF ChIP-seq peaks overlapping a specified genomic window. Returns peak records including TF name, cell type, coordinates, and score.
```python
import requests
import pandas as pd

REMAP_API = "https://remap2022.univ-amu.fr/api/v1"

def query_region(chrom, start, end, assembly="hg38", timeout=30):
    """Return all ReMap peaks overlapping [chrom:start-end]."""
    r = requests.get(f"{REMAP_API}/peaks/overlap/", params={
        "chr": chrom, "start": start, "end": end, "assembly": assembly
    }, timeout=timeout)
    r.raise_for_status()
    return r.json()

# Query a 20 kb window on chr17 around TP53
peaks = query_region("chr17", 7_670_000, 7_690_000, assembly="hg38")
print(f"Total peaks: {len(peaks)}")

# Parse name field: format is "TF:experiment_id:cell_type"
rows = []
for p in peaks:
    parts = p.get("name", "::").split(":")
    tf = parts[0] if len(parts) > 0 else ""
    exp = parts[1] if len(parts) > 1 else ""
    cell = parts[2] if len(parts) > 2 else ""
    rows.append({
        "chr": p.get("chr", p.get("chrom", "")),
        "start": p.get("start", 0),
        "end": p.get("end", 0),
        "tf_name": tf,
        "experiment_id": exp,
        "cell_type": cell,
        "score": p.get("score", 0),
    })

df = pd.DataFrame(rows)
print(f"\nUnique TFs: {df['tf_name'].nunique()}")
print(f"Top TFs by peak count:\n{df['tf_name'].value_counts().head(10).to_string()}")
```
```python
# Fallback: if the API is unavailable, use a locally downloaded BED file
# Download from: https://remap2022.univ-amu.fr/download_page
# e.g., remap2022_all_macs2_hg38_v1_0.bed.gz
import pandas as pd

def query_region_from_bed(bed_file, chrom, start, end):
    """Filter a ReMap BED file for peaks overlapping [start, end) on chrom."""
    cols = ["chr", "start", "end", "name", "score", "strand",
            "thick_start", "thick_end", "color"]
    df = pd.read_csv(bed_file, sep="\t", header=None, names=cols,
                     compression="infer")
    # Two intervals overlap when each one starts before the other ends
    mask = (df["chr"] == chrom) & (df["end"] > start) & (df["start"] < end)
    return df[mask].reset_index(drop=True)

# Usage (requires downloaded BED):
# df = query_region_from_bed("remap2022_all_macs2_hg38_v1_0.bed.gz",
#                            "chr17", 7_670_000, 7_690_000)
```
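
The BED fallback above loads the whole file into memory, which may be impractical for a genome-wide file. A streaming alternative could look like the sketch below, which uses only the standard library. The function name `stream_region_from_bed` and the toy file are hypothetical and for illustration only; the toy rows are made up and use the same colon-separated name layout assumed elsewhere in this document.

```python
import gzip
import os
import tempfile

def stream_region_from_bed(bed_path, chrom, start, end):
    """Yield (chr, start, end, name) for peaks overlapping [start, end) on chrom.

    Reads the file line by line, so memory use stays roughly constant.
    """
    opener = gzip.open if bed_path.endswith(".gz") else open
    with opener(bed_path, "rt") as fh:
        for line in fh:
            # Skip blank lines and optional BED header lines
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            fields = line.rstrip("\n").split("\t")
            if fields[0] != chrom:
                continue
            s, e = int(fields[1]), int(fields[2])
            if e > start and s < end:
                name = fields[3] if len(fields) > 3 else ""
                yield fields[0], s, e, name

# Build a tiny gzipped BED file with made-up rows to exercise the function
tmp_path = os.path.join(tempfile.mkdtemp(), "toy.bed.gz")
with gzip.open(tmp_path, "wt") as fh:
    fh.write("chr17\t7675000\t7675400\tTF_A:EXP001:cellX\t100\t.\n")
    fh.write("chr17\t7700000\t7700300\tTF_B:EXP002:cellX\t50\t.\n")
    fh.write("chr8\t7675000\t7675400\tTF_C:EXP003:cellY\t80\t.\n")

hits = list(stream_region_from_bed(tmp_path, "chr17", 7_670_000, 7_690_000))
print(hits)
```

For repeated region queries against the same file, an indexed approach (for example a bgzip-compressed, tabix-indexed BED) would likely be faster than rescanning, at the cost of an extra dependency.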
### Query 2: Gene-Centric Query
Retrieve all TF ChIP-seq peaks near a gene's TSS, providing a promoter-proximal regulatory landscape for the gene.
```python
import requests, time, pandas as pd

REMAP_API = "https://remap2022.univ-amu.fr/api/v1"

def query_gene_peaks(gene_name, assembly="hg38", timeout=30):
    """Return all ReMap peaks near a gene TSS."""
    r = requests.get(f"{REMAP_API}/peaks/gene/", params={
        "gene": gene_name, "assembly": assembly
    }, timeout=timeout)
    r.raise_for_status()
    return r.json()

peaks = query_gene_peaks("MYC", assembly="hg38")
print(f"Peaks near MYC TSS: {len(peaks)}")

# Parse name field: format is "TF:experiment_id:cell_type"
rows = []
for p in peaks:
    parts = p.get("name", "::").split(":")
    rows.append({
        "tf_name": parts[0] if parts else "",
        "cell_type": parts[2] if len(parts) > 2 else "",
        "chr": p.get("chr", p.get("chrom", "")),
        "start": p.get("start", 0),
        "end": p.get("end", 0),
        "score": p.get("score", 0),
        "biotype": p.get("biotype", ""),
    })

df = pd.DataFrame(rows)
print(f"\nTFs near MYC TSS ({df['tf_|