encode-database
ENCODE Portal REST API for regulatory genomics: TF ChIP-seq, ATAC-seq/DNase-seq peaks, histone marks, and RNA-seq across 1000+ cell types. Search experiments by assay/biosample/target; download BED/bigWig; retrieve SCREEN cCREs by region or gene. Use to annotate variants with regulatory tracks, find open chromatin in a cell type, or fetch peak files for ChIP/ATAC analysis. For regulatory variant scoring use regulomedb-database; for GWAS associations use gwas-database.
git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/encode-database && cp -r /tmp/encode-database/skills/genomics-bioinformatics/databases/encode-database ~/.claude/skills/encode-database
# ENCODE Database
## Overview
The ENCODE (Encyclopedia of DNA Elements) Project has generated thousands of functional genomics experiments (TF ChIP-seq, ATAC-seq, DNase-seq, histone ChIP-seq, and RNA-seq) across 1000+ human and mouse cell types and tissues. The ENCODE Portal REST API provides structured JSON access to experiment metadata, file download URLs, and SCREEN cCRE (candidate cis-Regulatory Elements) annotations. Released data is freely accessible, and most endpoints require no authentication.
## When to Use
- Downloading TF ChIP-seq peak files (BED) for a specific transcription factor and cell type to annotate regulatory regions
- Finding ATAC-seq or DNase-seq peaks in a cell type to identify open chromatin regions near a gene of interest
- Retrieving cCREs (candidate cis-Regulatory Elements) overlapping a genomic region from ENCODE SCREEN
- Building reference regulatory tracks for variant annotation pipelines (e.g., annotating VCF variants against ENCODE peak sets)
- Exploring which experiments are available for a biosample (cell line, tissue, developmental stage) before planning a wet-lab experiment
- Querying all ChIP-seq experiments for a transcription factor across multiple cell types for comparative regulatory analysis
- Use `regulomedb-database` instead when you want pre-computed regulatory scores for specific SNPs — RegulomeDB integrates ENCODE data with eQTL and motif evidence into a single score
- Use `deeptools-ngs-analysis` instead when you have your own BAM files and need to generate bigWig coverage tracks; ENCODE database is for retrieving existing deposited data
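For the variant-annotation use case above, the core operation is an interval overlap between variant positions and peak coordinates. The following is a minimal sketch, assuming peaks come from a BED-style file (0-based, half-open intervals) and variants use 1-based positions as in VCF. The helper names `load_bed_intervals` and `variant_in_peaks` are hypothetical and not part of any ENCODE tooling.

```python
from collections import defaultdict

def load_bed_intervals(lines):
    """Parse BED-like lines into {chrom: [(start, end), ...]} (hypothetical helper)."""
    intervals = defaultdict(list)
    for line in lines:
        # Skip blank lines and common BED header lines
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals[chrom].append((int(start), int(end)))
    return dict(intervals)

def variant_in_peaks(intervals, chrom, pos):
    """Return True if a 1-based variant position lies inside any peak.

    BED intervals are 0-based and half-open, so a 1-based position `pos`
    overlaps a peak when start < pos <= end.
    """
    return any(start < pos <= end for start, end in intervals.get(chrom, []))

# Example with made-up coordinates (not real ENCODE peaks)
peaks = load_bed_intervals(["chr1\t100\t200\tpeak1", "chr2\t0\t50\tpeak2"])
print(variant_in_peaks(peaks, "chr1", 150))
```

A linear scan is used here for clarity; for genome-wide peak sets an interval tree or a sorted sweep would likely scale better.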
## Prerequisites
- **Python packages**: `requests`, `pandas`, `matplotlib`
- **Data requirements**: experiment accessions (e.g., `ENCSR000AKC`), biosample names (e.g., `K562`), TF target names (e.g., `CTCF`, `TP53`), or genomic regions (`chr7:117548628-117748628`)
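- **Environment**: internet connection; no authentication required for public data; submitter access to unreleased data uses an ENCODE access key pair (key ID and secret) sent via HTTP Basic authentication, e.g. `requests.get(url, auth=(key_id, secret))`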
- **Rate limits**: no published hard limit; add `time.sleep(0.5)` for large batch queries to avoid connection resets
```bash
pip install requests pandas matplotlib
```
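The rate-limit advice above can be sketched as a small throttling wrapper. This is an illustrative helper only: `throttled_map` is a hypothetical name, and the 0.5 s default simply mirrors the suggestion in the prerequisites rather than any documented ENCODE limit.

```python
import time

def throttled_map(fetch, items, delay=0.5, sleep=time.sleep):
    """Call fetch(item) for each item, pausing `delay` seconds between calls.

    `fetch` is any callable that performs one request; `sleep` is injectable
    so the pause can be replaced in tests.
    """
    results = {}
    for i, item in enumerate(items):
        if i > 0:
            sleep(delay)  # pause before every call except the first
        results[item] = fetch(item)
    return results

# Usage idea (assumes a search function such as the one in Quick Start):
# hits = throttled_map(lambda tf: search_experiments(target=tf), ["CTCF", "TP53"])
```

Keeping the request logic in `fetch` means the same wrapper can be reused for experiment searches and file lookups.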
## Quick Start
```python
import requests

BASE = "https://www.encodeproject.org"

def search_experiments(assay="TF ChIP-seq", target="CTCF", biosample="K562", limit=5):
    """Find ENCODE experiments matching assay type, target, and biosample."""
    params = {
        "type": "Experiment",
        "assay_title": assay,
        "target.label": target,
        "biosample_ontology.term_name": biosample,  # `biosample_summary` is a verbose freetext string; filter by ontology term name
        "status": "released",
        "format": "json",
        "limit": limit,
    }
    r = requests.get(f"{BASE}/search/", params=params, timeout=30)
    r.raise_for_status()
    data = r.json()
    experiments = data.get("@graph", [])
    print(f"Found {data.get('total', 0)} experiments for {target} ChIP-seq in {biosample}")
    for exp in experiments:
        print(f"  {exp['accession']}  {exp.get('biosample_summary', '')}  {exp.get('lab', {}).get('title', '')}")
    return experiments

exps = search_experiments(assay="TF ChIP-seq", target="CTCF", biosample="K562")
```
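To see what the Quick Start request looks like on the wire, the same parameters can be encoded into a URL without sending anything. This is a sketch using only the standard library; `build_search_url` is a hypothetical helper name, and the parameters are the ones used in the Quick Start block.

```python
from urllib.parse import urlencode

BASE = "https://www.encodeproject.org"

def build_search_url(assay="TF ChIP-seq", target="CTCF", biosample="K562", limit=5):
    """Return the /search/ URL for the given filters without performing a request."""
    params = {
        "type": "Experiment",
        "assay_title": assay,
        "target.label": target,
        "biosample_ontology.term_name": biosample,
        "status": "released",
        "format": "json",
        "limit": limit,
    }
    # urlencode percent-encodes values and writes spaces as '+'
    return f"{BASE}/search/?{urlencode(params)}"

print(build_search_url())
```

Pasting the printed URL into a browser is a quick way to check a filter combination before scripting a larger query.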
## Core API
### Query 1: Experiment Search — Find Experiments by Assay, Biosample, Target
Search the ENCODE Portal for experiments matching structured criteria.
```python
import requests, pandas as pd

BASE = "https://www.encodeproject.org"

def search_experiments(assay_title=None, target=None, biosample=None,
                       organism="Homo sapiens", status="released", limit=50):
    """
    Search ENCODE experiments with flexible filters.
    Returns: pd.DataFrame of matching experiments.
    """
    params = {
        "type": "Experiment",
        "status": status,
        "replicates.library.biosample.donor.organism.scientific_name": organism,
        "format": "json",
        "limit": limit,
    }
    if assay_title:
        params["assay_title"] = assay_title
    if target:
        params["target.label"] = target
    if biosample:
        params["biosample_ontology.term_name"] = biosample  # filter by ontology term, not the freetext `biosample_summary`

    r = requests.get(f"{BASE}/search/", params=params, timeout=30)
    r.raise_for_status()
    data = r.json()
    total = data.get("total", 0)
    print(f"Total matching experiments: {total} (showing {min(limit, total)})")

    records = []
    for exp in data.get("@graph", []):
        records.append({
            "accession": exp.get("accession"),
            "assay": exp.get("assay_title"),
            "biosample": exp.get("biosample_summary"),
            "target": exp.get("target", {}).get("label", ""),
            "lab": exp.get("lab", {}).get("title", ""),
            "date_released": exp.get("date_released", ""),
        })
    df = pd.DataFrame(records)
    print(df.to_string(index=False))
    return df

# CTCF ChIP-seq in HCT116 colon cancer cells
df = search_experiments(assay_title="TF ChIP-seq", target="CTCF", biosample="HCT116")
```
```python
# ATAC-seq experiments in multiple cell types
df_atac = search_experiments(assay_title="ATAC-seq", limit=20)
print(f"\nUnique cell types: {df_atac['biosample'].nunique()}")
```
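For comparing coverage across cell types, the `@graph` list returned by a search can be tallied by biosample. The sketch below works on already-fetched records and uses only the standard library. It assumes each experiment record carries an embedded `biosample_ontology` object with a `term_name`, falling back to `biosample_summary` when it does not; `count_by_biosample` is a hypothetical helper name.

```python
from collections import Counter

def count_by_biosample(graph):
    """Count experiments per biosample from a search response's @graph list."""
    counts = Counter()
    for exp in graph:
        onto = exp.get("biosample_ontology")
        # Assumption: the ontology entry is embedded as a dict; otherwise fall back
        name = onto.get("term_name") if isinstance(onto, dict) else None
        counts[name or exp.get("biosample_summary") or "unknown"] += 1
    return counts.most_common()

# Example with made-up records shaped like search results
example = [
    {"accession": "EXAMPLE1", "biosample_ontology": {"term_name": "K562"}},
    {"accession": "EXAMPLE2", "biosample_ontology": {"term_name": "K562"}},
    {"accession": "EXAMPLE3", "biosample_summary": "HepG2"},
]
print(count_by_biosample(example))
```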
### Query 2: File Download — Get Metadata and Download URLs for BED/bigWig Files
Retrieve file metadata for a specific experiment and obtain download URLs.
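As a rough illustration of the file-selection step, the sketch below filters a list of file records (for example, the `files` array of an experiment's JSON) and builds absolute download URLs. It assumes each record exposes `status`, `file_format`, `output_type`, `assembly`, and a relative `href`; these field names reflect my understanding of ENCODE file objects and should be checked against a live response. `select_files` is a hypothetical helper, not the function from the listing that follows.

```python
BASE = "https://www.encodeproject.org"

def select_files(files, file_format="bed", output_type=None, assembly="GRCh38"):
    """Filter file records and attach absolute download URLs (illustrative sketch)."""
    selected = []
    for f in files:
        if f.get("status") != "released":
            continue
        if f.get("file_format") != file_format:
            continue
        # Pass assembly=None for unaligned data such as fastq, which has no assembly
        if assembly and f.get("assembly") != assembly:
            continue
        if output_type and f.get("output_type") != output_type:
            continue
        href = f.get("href")
        selected.append({
            "accession": f.get("accession"),
            "output_type": f.get("output_type"),
            "url": BASE + href if href else None,
        })
    return selected

# Example with a made-up record (accession and href are placeholders)
demo = [{"accession": "EXAMPLEFILE", "status": "released", "file_format": "bed",
         "output_type": "peaks", "assembly": "GRCh38",
         "href": "/files/EXAMPLEFILE/@@download/EXAMPLEFILE.bed.gz"}]
print(select_files(demo))
```

Separating the filter from the HTTP call keeps the selection logic testable offline and lets the same filter run over files gathered from several experiments.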
```python
import requests, pandas as pd

BASE = "https://www.encodeproject.org"

def get_experiment_files(accession, file_format="bed", output_type="peaks",
                         assembly="GRCh38"):
    """
    Get file download URLs for a specific ENCODE experiment.
    accession: experiment accession, e.g. 'ENCSR000AKC'
    file_format: 'bed', 'bigWig', 'fastq', 'bam'
    output_type: 'peaks', 'signal', 'alignments', 'reads'
    Returns: pd.DataFrame of matching files with download URLs.
    """
    params = {
        "type": "F|
# ... (remainder of listing truncated in source)
```