Skip to main content
ClaudeWave
Skill199 estrellas del repoactualizado 16d ago

macs3-peak-calling

Poisson-model peak caller for ChIP-seq/ATAC-seq BAMs. MACS3 callpeak finds enriched regions (TF sites or histone marks) vs input/IgG; outputs BED narrowPeak/broadPeak for motif analysis, annotation, and differential binding. Use narrow peaks for TF ChIP-seq and ATAC-seq; broad for H3K27me3, H3K9me3, and other broad marks.

Instalar en Claude Code
Copiar
git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/macs3-peak-calling && cp -r /tmp/macs3-peak-calling/skills/genomics-bioinformatics/macs3-peak-calling ~/.claude/skills/macs3-peak-calling
Después abre una sesión nueva de Claude Code; el skill carga automáticamente.

SKILL.md

# MACS3 — ChIP-seq and ATAC-seq Peak Caller

## Overview

MACS3 (Model-based Analysis of ChIP-seq) identifies regions of significant read enrichment (peaks) from ChIP-seq, ATAC-seq, CUT&RUN, and CUT&TAG experiments. It models the fragment length distribution from paired-end data or estimates it from mono-nucleosomal read shifting in single-end data, then applies a Poisson model to identify fold-enrichment over an input/IgG control. MACS3 produces BED-format narrowPeak (for transcription factors) or broadPeak (for histone marks) files with signal and q-value tracks for visualization in IGV or UCSC Genome Browser.

## When to Use

- Calling transcription factor binding peaks from ChIP-seq experiments (use `--nomodel --extsize 200` or let MACS3 estimate fragment length)
- Identifying open chromatin regions from ATAC-seq experiments (use `--nomodel --shift -100 --extsize 200 -f BAMPE`)
- Calling broad histone modification peaks (H3K27me3, H3K9me3, H3K36me3) with `--broad`
- Generating peak signal tracks (bedGraph/bigWig) for genome browser visualization with `-B --SPMR`
- Performing differential binding analysis: MACS3 peaks as input to DiffBind or DESeq2
- Use **HMMRATAC** (part of MACS3) for nucleosome-resolution ATAC-seq peak calling
- Use **SPP** or **HOMER** as alternatives; MACS3 is the ENCODE-recommended standard

## Prerequisites

- **Python packages**: `macs3` (Python ≥ 3.8)
- **Input**: Sorted BAM files (with index) from ChIP-seq or ATAC-seq alignment (e.g., using STAR or Bowtie2)
- **Optional**: Input/IgG control BAM for background normalization

> **Check before installing**: The tool may already be available in the current environment (e.g., inside a `pixi` / `conda` env). Run `command -v macs3` first and skip the install commands below if it returns a path. When running inside a pixi project, invoke the tool via `pixi run macs3` rather than bare `macs3`.

```bash
# Install with pip or conda
pip install macs3
# or
conda install -c bioconda macs3

# Verify
macs3 --version
# macs3 3.0.2
```

## Quick Start

```bash
# Call peaks for TF ChIP-seq (narrow peaks, with input control)
macs3 callpeak \
    -t chip.bam \
    -c input.bam \
    -f BAM \
    -g hs \
    -n sample_tf \
    --outdir peaks/ \
    -q 0.05

# Output: peaks/sample_tf_peaks.narrowPeak
wc -l peaks/sample_tf_peaks.narrowPeak
```

## Workflow

### Step 1: Prepare Input BAM Files

MACS3 requires sorted, indexed BAM files from genome alignment.

```bash
# Sort and index ChIP and control BAMs (if not already done)
samtools sort -@ 8 chip_raw.bam -o chip.bam
samtools sort -@ 8 input_raw.bam -o input.bam
samtools index chip.bam
samtools index input.bam

# Check read counts
echo "ChIP reads: $(samtools view -c -F 4 chip.bam)"
echo "Input reads: $(samtools view -c -F 4 input.bam)"
```

### Step 2: Call Narrow Peaks (TF ChIP-seq)

Use the default mode for transcription factor binding site identification.

```bash
# TF ChIP-seq with input control
macs3 callpeak \
    -t chip.bam \
    -c input.bam \
    -f BAM \
    -g hs \
    -n tf_chip \
    --outdir peaks/ \
    -q 0.05 \
    --keep-dup auto

echo "Peaks called: $(wc -l < peaks/tf_chip_peaks.narrowPeak)"
echo "Summit file: peaks/tf_chip_summits.bed"

# Without input control (less recommended)
macs3 callpeak \
    -t chip.bam \
    -f BAM \
    -g hs \
    -n tf_noinput \
    --outdir peaks/ \
    --nolambda
```

### Step 3: Call Broad Peaks (Histone Marks)

Use `--broad` for spread histone modifications like H3K27me3 or H3K36me3.

```bash
# H3K27me3 broad histone mark
macs3 callpeak \
    -t h3k27me3.bam \
    -c input.bam \
    -f BAM \
    -g hs \
    -n h3k27me3 \
    --outdir peaks/ \
    --broad \
    --broad-cutoff 0.1 \
    -q 0.05

echo "Broad peaks: $(wc -l < peaks/h3k27me3_peaks.broadPeak)"

# H3K4me3 (sharp mark — use narrow peaks)
macs3 callpeak \
    -t h3k4me3.bam \
    -c input.bam \
    -f BAM \
    -g hs \
    -n h3k4me3 \
    --outdir peaks/ \
    -q 0.05
```

### Step 4: Call ATAC-seq Peaks

ATAC-seq requires special handling for the Tn5 insertion site.

```bash
# ATAC-seq with paired-end BAM (recommended)
macs3 callpeak \
    -t atac.bam \
    -f BAMPE \
    -g hs \
    -n atac_sample \
    --outdir peaks/ \
    --nomodel \
    --nolambda \
    -q 0.05 \
    --keep-dup all

echo "ATAC peaks: $(wc -l < peaks/atac_sample_peaks.narrowPeak)"

# Single-end ATAC-seq: shift reads to center on Tn5 cut site
macs3 callpeak \
    -t atac_se.bam \
    -f BAM \
    -g hs \
    -n atac_se \
    --outdir peaks/ \
    --nomodel \
    --shift -100 \
    --extsize 200 \
    --keep-dup all
```

### Step 5: Generate Signal Tracks for Visualization

Produce bedGraph and bigWig files for genome browser visualization.

```bash
# Generate bedGraph normalized to million reads (SPMR)
macs3 callpeak \
    -t chip.bam \
    -c input.bam \
    -f BAM \
    -g hs \
    -n chip_track \
    --outdir tracks/ \
    -B \
    --SPMR \
    --keep-dup auto

# Convert bedGraph to bigWig for IGV/UCSC
# Requires bedGraphToBigWig and chrom.sizes
sort -k1,1 -k2,2n tracks/chip_track_treat_pileup.bdg > tracks/chip_sorted.bdg
bedGraphToBigWig tracks/chip_sorted.bdg genome/hg38.chrom.sizes tracks/chip.bw

echo "BigWig track: tracks/chip.bw"
```

### Step 6: Annotate and Analyze Peaks

Parse narrowPeak output and annotate peaks to genomic features.

```python
import pandas as pd

# Load narrowPeak file
# Columns: chrom, start, end, name, score, strand, signalValue, pValue, qValue, peak
cols = ["chrom", "start", "end", "name", "score", "strand",
        "signalValue", "pValue", "qValue", "peak"]
peaks = pd.read_csv("peaks/tf_chip_peaks.narrowPeak", sep="\t",
                    header=None, names=cols)

print(f"Total peaks: {len(peaks)}")
print(f"Peaks on chr1: {(peaks['chrom'] == 'chr1').sum()}")
print(f"Median peak width: {(peaks['end'] - peaks['start']).median():.0f} bp")
print(f"Peaks with q-value < 0.01: {(peaks['qValue'] > 2).sum()}")  # -log10(q) > 2

# Filter high-confidence peaks
high_conf =
sciagent-skill-creatorSkill

|

opentrons-integrationSkill

Opentrons Protocol API v2 for OT-2/Flex: Python protocols for pipetting, serial dilutions, PCR, plate replication; control thermocycler, heater-shaker, magnetic, temperature modules. Use pylabrobot for multi-vendor.

plotly-interactive-visualizationSkill

Interactive visualization with Plotly. 40+ chart types (scatter, line, heatmap, 3D, geographic) with hover, zoom, pan. Two APIs: Plotly Express (DataFrame) and Graph Objects (fine control). For static publication figures use matplotlib; for statistical grammar use seaborn.

seaborn-statistical-visualizationSkill

Statistical visualization on matplotlib + pandas. Distributions (histplot, kdeplot, violin, box), relational (scatter, line), categorical, regression, correlation heatmaps. Auto aggregation/CIs. Use plotly for interactive; matplotlib for low-level.

single-cell-annotationSkill

Best practices for single-cell RNA-seq cell type annotation including marker-based, reference-based, and automated classification approaches.

pymc-bayesian-modelingSkill

Bayesian modeling with PyMC 5: priors, likelihood, NUTS/ADVI sampling, diagnostics (R-hat, ESS), LOO/WAIC comparison, prediction. Hierarchical, logistic, GP variants; predictive checks.

scikit-survival-analysisSkill

Time-to-event modeling with scikit-survival: Cox PH (elastic net), Random Survival Forests, Boosting, SVMs for censored data. C-index, Brier, time-dependent AUC; Kaplan-Meier, Nelson-Aalen, competing risks. Pipeline/GridSearchCV compatible. Use statsmodels for frequentist, pymc for Bayesian, lifelines for parametric.

statistical-analysisSkill

>-