Skill286 estrellas del repoactualizado 5d ago

macs3-peak-calling

MACS3 is a Poisson-based peak caller that identifies enriched genomic regions from ChIP-seq, ATAC-seq, and histone ChIP experiments by comparing treatment BAM files against input controls. Use narrowPeak format for transcription factor binding sites and broadPeak format for broad histone modifications, with output suitable for motif discovery, genome browser visualization, and differential binding analysis.

Ver fuente Repositorio: SciAgent-Skills

Instalar en Claude Code

Copiar

git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/macs3-peak-calling && cp -r /tmp/macs3-peak-calling/skills/genomics-bioinformatics/macs3-peak-calling ~/.claude/skills/macs3-peak-calling

Después abre una sesión nueva de Claude Code; el skill carga automáticamente.

Definición

SKILL.md

# MACS3 — ChIP-seq and ATAC-seq Peak Caller

## Overview

MACS3 (Model-based Analysis of ChIP-seq) identifies regions of significant read enrichment (peaks) from ChIP-seq, ATAC-seq, CUT&RUN, and CUT&TAG experiments. It models the fragment length distribution from paired-end data or estimates it from mono-nucleosomal read shifting in single-end data, then applies a Poisson model to identify fold-enrichment over an input/IgG control. MACS3 produces BED-format narrowPeak (for transcription factors) or broadPeak (for histone marks) files with signal and q-value tracks for visualization in IGV or UCSC Genome Browser.

## When to Use

- Calling transcription factor binding peaks from ChIP-seq experiments (use `--nomodel --extsize 200` or let MACS3 estimate fragment length)
- Identifying open chromatin regions from ATAC-seq experiments (use `--nomodel --shift -100 --extsize 200 -f BAMPE`)
- Calling broad histone modification peaks (H3K27me3, H3K9me3, H3K36me3) with `--broad`
- Generating peak signal tracks (bedGraph/bigWig) for genome browser visualization with `-B --SPMR`
- Performing differential binding analysis: MACS3 peaks as input to DiffBind or DESeq2
- Use **HMMRATAC** (part of MACS3) for nucleosome-resolution ATAC-seq peak calling
- Use **SPP** or **HOMER** as alternatives; MACS3 is the ENCODE-recommended standard

## Prerequisites

- **Python packages**: `macs3` (Python ≥ 3.8)
- **Input**: Sorted BAM files (with index) from ChIP-seq or ATAC-seq alignment (e.g., using STAR or Bowtie2)
- **Optional**: Input/IgG control BAM for background normalization

> **Check before installing**: The tool may already be available in the current environment (e.g., inside a `pixi` / `conda` env). Run `command -v macs3` first and skip the install commands below if it returns a path. When running inside a pixi project, invoke the tool via `pixi run macs3` rather than bare `macs3`.

```bash
# Install with pip or conda
pip install macs3
# or
conda install -c bioconda macs3

# Verify
macs3 --version
# macs3 3.0.2
```

## Quick Start

```bash
# Call peaks for TF ChIP-seq (narrow peaks, with input control)
macs3 callpeak \
    -t chip.bam \
    -c input.bam \
    -f BAM \
    -g hs \
    -n sample_tf \
    --outdir peaks/ \
    -q 0.05

# Output: peaks/sample_tf_peaks.narrowPeak
wc -l peaks/sample_tf_peaks.narrowPeak
```

## Workflow

### Step 1: Prepare Input BAM Files

MACS3 requires sorted, indexed BAM files from genome alignment.

```bash
# Sort and index ChIP and control BAMs (if not already done)
samtools sort -@ 8 chip_raw.bam -o chip.bam
samtools sort -@ 8 input_raw.bam -o input.bam
samtools index chip.bam
samtools index input.bam

# Check read counts
echo "ChIP reads: $(samtools view -c -F 4 chip.bam)"
echo "Input reads: $(samtools view -c -F 4 input.bam)"
```

### Step 2: Call Narrow Peaks (TF ChIP-seq)

Use the default mode for transcription factor binding site identification.

```bash
# TF ChIP-seq with input control
macs3 callpeak \
    -t chip.bam \
    -c input.bam \
    -f BAM \
    -g hs \
    -n tf_chip \
    --outdir peaks/ \
    -q 0.05 \
    --keep-dup auto

echo "Peaks called: $(wc -l < peaks/tf_chip_peaks.narrowPeak)"
echo "Summit file: peaks/tf_chip_summits.bed"

# Without input control (less recommended)
macs3 callpeak \
    -t chip.bam \
    -f BAM \
    -g hs \
    -n tf_noinput \
    --outdir peaks/ \
    --nolambda
```

### Step 3: Call Broad Peaks (Histone Marks)

Use `--broad` for spread histone modifications like H3K27me3 or H3K36me3.

```bash
# H3K27me3 broad histone mark
macs3 callpeak \
    -t h3k27me3.bam \
    -c input.bam \
    -f BAM \
    -g hs \
    -n h3k27me3 \
    --outdir peaks/ \
    --broad \
    --broad-cutoff 0.1 \
    -q 0.05

echo "Broad peaks: $(wc -l < peaks/h3k27me3_peaks.broadPeak)"

# H3K4me3 (sharp mark — use narrow peaks)
macs3 callpeak \
    -t h3k4me3.bam \
    -c input.bam \
    -f BAM \
    -g hs \
    -n h3k4me3 \
    --outdir peaks/ \
    -q 0.05
```

### Step 4: Call ATAC-seq Peaks

ATAC-seq requires special handling for the Tn5 insertion site.

```bash
# ATAC-seq with paired-end BAM (recommended)
macs3 callpeak \
    -t atac.bam \
    -f BAMPE \
    -g hs \
    -n atac_sample \
    --outdir peaks/ \
    --nomodel \
    --nolambda \
    -q 0.05 \
    --keep-dup all

echo "ATAC peaks: $(wc -l < peaks/atac_sample_peaks.narrowPeak)"

# Single-end ATAC-seq: shift reads to center on Tn5 cut site
macs3 callpeak \
    -t atac_se.bam \
    -f BAM \
    -g hs \
    -n atac_se \
    --outdir peaks/ \
    --nomodel \
    --shift -100 \
    --extsize 200 \
    --keep-dup all
```

### Step 5: Generate Signal Tracks for Visualization

Produce bedGraph and bigWig files for genome browser visualization.

```bash
# Generate bedGraph normalized to million reads (SPMR)
macs3 callpeak \
    -t chip.bam \
    -c input.bam \
    -f BAM \
    -g hs \
    -n chip_track \
    --outdir tracks/ \
    -B \
    --SPMR \
    --keep-dup auto

# Convert bedGraph to bigWig for IGV/UCSC
# Requires bedGraphToBigWig and chrom.sizes
sort -k1,1 -k2,2n tracks/chip_track_treat_pileup.bdg > tracks/chip_sorted.bdg
bedGraphToBigWig tracks/chip_sorted.bdg genome/hg38.chrom.sizes tracks/chip.bw

echo "BigWig track: tracks/chip.bw"
```

### Step 6: Annotate and Analyze Peaks

Parse narrowPeak output and annotate peaks to genomic features.

```python
import pandas as pd

# Load narrowPeak file
# Columns: chrom, start, end, name, score, strand, signalValue, pValue, qValue, peak
cols = ["chrom", "start", "end", "name", "score", "strand",
        "signalValue", "pValue", "qValue", "peak"]
peaks = pd.read_csv("peaks/tf_chip_peaks.narrowPeak", sep="\t",
                    header=None, names=cols)

print(f"Total peaks: {len(peaks)}")
print(f"Peaks on chr1: {(peaks['chrom'] == 'chr1').sum()}")
print(f"Median peak width: {(peaks['end'] - peaks['start']).median():.0f} bp")
print(f"Peaks with q-value < 0.01: {(peaks['qValue'] > 2).sum()}")  # -log10(q) > 2

# Filter high-confidence peaks
high_conf =

Del mismo repositorio

sciagent-skill-creatorSkill

opentrons-integrationSkill

Opentrons Protocol API v2 for OT-2/Flex: Python protocols for pipetting, serial dilutions, PCR, plate replication; control thermocycler, heater-shaker, magnetic, temperature modules. Use pylabrobot for multi-vendor.

plotly-interactive-visualizationSkill

Interactive visualization with Plotly. 40+ chart types (scatter, line, heatmap, 3D, geographic) with hover, zoom, pan. Two APIs: Plotly Express (DataFrame) and Graph Objects (fine control). For static publication figures use matplotlib; for statistical grammar use seaborn.

seaborn-statistical-visualizationSkill

Statistical visualization on matplotlib + pandas. Distributions (histplot, kdeplot, violin, box), relational (scatter, line), categorical, regression, correlation heatmaps. Auto aggregation/CIs. Use plotly for interactive; matplotlib for low-level.

single-cell-annotationSkill

Best practices for single-cell RNA-seq cell type annotation including marker-based, reference-based, and automated classification approaches.

pymc-bayesian-modelingSkill

Bayesian modeling with PyMC 5: priors, likelihood, NUTS/ADVI sampling, diagnostics (R-hat, ESS), LOO/WAIC comparison, prediction. Hierarchical, logistic, GP variants; predictive checks.

scikit-survival-analysisSkill

Time-to-event modeling with scikit-survival: Cox PH (elastic net), Random Survival Forests, Boosting, SVMs for censored data. C-index, Brier, time-dependent AUC; Kaplan-Meier, Nelson-Aalen, competing risks. Pipeline/GridSearchCV compatible. Use statsmodels for frequentist, pymc for Bayesian, lifelines for parametric.

statistical-analysisSkill