ClaudeWave
Skill · 199 repo stars · updated 16d ago

deeptools-ngs-analysis

NGS CLI for ChIP/RNA/ATAC-seq. BAM→bigWig with RPGC/CPM/RPKM, sample correlation/PCA, heatmaps/profiles around features, fingerprints. For alignment use STAR/BWA; for peak calling use MACS2.

Install in Claude Code
git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/deeptools-ngs-analysis && cp -r /tmp/deeptools-ngs-analysis/skills/genomics-bioinformatics/interval-ops/deeptools-ngs-analysis ~/.claude/skills/deeptools-ngs-analysis
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# deepTools — NGS Data Analysis Toolkit

## Overview

deepTools is a command-line toolkit for processing and visualizing high-throughput sequencing data. It converts BAM alignments to normalized coverage tracks (bigWig), performs quality control (correlation, PCA, fingerprint), and generates publication-quality heatmaps and profile plots around genomic features. It supports ChIP-seq, RNA-seq, ATAC-seq, and MNase-seq data.

## When to Use

- Converting BAM files to normalized bigWig coverage tracks
- Comparing ChIP-seq treatment vs input control (log2 ratio tracks)
- Assessing sample quality: replicate correlation, PCA, coverage depth
- Evaluating ChIP enrichment strength (fingerprint plots)
- Creating heatmaps and profile plots around TSS, peaks, or other genomic regions
- Analyzing ATAC-seq data with Tn5 offset correction
- Generating strand-specific RNA-seq coverage tracks
- For **read alignment**, use STAR, BWA, or bowtie2 instead
- For **peak calling**, use MACS2 or HOMER instead
- For **BAM/VCF file manipulation**, use pysam instead

## Prerequisites

```bash
pip install deeptools
# Verify installation
bamCoverage --version
```

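**Input requirements**: BAM files must be coordinate-sorted and indexed (a `.bai` file next to each BAM). Create the index with `samtools index input.bam`. Region files (genes, peaks) should be BED files with at least 3 columns; include the strand in column 6 so that TSS/TES reference points are placed correctly for minus-strand genes.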
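A quick pre-flight check can catch missing indexes or unsorted files before a long deepTools run. The two helpers below are hypothetical (not part of deepTools or samtools): one looks for an index file next to a BAM, the other inspects a SAM header for the `SO:coordinate` sort-order tag. A missing `SO` tag does not prove a file is unsorted; it is simply the cheapest signal to check.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helpers (not part of deepTools or samtools).

# Succeeds if an index exists next to the BAM. `samtools index x.bam` writes
# x.bam.bai by default; also accepting x.bai is an assumption of this sketch.
has_bam_index() {
  local bam="$1"
  [ -f "${bam}.bai" ] || [ -f "${bam%.bam}.bai" ]
}

# Reads a SAM header on stdin and succeeds if the @HD line declares SO:coordinate.
# Usage: samtools view -H sample.bam | header_is_coordinate_sorted
header_is_coordinate_sorted() {
  grep -q '^@HD.*SO:coordinate'
}

# Example loop over hypothetical file names:
# for bam in rep1.bam rep2.bam; do
#   has_bam_index "$bam" || samtools index "$bam"
# done
```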

## Quick Start

```bash
# Convert BAM to normalized bigWig
bamCoverage --bam sample.bam --outFileName sample.bw \
    --normalizeUsing RPGC --effectiveGenomeSize 2913022398 \
    --binSize 10 --numberOfProcessors 8

# Create heatmap around TSS
computeMatrix reference-point -S sample.bw -R genes.bed \
    -b 3000 -a 3000 --referencePoint TSS -o matrix.gz
plotHeatmap -m matrix.gz -o heatmap.png --colorMap RdBu
```

## Core API

### 1. BAM to Coverage Conversion

Convert BAM alignments to normalized coverage tracks (bigWig or bedGraph).

```bash
# Basic conversion with RPGC normalization
bamCoverage --bam input.bam --outFileName output.bw \
    --normalizeUsing RPGC --effectiveGenomeSize 2913022398 \
    --binSize 10 --numberOfProcessors 8 \
    --extendReads 200 --ignoreDuplicates

# CPM normalization (simpler, no genome size needed)
bamCoverage --bam input.bam --outFileName output.bw \
    --normalizeUsing CPM --binSize 10 -p 8

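# RNA-seq: strand-specific coverage
# --filterRNAstrand assumes a dUTP-based (reverse-stranded) library
bamCoverage --bam rnaseq.bam --outFileName forward.bw \
    --filterRNAstrand forward --normalizeUsing CPM -p 8
# IMPORTANT: never use --extendReads for spliced RNA-seq reads;
# extended reads would fill in introns across splice junctions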
```
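The RPGC scaling used above can be worked through by hand. This sketch assumes deepTools' documented scale factor, effective genome size / (mapped reads × fragment length), and uses made-up read counts; `rpgc_scale_factor` is a hypothetical helper, not a deepTools command. deepTools derives the mapped-read count (after filtering) and, for paired-end data, the fragment length from the BAM itself, so real values will differ.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of deepTools): RPGC scale factor, assuming
#   scale = effective_genome_size / (mapped_reads * fragment_length)
# Each bin's read count is multiplied by this factor so that the whole
# track averages roughly 1x coverage.
rpgc_scale_factor() {
  local mapped_reads="$1" fragment_length="$2" genome_size="$3"
  awk -v n="$mapped_reads" -v f="$fragment_length" -v g="$genome_size" \
    'BEGIN { printf "%.4f\n", g / (n * f) }'
}

# Example: 20 million mapped reads, 200 bp fragments (--extendReads 200),
# and the effective genome size used in the commands above.
rpgc_scale_factor 20000000 200 2913022398   # prints 0.7283
```

A factor below 1 means the library is deeper than 1× coverage, so every bin is scaled down.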

### 2. Sample Comparison

Compare treatment vs control or generate ratio tracks.

```bash
# Log2 ratio: treatment / control
bamCompare -b1 treatment.bam -b2 control.bam -o log2ratio.bw \
    --operation log2 --scaleFactorsMethod readCount \
    --extendReads 200 -p 8

# Subtract control from treatment
bamCompare -b1 treatment.bam -b2 control.bam -o subtract.bw \
    --operation subtract --scaleFactorsMethod readCount
```
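`bamCompare --operation log2` computes, per bin, the log2 of the scaled treatment over the scaled control after adding a pseudocount (`--pseudocount`, default 1). The hypothetical helper below works that through for a single bin, assuming that `--scaleFactorsMethod readCount` scales the deeper library down to the depth of the shallower one. It is a sketch of the arithmetic, not deepTools' implementation.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of deepTools): log2 ratio for a single bin,
# assuming readCount scaling (the larger library is scaled down to the
# smaller one) and a pseudocount added to both samples (default 1).
# Args: treatment_count control_count treatment_total control_total [pseudocount]
bin_log2_ratio() {
  awk -v t="$1" -v c="$2" -v nt="$3" -v nc="$4" -v pc="${5:-1}" 'BEGIN {
    st = 1; sc = 1
    if (nt > nc) st = nc / nt      # treatment is deeper: scale it down
    else sc = nt / nc              # control is deeper (or equal): scale it down
    ratio = (t * st + pc) / (c * sc + pc)
    printf "%.4f\n", log(ratio) / log(2)   # awk log() is the natural log
  }'
}

# 14 treatment vs 3 control reads in a bin; the treatment library is twice as deep,
# so treatment is scaled to 7 and the result is log2((7 + 1) / (3 + 1)).
bin_log2_ratio 14 3 2000000 1000000   # prints 1.0000
```

The pseudocount keeps empty bins finite: a bin with zero reads in both samples gives log2(1/1) = 0 rather than a division by zero.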

### 3. Quality Control

Assess sample quality, replicate concordance, and enrichment strength.

```bash
# Sample correlation heatmap
multiBamSummary bins --bamfiles rep1.bam rep2.bam rep3.bam \
    -o counts.npz --binSize 10000 -p 8
plotCorrelation -in counts.npz --corMethod pearson \
    --whatToShow heatmap -o correlation.png
# Rule of thumb: replicates should cluster together (e.g. Pearson r > 0.9)

# PCA of samples
plotPCA -in counts.npz -o pca.png --plotTitle "Sample PCA"

# ChIP enrichment fingerprint
plotFingerprint -b input.bam chip.bam -o fingerprint.png \
    --extendReads 200 --ignoreDuplicates
# Good ChIP: curve stays low, then rises steeply in the top-ranked bins
# Curve close to the diagonal = input-like, poor enrichment

# Coverage depth assessment
plotCoverage -b sample.bam -o coverage.png --ignoreDuplicates -p 8

# Fragment size distribution (paired-end)
bamPEFragmentSize -b sample.bam -o fragsize.png
```
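The fingerprint plot ranks genomic bins by coverage and plots the cumulative fraction of reads against the fraction of bins. The hypothetical `fingerprint_curve` helper below computes that curve from per-bin read counts, one count per line on stdin (for example, one sample's column from a raw-counts table). It skips plotFingerprint's bin sampling and filtering options and only illustrates why an even, input-like sample follows the diagonal while an enriched ChIP sample stays low and jumps at the end.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of deepTools): fingerprint curve from per-bin
# read counts, one count per line on stdin. Bins are sorted by coverage
# (lowest first); each output line is
#   <fraction of bins so far> <fraction of all reads in those bins>
fingerprint_curve() {
  sort -n | awk '
    { count[NR] = $1; total += $1 }
    END {
      for (i = 1; i <= NR; i++) {
        cum += count[i]
        printf "%.2f %.2f\n", i / NR, (total > 0 ? cum / total : 0)
      }
    }'
}

# Input-like sample: reads spread evenly, so the curve follows the diagonal.
printf '5\n5\n5\n5\n' | fingerprint_curve
# 0.25 0.25 / 0.50 0.50 / 0.75 0.75 / 1.00 1.00

# Enriched sample: almost all reads sit in one bin, so the curve stays near 0
# and jumps only at the last bin.
printf '0\n0\n1\n19\n' | fingerprint_curve
# 0.25 0.00 / 0.50 0.00 / 0.75 0.05 / 1.00 1.00
```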

### 4. Heatmaps and Profile Plots

Visualize signal around genomic features (TSS, peaks, gene bodies).

```bash
# Reference-point mode: signal around TSS
computeMatrix reference-point -S chip.bw -R genes.bed \
    -b 3000 -a 3000 --referencePoint TSS -o matrix.gz -p 8

# Scale-regions mode: signal across gene bodies
computeMatrix scale-regions -S chip.bw -R genes.bed \
    -b 1000 -a 1000 --regionBodyLength 5000 -o matrix.gz -p 8

# Generate heatmap
plotHeatmap -m matrix.gz -o heatmap.png \
    --colorMap RdBu --kmeans 3 --sortUsing mean

# Generate profile plot
plotProfile -m matrix.gz -o profile.png \
    --plotType lines --colors blue  # one color per sample/region group

# Multiple signal files: compare marks
computeMatrix reference-point -S h3k4me3.bw h3k27me3.bw -R genes.bed \
    -b 3000 -a 3000 --referencePoint TSS -o multi_matrix.gz
plotHeatmap -m multi_matrix.gz -o multi_heatmap.png
```
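With `--referencePoint TSS`, computeMatrix anchors `+` genes at the region start and `-` genes at the region end, which is why the strand column matters. The hypothetical `tss_bed` helper below makes that choice explicit by writing 1-bp TSS intervals from a tab-separated BED6 file. It is only an illustration (computeMatrix does this itself when column 6 is present), handy for inspecting which positions a heatmap is anchored on.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of deepTools): convert a BED6 gene file
# (tab-separated, strand in column 6) into 1-bp TSS intervals.
# BED is 0-based, half-open: a '+' gene starts at $2; a '-' gene starts
# at its last base, i.e. the interval [$3 - 1, $3).
tss_bed() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    /^(#|track|browser)/ { next }      # skip comment/header lines
    {
      if ($6 == "-") { s = $3 - 1; e = $3 }
      else           { s = $2;     e = $2 + 1 }   # "+" (or "."): use the start
      print $1, s, e, $4, $5, $6
    }' "$@"
}

printf 'chr1\t1000\t5000\tgeneA\t0\t+\nchr1\t8000\t9000\tgeneB\t0\t-\n' | tss_bed
# chr1  1000  1001  geneA  0  +
# chr1  8999  9000  geneB  0  -
```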

### 5. Read Filtering and Processing

Filter reads before analysis or correct for assay-specific biases.

```bash
# Filter by mapping quality and fragment size
alignmentSieve --bam input.bam --outFile filtered.bam \
    --minMappingQuality 10 --minFragmentLength 150 \
    --maxFragmentLength 700

# ATAC-seq: apply Tn5 offset correction (+4/-5 bp shift)
alignmentSieve --bam atac.bam --outFile shifted.bam --ATACshift
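# Then sort and index (shifted output may not be coordinate-sorted):
# samtools sort -o shifted.sorted.bam shifted.bam && samtools index shifted.sorted.bam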

# GC bias correction (only if significant bias detected)
computeGCBias -b input.bam --effectiveGenomeSize 2913022398 \
    -g genome.2bit --GCbiasFrequenciesFile gc_freq.txt -p 8
correctGCBias -b input.bam --effectiveGenomeSize 2913022398 \
    -g genome.2bit --GCbiasFrequenciesFile gc_freq.txt -o corrected.bam
```
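alignmentSieve's `--minMappingQuality` keeps reads whose MAPQ is at least the given value, and the fragment-length options keep pairs by template length. The hypothetical `sieve_sam` filter below reproduces that idea on SAM text (for example, `samtools view -h` output): it passes header lines through and keeps records with MAPQ >= the threshold and |TLEN| inside the window. Treating both length bounds as inclusive is an assumption, and alignmentSieve's other options (such as `--ATACshift`) are not covered.

```bash
#!/usr/bin/env bash
# Hypothetical filter (not part of deepTools) mimicking two alignmentSieve
# criteria on SAM text: MAPQ (column 5) >= min_mapq and
# min_len <= |TLEN| (column 9) <= max_len. Header lines (@...) pass through.
# Args: min_mapq min_len max_len
sieve_sam() {
  awk -F'\t' -v q="$1" -v lo="$2" -v hi="$3" '
    /^@/ { print; next }                        # keep the SAM header
    {
      tlen = $9 + 0
      if (tlen < 0) tlen = -tlen                # TLEN is negative for the rightmost mate
      if ($5 + 0 >= q + 0 && tlen >= lo + 0 && tlen <= hi + 0) print
    }'
}

# Usage sketch (assumes samtools is installed):
# samtools view -h input.bam | sieve_sam 10 150 700 \
#   | samtools view -b -o filtered.bam -
```

Unpaired records have TLEN 0, so any positive minimum length drops them in this sketch.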

### 6. Enrichment Analysis

Quantify signal enrichment at specific regions.

```bash
# Signal enrichment at peak regions
plotEnrichment -b chip.bam input.bam --BED peaks.bed \
    -o enrichment.png --ignoreDuplicates -p 8
```
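plotEnrichment reports, per sample, the percentage of reads that overlap the supplied regions (for a peak file, this is the familiar fraction of reads in peaks, FRiP). As a rough illustration of that metric, the hypothetical helper below counts reads in one BED file that overlap any interval in a peak BED file using a brute-force scan. It ignores duplicate filtering, read extension, and the other options plotEnrichment applies, and it assumes the peak file is non-empty.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of deepTools): percentage of reads (BED)
# overlapping any peak (BED). Intervals are 0-based, half-open, so
# [a, b) and [c, d) overlap when a < d and b > c.
# Brute-force O(reads x peaks) scan, for illustration only.
# Args: peaks.bed reads.bed (peaks file must be non-empty)
frip_percent() {
  local peaks="$1" reads="$2"
  awk -F'\t' '
    NR == FNR { n++; pc[n] = $1; ps[n] = $2 + 0; pe[n] = $3 + 0; next }   # load peaks
    {
      total++
      for (i = 1; i <= n; i++) {
        if ($1 == pc[i] && $2 + 0 < pe[i] && $3 + 0 > ps[i]) { hit++; break }
      }
    }
    END { printf "%.2f\n", (total ? 100 * hit / total : 0) }
  ' "$peaks" "$reads"
}

# Usage sketch with hypothetical files:
# frip_percent peaks.bed chip_reads.bed
```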

## Key Concepts

### Normalization Methods

| Method | Formula (per bin) | When to Use | Requires |
|--------|---------|-------------|----------|
| **RPGC** | reads × effective genome size / (mapped reads × fragment length), i.e. scaled to 1× average coverage | ChIP-seq, ATAC-seq | `--effectiveGenomeSize` |
| **CPM** | reads / mapped reads (in millions) | Any assay, quick comparison | Nothing |
| **RPKM** | reads / (mapped reads in millions × bin length in kb) | RNA-seq gene-level | Nothing |
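The CPM and RPKM formulas in the table can be checked with a few lines of arithmetic. The helpers below are hypothetical (not deepTools commands): they take the raw read count in one bin, the total mapped reads, and, for RPKM, the bin length in bp. The numbers in the example are made up.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of deepTools) for one bin's normalized value.

# CPM = reads in bin / (mapped reads / 1e6)
cpm() {
  awk -v r="$1" -v n="$2" 'BEGIN { printf "%.4f\n", r / (n / 1e6) }'
}

# RPKM = reads in bin / ((mapped reads / 1e6) * (bin length in bp / 1000))
rpkm() {
  awk -v r="$1" -v n="$2" -v len="$3" \
    'BEGIN { printf "%.4f\n", r / ((n / 1e6) * (len / 1000)) }'
}

# 50 reads in a 50 bp bin, 25 million mapped reads
cpm 50 25000000        # 2.0000
rpkm 50 25000000 50    # 40.0000
```

Because RPKM also divides by the bin length in kb, a 50 bp bin yields 20 times the CPM value; with a fixed `--binSize`, the two tracks differ only by a constant factor.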