ClaudeWave
Skill · 2.7k repo stars · updated 2mo ago

bio-atac-seq-footprinting

This skill detects transcription factor binding sites within accessible chromatin regions using TOBIAS, a bioinformatics tool that identifies "footprints" where bound factors protect DNA from Tn5 cleavage during ATAC-seq. Use it when analyzing ATAC-seq data to map TF occupancy patterns, compare TF binding between conditions, or identify active regulatory elements genome-wide through a three-step workflow of bias correction, footprint scoring, and motif-based binding detection.

Install in Claude Code
git clone --depth 1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills /tmp/bio-atac-seq-footprinting && cp -r /tmp/bio-atac-seq-footprinting/skills/bio-atac-seq-footprinting ~/.claude/skills/bio-atac-seq-footprinting
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

## Version Compatibility

Reference examples tested with: bedtools 2.31+, matplotlib 3.8+, numpy 1.26+, pandas 2.2+, pyBigWig 0.3+, samtools 1.19+

Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters
- CLI: `<tool> --version` then `<tool> --help` to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
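
The Python version check described above can be scripted. This is a minimal sketch, assuming the distribution names passed in match what `pip` reports on your system; `installed_versions` is a hypothetical helper, not part of any tool listed here.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Distribution names are assumptions; adjust them to your environment.
    for name, ver in installed_versions(["numpy", "pandas", "matplotlib", "pyBigWig"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

Compare the printed versions against the tested versions listed above before adapting any example.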

# TF Footprinting

**"Identify TF binding footprints in my ATAC-seq data"** → Detect protected DNA regions within accessible chromatin where bound transcription factors block Tn5 insertion.
- CLI: `TOBIAS ATACorrect` → `TOBIAS FootprintScores` → `TOBIAS BINDetect`

## TOBIAS Workflow

**Goal:** Identify transcription factor binding footprints within accessible chromatin regions.

**Approach:** Correct Tn5 insertion bias, compute per-base footprint scores, then detect bound/unbound TF motif sites using the three-step TOBIAS pipeline.

```bash
# The installed executable is named TOBIAS (uppercase)

# 1. Correct Tn5 bias
TOBIAS ATACorrect \
    --bam sample.bam \
    --genome genome.fa \
    --peaks peaks.bed \
    --outdir corrected/ \
    --cores 8

# 2. Calculate footprint scores from the corrected signal
TOBIAS FootprintScores \
    --signal corrected/sample_corrected.bw \
    --regions peaks.bed \
    --output footprints.bw \
    --cores 8

# 3. Detect bound/unbound TF motif sites
TOBIAS BINDetect \
    --motifs JASPAR_motifs.pfm \
    --signals footprints.bw \
    --genome genome.fa \
    --peaks peaks.bed \
    --outdir bindetect_output/ \
    --cores 8
```

## TOBIAS Differential Footprinting

**Goal:** Compare TF binding between two conditions to identify regulators with differential activity.

**Approach:** Provide one footprint-score track per condition (the FootprintScores output for each sample) to BINDetect, which scores each motif site for differential binding between conditions.

```bash
# Compare conditions (each input is a FootprintScores output)
TOBIAS BINDetect \
    --motifs JASPAR_motifs.pfm \
    --signals condition1_footprints.bw condition2_footprints.bw \
    --genome genome.fa \
    --peaks consensus_peaks.bed \
    --outdir differential_footprints/ \
    --cond_names condition1 condition2 \
    --cores 8

# Output includes:
# - Differential binding scores
# - Per-TF statistics
# - Bound/unbound site predictions
```
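
To shortlist regulators from the per-TF statistics, the results table can be ranked by the size of the differential binding score. The sketch below assumes BINDetect wrote a tab-separated summary (for example `differential_footprints/bindetect_results.txt`) with a `name` column plus change and p-value columns named after the `--cond_names` values. Those file and column names are assumptions, so check the header of your own output first. `rank_differential_tfs` is a hypothetical helper.

```python
import csv


def rank_differential_tfs(results_path, change_col, pvalue_col, name_col="name", top_n=10):
    """Return (name, change, pvalue) tuples for the top_n largest absolute changes."""
    with open(results_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # Fail early if the assumed column names are not in the header
        missing = {name_col, change_col, pvalue_col} - set(reader.fieldnames or [])
        if missing:
            raise KeyError(f"columns not found in {results_path}: {sorted(missing)}")
        rows = [
            (row[name_col], float(row[change_col]), float(row[pvalue_col]))
            for row in reader
        ]
    # Largest absolute change first, regardless of direction
    rows.sort(key=lambda r: abs(r[1]), reverse=True)
    return rows[:top_n]
```

A call might look like `rank_differential_tfs("differential_footprints/bindetect_results.txt", "condition1_condition2_change", "condition1_condition2_pvalue")`, with the column names adjusted to match the actual header.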

## Download JASPAR Motifs

```bash
# Download JASPAR motifs
wget https://jaspar.genereg.net/download/data/2022/CORE/JASPAR2022_CORE_vertebrates_non-redundant_pfms_jaspar.txt
mv JASPAR2022_CORE_vertebrates_non-redundant_pfms_jaspar.txt JASPAR_motifs.pfm
```

## Prepare Input Files

```bash
# Ensure BAM is sorted and indexed
samtools sort -@ 8 sample.bam -o sample.sorted.bam
samtools index sample.sorted.bam

# Filter peaks (remove blacklist, size filter)
bedtools intersect -v -a peaks.narrowPeak -b blacklist.bed | \
    awk '$3-$2 >= 100 && $3-$2 <= 5000' > filtered_peaks.bed
```

## HINT-ATAC Alternative

```bash
# RGT suite HINT-ATAC
rgt-hint footprinting \
    --atac-seq \
    --organism hg38 \
    --output-prefix sample \
    sample.bam peaks.bed
```

## PIQ Footprinting

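```r
# PIQ (another footprinting tool)
# Schematic only: PIQ is distributed as a set of standalone R scripts, so
# library(PIQ), readMotifs(), and piq() may not exist in your installation.
# Check the PIQ README for the actual invocation before running this.
library(PIQ)

# Load data
bam <- 'sample.bam'
pwms <- readMotifs('JASPAR_motifs.pfm')

# Run footprinting
piq_results <- piq(bam, pwms, genome='hg38')
```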

## Aggregate Footprint Plots

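```bash
# TOBIAS PlotAggregate
# --share_y takes a value (none/signals/sites/both); confirm with
# `TOBIAS PlotAggregate --help` for your installed version
TOBIAS PlotAggregate \
    --TFBS bindetect_output/*/beds/*_bound.bed \
    --signals corrected/sample_corrected.bw \
    --output aggregate_footprints.pdf \
    --share_y both \
    --plot_boundaries
```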

## Python: Custom Footprint Analysis

**Goal:** Extract and visualize aggregate ATAC-seq signal around predicted TF binding sites.

**Approach:** Sample bigWig signal values in windows centered on motif sites, average across all sites, and plot the characteristic V-shaped footprint.

```python
import matplotlib.pyplot as plt
import numpy as np
import pyBigWig

def extract_footprint_signal(bigwig_file, bed_file, flank=100):
    '''Average bigWig signal in +/- flank bp windows centered on each BED interval.'''
    signals = []
    bw = pyBigWig.open(bigwig_file)
    try:
        with open(bed_file) as bed:
            for line in bed:
                # Skip blank lines and BED header/comment lines
                if not line.strip() or line.startswith(('#', 'track', 'browser')):
                    continue
                fields = line.rstrip('\n').split('\t')
                chrom, start, end = fields[0], int(fields[1]), int(fields[2])
                center = (start + end) // 2
                if center - flank < 0:
                    continue  # window would run off the chromosome start

                try:
                    vals = bw.values(chrom, center - flank, center + flank)
                except RuntimeError:
                    # Raised by pyBigWig for unknown chromosomes or invalid bounds
                    continue
                signals.append(vals)
    finally:
        bw.close()

    if not signals:
        raise ValueError('no usable sites found in ' + bed_file)

    # nanmean ignores positions without coverage (NaN) at individual sites
    return np.nanmean(signals, axis=0)

def plot_footprint(signal, output_file):
    '''Plot aggregate footprint.'''
    # Position 0 corresponds to the motif center
    x = np.arange(len(signal)) - len(signal) // 2

    plt.figure(figsize=(8, 4))
    plt.plot(x, signal, 'b-', linewidth=2)
    plt.axvline(0, color='red', linestyle='--', alpha=0.5)
    plt.xlabel('Distance from motif center (bp)')
    plt.ylabel('ATAC-seq signal')
    plt.title('Aggregate Footprint')
    plt.savefig(output_file, dpi=150)
    plt.close()
```

## Scan for Motifs

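```bash
# Find motif occurrences in peaks
# Using FIMO (MEME suite); peaks.fa holds the peak sequences, e.g. from:
#   bedtools getfasta -fi genome.fa -bed peaks.bed -fo peaks.fa
fimo --oc fimo_output motifs.meme peaks.fa

# Or HOMER (-find writes motif instances to stdout)
findMotifsGenome.pl peaks.bed hg38 motif_analysis/ -find motif.motif > motif_instances.txt
```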

## Interpret Footprint Depth

| Footprint Depth | Interpretation |
|-----------------|----------------|
| Deep footprint | Strong TF binding |
| Shallow footprint | Weak/transient binding |
| No footprint | No binding or wrong motif |
| Shoulders only | Nucleosome positioning |

## Quality Considerations

```bash
# Footprinting requires:
# - High read depth (>50M reads)
# - NFR-enriched signal (filter for <100bp fragments)
# - Good Tn5 bias correction

# Extract NFR reads
samtools view -h sample.bam | \
    awk 'substr($0,1,1)=="@" || ($9>0 && $9<100) || ($9<0 && $9>-100)' | \
    samtools view -b > nfr.bam
```
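
The `awk` condition above keeps header lines and any alignment whose template length (SAM column 9, TLEN) is non-zero with an absolute value below 100. The same predicate is sketched in Python below to make the logic explicit. `is_nfr_record` is a hypothetical helper, and the 100 bp cutoff is taken from the filter above.

```python
def is_nfr_record(sam_line, max_fragment=100):
    """True for header lines and for alignments with 0 < |TLEN| < max_fragment."""
    if sam_line.startswith("@"):
        return True  # keep SAM header lines so the output remains valid SAM
    fields = sam_line.rstrip("\n").split("\t")
    tlen = int(fields[8])  # column 9 of SAM: signed template length
    return 0 < abs(tlen) < max_fragment
```

Records with TLEN of 0 (for example unpaired or improperly paired reads) are dropped, which matches the behavior of the `awk` filter.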

## Differential TF Activity

```python
def compare_footprints(tf_name, cond1_bw, cond2_bw, motif_bed):
    ''
```