Skill2.9k repo starsupdated 7d ago

bio-atac-seq-nucleosome-positioning

This Claude Code skill extracts nucleosome positions and occupancy patterns from ATAC-seq data by analyzing fragment size distributions and applying nucleosome-positioning algorithms. It is used when researchers need to map nucleosome-free regions at promoters, characterize chromatin organization around transcription start sites, or identify nucleosome occupancy patterns from ATAC-seq experiments using tools like NucleoATAC and ATACseqQC.

View source Repository: OpenClaw-Medical-Skills

Install in Claude Code

Copy

git clone --depth 1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills /tmp/bio-atac-seq-nucleosome-positioning && cp -r /tmp/bio-atac-seq-nucleosome-positioning/skills/bio-atac-seq-nucleosome-positioning ~/.claude/skills/bio-atac-seq-nucleosome-positioning

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

## Version Compatibility

Reference examples tested with: Rsamtools 2.18+, matplotlib 3.8+, numpy 1.26+, pyBigWig 0.3+, pysam 0.22+, samtools 1.19+

Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters
- CLI: `<tool> --version` then `<tool> --help` to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.

# Nucleosome Positioning

**"Map nucleosome positions from ATAC-seq"** → Separate nucleosome-free and mono-nucleosome fragments by size, then call nucleosome center positions and occupancy scores.
- CLI: `nucleoatac run --bed peaks.bed --bam atac.bam --fasta ref.fa`
- R: `ATACseqQC::splitGAlignmentsByCut()` for fragment separation

Extract nucleosome positions and occupancy from ATAC-seq fragment size patterns.

## Background

ATAC-seq fragments reflect chromatin structure:
- **< 100 bp**: Nucleosome-free regions (NFR)
- **180-247 bp**: Mono-nucleosome
- **315-473 bp**: Di-nucleosome
- **558-615 bp**: Tri-nucleosome

## ATACseqQC (R)

### Installation

```r
BiocManager::install('ATACseqQC')
```

### Fragment Size Distribution

```r
library(ATACseqQC)
library(Rsamtools)

# Read BAM
bamfile <- 'sample.bam'

# Fragment size distribution
fragSize <- fragSizeDist(bamfile, 'sample')

# Nucleosome-free and mono-nucleosome ratios
# Automatic QC metrics
```

### Nucleosome Positioning

**Goal:** Map nucleosome positions around TSS using ATAC-seq fragment size classes.

**Approach:** Read BAM, apply Tn5 shift correction, split fragments into NFR and mono-nucleosome classes by size, then compute signal profiles around TSS.

```r
library(ATACseqQC)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(BSgenome.Hsapiens.UCSC.hg38)

# Get TSS regions
txs <- transcripts(TxDb.Hsapiens.UCSC.hg38.knownGene)
tss <- promoters(txs, upstream=1000, downstream=1000)

# Read BAM
gal <- readBamFile(bamfile, asMates=TRUE, bigFile=TRUE)

# Shift reads (Tn5 offset correction)
gal_shifted <- shiftGAlignmentsList(gal)

# Split by nucleosome-free and nucleosomal
objs <- splitGAlignmentsByCut(gal_shifted, txs=txs,
                               genome=BSgenome.Hsapiens.UCSC.hg38)

# nucleosome-free fragments
nfr <- objs$NussomeFree

# Mono-nucleosome fragments
mono <- objs$mononucleosome

# Signal around TSS
sigs <- featureAlignedSignal(cvglist=objs,
                             feature.gr=tss,
                             upstream=1000,
                             downstream=1000)
```

### V-Plot (Fragment Size vs Position)

```r
# V-plot showing nucleosome positioning around TSS
vp <- vPlot(gal_shifted, tss,
            genome=BSgenome.Hsapiens.UCSC.hg38,
            upstream=1000, downstream=1000)
```

### Footprinting

```r
# Transcription factor footprinting
library(MotifDb)

# Get motif
motif <- query(MotifDb, 'CTCF')[[1]]

# Find motif occurrences
library(motifmatchr)
motif_pos <- matchMotifs(motif, BSgenome.Hsapiens.UCSC.hg38,
                         genome='hg38', out='positions')

# Calculate footprint
fp <- factorFootprints(gal_shifted, motif_pos,
                       genome=BSgenome.Hsapiens.UCSC.hg38,
                       upstream=100, downstream=100)
```

## NucleoATAC (Python)

### Installation

```bash
pip install nucleoatac
```

### Run NucleoATAC

**Goal:** Call precise nucleosome center positions and occupancy scores from ATAC-seq data.

**Approach:** Run NucleoATAC on defined genomic regions with a reference genome, producing nucleosome position calls and occupancy tracks.

```bash
# Call nucleosomes
nucleoatac run --bed regions.bed --bam sample.bam --fasta reference.fa \
    --out nucleoatac_output --cores 8
```

### Output Files

| File | Description |
|------|-------------|
| `.nucpos.bed` | Nucleosome positions |
| `.nucpos.redundant.bed` | All nucleosome calls |
| `.nfrpos.bed` | NFR positions |
| `.occ.bedgraph` | Nucleosome occupancy track |
| `.nucmap_combined.bed` | Combined nucleosome map |

### Visualize Output

```bash
# Convert to bigWig for visualization
bedGraphToBigWig nucleoatac_output.occ.bedgraph chrom.sizes nucleosome_occ.bw
```

## Fragment Analysis (Custom)

### Extract Fragment Sizes

**Goal:** Visualize ATAC-seq fragment size distribution to assess nucleosome periodicity.

**Approach:** Extract template lengths from properly paired reads, then plot the histogram with NFR and mono-nucleosome cutoff markers.

```python
import pysam
import numpy as np
import matplotlib.pyplot as plt

bam = pysam.AlignmentFile('sample.bam', 'rb')

fragment_sizes = []
for read in bam.fetch():
    if read.is_proper_pair and read.is_read1:
        frag_size = abs(read.template_length)
        if 0 < frag_size < 1000:
            fragment_sizes.append(frag_size)

bam.close()

# Plot distribution
plt.figure(figsize=(10, 6))
plt.hist(fragment_sizes, bins=200, edgecolor='none', alpha=0.7)
plt.axvline(100, color='red', linestyle='--', label='NFR cutoff')
plt.axvline(180, color='blue', linestyle='--', label='Mono-nuc start')
plt.xlabel('Fragment Size (bp)')
plt.ylabel('Count')
plt.legend()
plt.savefig('fragment_distribution.png', dpi=300)
```

### Split by Fragment Size

```bash
# Extract nucleosome-free reads
samtools view -h sample.bam | \
    awk '$9 > -100 && $9 < 100 || $1 ~ /^@/' | \
    samtools view -b > nfr.bam

# Extract mono-nucleosome reads
samtools view -h sample.bam | \
    awk '($9 >= 180 && $9 <= 247) || ($9 <= -180 && $9 >= -247) || $1 ~ /^@/' | \
    samtools view -b > mono_nuc.bam
```

### Signal Around Features

```python
import pysam
import numpy as np
import pyBigWig

def signal_around_sites(bam_file, sites, upstream=1000, downstream=1000):
    bam = pysam.AlignmentFile(bam_file, 'rb')
    window_size = upstream + downstream
    signal = np.zeros(window_size)

    for chrom, pos, str

More from this repository

aav-vector-design-agentSkill

adaptyvSkill

Cloud laboratory platform for automated protein testing and validation. Use when designing proteins and needing experimental validation including binding assays, expression testing, thermostability measurements, enzyme activity assays, or protein sequence optimization. Also use for submitting experiments via API, tracking experiment status, downloading results, optimizing protein sequences for better expression using computational tools (NetSolP, SoluProt, SolubleMPNN, ESM), or managing protein design workflows with wet-lab validation.

adhd-daily-plannerSkill

Time-blind friendly planning, executive function support, and daily structure for ADHD brains. Specializes in realistic time estimation, dopamine-aware task design, and building systems that

aeonSkill

This skill should be used for time series machine learning tasks including classification, regression, clustering, forecasting, anomaly detection, segmentation, and similarity search. Use when working with temporal data, sequential patterns, or time-indexed observations requiring specialized algorithms beyond standard ML approaches. Particularly suited for univariate and multivariate time series analysis with scikit-learn compatible APIs.

agent-browserSkill

Browse the web for any task — research topics, read articles, interact with web apps, fill forms, take screenshots, extract data, and test web pages. Use whenever a browser would be useful, not just when the user explicitly asks.

agentd-drug-discoverySkill

ai-analyzerSkill

AI驱动的综合健康分析系统，整合多维度健康数据、识别异常模式、预测健康风险、提供个性化建议。支持智能问答和AI健康报告生成。

alphafold-databaseSkill

Access AlphaFold's 200M+ AI-predicted protein structures. Retrieve structures by UniProt ID, download PDB/mmCIF files, analyze confidence metrics (pLDDT, PAE), for drug discovery and structural biology.