ClaudeWave
Skill · 2.7k repo stars · updated 2mo ago

bio-atac-seq-motif-deviation

This Claude Code skill provides an R-based workflow for analyzing transcription factor motif accessibility variability in ATAC-seq data using chromVAR. It computes per-sample deviation scores to identify which TF motifs show differential accessibility across conditions, utilizing peak counts, GC bias correction, motif matching against JASPAR databases, and statistical z-score computation to reveal regulators driving chromatin state differences.

Install in Claude Code
git clone --depth 1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills /tmp/bio-atac-seq-motif-deviation && cp -r /tmp/bio-atac-seq-motif-deviation/skills/bio-atac-seq-motif-deviation ~/.claude/skills/bio-atac-seq-motif-deviation
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

## Version Compatibility

Reference examples tested with: ggplot2 3.5+, limma 3.58+

Before using code patterns, verify installed versions match. If versions differ:
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters

If code throws an error such as `could not find function` or `unused argument`, introspect
the installed package and adapt the example to match the actual API rather than retrying.
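
The version check above can be scripted. The following is a minimal sketch in base R; `meets_min_version` is a hypothetical helper name introduced here for illustration, and the minimum versions are copied from the "tested with" line.

```r
# Hypothetical helper: does an installed package meet a minimum version?
# Returns NA when the package is not installed.
meets_min_version <- function(pkg, min_version) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
        return(NA)
    }
    utils::packageVersion(pkg) >= package_version(min_version)
}

# Minimum versions taken from the "tested with" line of this skill.
required <- c(ggplot2 = "3.5.0", limma = "3.58.0")
status <- vapply(names(required),
                 function(p) meets_min_version(p, required[[p]]),
                 logical(1))
print(status)  # TRUE = new enough, FALSE = older than tested, NA = not installed
```

A `FALSE` or `NA` entry is the cue to read `?function_name` for the installed version before running the examples.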

# Motif Deviation Analysis

**"Which TF motifs show variable accessibility across my samples?"** → Compute per-sample deviation scores for TF motif accessibility to identify regulators driving chromatin state differences.
- R: `chromVAR::computeDeviations(counts, motifs)`

Measure per-sample variability in transcription factor motif accessibility using chromVAR. This identifies TFs whose binding sites show differential accessibility across conditions.

## Required Packages

```r
library(chromVAR)
library(motifmatchr)
library(BSgenome.Hsapiens.UCSC.hg38)  # or appropriate genome
library(JASPAR2020)
library(TFBSTools)
library(SummarizedExperiment)
```

## Basic Workflow

**Goal:** Run chromVAR to compute per-sample TF motif deviation scores from ATAC-seq peak counts.

**Approach:** Load peak counts into a SummarizedExperiment, correct for GC bias, filter low-quality peaks, match JASPAR motifs, and compute deviation z-scores.

### 1. Load Peak Counts

```r
library(chromVAR)
library(SummarizedExperiment)

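# From count matrix and peak ranges.
# BED starts are 0-based while GRanges is 1-based, so shift the start by one.
peaks <- read.table('peaks.bed', col.names = c('chr', 'start', 'end'))
peak_ranges <- GRanges(seqnames = peaks$chr, ranges = IRanges(peaks$start + 1, peaks$end))

counts <- read.table('counts.txt', header = TRUE, row.names = 1)
counts_matrix <- as.matrix(counts)

# Rows of counts_matrix must be in the same order as the peaks in peaks.bed.
# filterSamples() below expects a `depth` column (total reads per sample) in colData,
# and the plots use a `condition` column; supply both via the colData argument.
fragment_counts <- SummarizedExperiment(
    assays = list(counts = counts_matrix),
    rowRanges = peak_ranges
)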
```
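
Two things commonly go wrong at this step: the count matrix and the peak file drift out of sync, and BED coordinates (0-based, end-exclusive) are passed unchanged to 1-based ranges. The sketch below checks both in base R on made-up toy values that stand in for `peaks.bed` and `counts.txt`.

```r
# Toy stand-ins for peaks.bed and counts.txt (values are invented for illustration).
peaks <- data.frame(chr = c("chr1", "chr1", "chr2"),
                    start = c(0L, 500L, 1000L),   # BED starts are 0-based
                    end = c(200L, 750L, 1300L))   # BED ends are exclusive
counts_matrix <- matrix(c(12, 0, 7,
                          30, 4, 9),
                        nrow = 3,
                        dimnames = list(NULL, c("sampleA", "sampleB")))

# One count row per peak; a mismatch here means the two files are out of sync.
stopifnot(nrow(counts_matrix) == nrow(peaks))

# Convert to 1-based closed coordinates; the interval width is unchanged.
start_1based <- peaks$start + 1L
peak_width <- peaks$end - start_1based + 1L
print(peak_width)  # 200 250 300

# Zero-length or inverted intervals indicate a malformed peak file.
stopifnot(all(peak_width > 0))
```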

### 2. Add GC Bias Correction

```r
library(BSgenome.Hsapiens.UCSC.hg38)

fragment_counts <- addGCBias(fragment_counts, genome = BSgenome.Hsapiens.UCSC.hg38)
```

### 3. Filter Low-Quality Peaks

```r
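# min_depth=1500: Minimum total reads per sample. Adjust based on library size.
# min_in_peaks=0.15: Minimum fraction of reads in peaks (FRiP). 0.15 = 15%.
# shiny=FALSE: Skip the interactive threshold picker so the call also works in scripts.
fragment_counts <- filterSamples(fragment_counts, min_depth = 1500, min_in_peaks = 0.15,
                                 shiny = FALSE)

# Keep peaks detected (count > 0) in >=10% of samples; filterPeaks() has no
# argument for this, so subset the rows directly.
fragment_counts <- fragment_counts[rowMeans(assay(fragment_counts) > 0) >= 0.1, ]

# min_fragments_per_peak=10: Require peaks with >=10 reads summed across samples.
# non_overlapping=TRUE: Reduce overlapping peaks to a non-overlapping set.
fragment_counts <- filterPeaks(fragment_counts, min_fragments_per_peak = 10,
                                non_overlapping = TRUE)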
```
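
To choose sample thresholds, it helps to see what the two cutoffs measure. The base R sketch below reproduces the idea on invented numbers, assuming `depth` is the total fragment count per sample; chromVAR's own implementation may differ in detail.

```r
# Toy peak-by-sample counts (invented numbers).
counts_matrix <- matrix(c(900, 600, 500,
                          100,  50,  30,
                          800, 700, 900),
                        nrow = 3,
                        dimnames = list(paste0("peak", 1:3),
                                        c("s1", "s2", "s3")))

# Assumed total sequenced fragments per sample (normally from alignment statistics).
depth <- c(s1 = 8000, s2 = 1200, s3 = 20000)

in_peaks <- colSums(counts_matrix)  # fragments that fall inside peaks
frip <- in_peaks / depth            # fraction of reads in peaks

# Same thresholds as the filterSamples() call above.
keep <- depth >= 1500 & frip >= 0.15
print(data.frame(depth, in_peaks, frip, keep))
```

In this toy set only `s1` passes: `s2` fails on depth and `s3` fails on FRiP.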

## Get Motif Annotations

### From JASPAR

```r
library(JASPAR2020)
library(TFBSTools)
library(motifmatchr)

# Get vertebrate motifs from JASPAR
pfm <- getMatrixSet(JASPAR2020, opts = list(collection = 'CORE', tax_group = 'vertebrates'))

# Match motifs to peaks
# p.cutoff=5e-5: Motif match p-value threshold. Lower = more stringent.
motif_ix <- matchMotifs(pfm, fragment_counts, genome = BSgenome.Hsapiens.UCSC.hg38, p.cutoff = 5e-5)
```

### From CIS-BP or Custom PWMs

```r
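# Load custom motifs from file
library(universalmotif)
library(TFBSTools)
motifs <- read_meme('custom_motifs.meme')
# matchMotifs() expects a PFMatrixList rather than a plain list
pfm_list <- do.call(PFMatrixList,
                    lapply(motifs, function(m) convert_motifs(m, class = 'TFBSTools-PFMatrix')))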

motif_ix <- matchMotifs(pfm_list, fragment_counts, genome = BSgenome.Hsapiens.UCSC.hg38)
```

## Compute Deviations

```r
# Compute chromVAR deviation scores
dev <- computeDeviations(object = fragment_counts, annotations = motif_ix)

# Extract deviation z-scores (deviations(dev) returns bias-corrected deviations instead)
deviation_scores <- deviationScores(dev)

# Extract variability across samples
variability <- computeVariability(dev)
```
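
To make the z-score concrete, here is a simplified base R sketch on simulated counts. It is not chromVAR's implementation: background peak sets are drawn at random here, whereas chromVAR matches background peaks on GC content and average accessibility. All data and names are invented for illustration.

```r
set.seed(1)

# Toy data: 200 peaks x 4 samples; the first 30 peaks contain the motif.
n_peaks <- 200
counts <- matrix(rpois(n_peaks * 4, lambda = 20), nrow = n_peaks,
                 dimnames = list(NULL, paste0("s", 1:4)))
has_motif <- seq_len(n_peaks) <= 30
# Make motif-containing peaks three times more accessible in sample s4 only.
counts[has_motif, "s4"] <- counts[has_motif, "s4"] * 3

# Expected counts if all samples shared one accessibility profile:
# each peak's share of all reads times each sample's total reads.
peak_frac <- rowSums(counts) / sum(counts)
expected <- outer(peak_frac, colSums(counts))

# Raw deviation of a peak set, per sample: (observed - expected) / expected.
raw_dev <- function(idx) {
    observed <- colSums(counts[idx, , drop = FALSE])
    expected_set <- colSums(expected[idx, , drop = FALSE])
    (observed - expected_set) / expected_set
}

# Background distribution from random peak sets of the same size (simplification).
bg <- replicate(50, raw_dev(sample(n_peaks, sum(has_motif))))  # samples x iterations

# z-score: motif deviation relative to the background mean and spread.
motif_dev <- raw_dev(which(has_motif))
z <- (motif_dev - rowMeans(bg)) / apply(bg, 1, sd)
print(round(z, 2))
```

Sample `s4` receives the largest positive z-score because its motif-containing peaks carry more reads than the shared profile predicts.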

## Interpreting Results

### Deviation Scores

```r
# Deviation z-scores: positive = more accessible than expected
# Compare across samples
dev_matrix <- deviationScores(dev)
print(dim(dev_matrix))  # motifs x samples

# Get top variable motifs
var_df <- variability
var_df <- var_df[order(-var_df$variability), ]
head(var_df, 20)
```

### Variability Interpretation

| Variability | Interpretation |
|-------------|----------------|
| > 2.0 | Highly variable across samples |
| 1.0 - 2.0 | Moderately variable |
| < 1.0 | Low variability |
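
The table can be applied programmatically. The sketch below bins invented variability values with the same cutoffs; in practice the input would be the `variability` column returned by `computeVariability()`. The thresholds are this document's rule of thumb, not values defined by chromVAR.

```r
# Invented variability values for illustration.
v <- c(MotifA = 3.4, MotifB = 1.6, MotifC = 0.8, MotifD = 2.0)

# Bin using the table's cutoffs: > 2 high, 1 to 2 moderate, < 1 low.
label <- ifelse(v > 2, "high", ifelse(v >= 1, "moderate", "low"))
print(label)
print(table(label))
```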

## Visualization

### Deviation Heatmap

```r
library(pheatmap)

# Get top variable motifs
# n_top=50: Number of top variable motifs to display.
n_top <- 50
top_motifs <- head(rownames(var_df), n_top)
top_dev <- deviation_scores[top_motifs, ]

# Add sample annotations
sample_info <- data.frame(
    Condition = colData(fragment_counts)$condition,
    row.names = colnames(top_dev)
)

pheatmap(top_dev, annotation_col = sample_info, scale = 'row',
         clustering_method = 'ward.D2', show_rownames = TRUE)
```

### Variability Plot

```r
plotVariability(variability, use_plotly = FALSE)
```

### PCA of Deviation Scores

```r
library(ggplot2)

# PCA on deviation scores
pca <- prcomp(t(deviation_scores), scale. = TRUE)
pca_df <- data.frame(PC1 = pca$x[,1], PC2 = pca$x[,2],
                     Condition = colData(fragment_counts)$condition)

ggplot(pca_df, aes(x = PC1, y = PC2, color = Condition)) +
    geom_point(size = 3) +
    theme_minimal() +
    labs(title = 'PCA of chromVAR Deviations')
```

## Differential Motif Accessibility

**Goal:** Identify TF motifs with significantly different accessibility between experimental groups.

**Approach:** Fit a linear model (limma) to deviation z-scores across groups and extract significant motifs with empirical Bayes moderation.

### Compare Two Groups

```r
library(limma)

# Get sample groups
groups <- factor(colData(fragment_counts)$condition)

# Design matrix
design <- model.matrix(~ groups)

# Fit linear model to deviation scores
fit <- lmFit(deviation_scores, design)
fit <- eBayes(fit)

# Get differential motifs
# p.value=0.05: FDR threshold for significance.
diff_motifs <- topTable(fit, coef = 2, number = Inf, p.value = 0.05)
print(head(diff_motifs, 20))
```
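
The `p.value` cutoff in `topTable()` is applied to adjusted p-values (Benjamini-Hochberg by default), not raw ones. The base R sketch below shows the effect on five invented p-values.

```r
# Invented raw p-values for five motifs.
p_raw <- c(M1 = 0.001, M2 = 0.012, M3 = 0.04, M4 = 0.2, M5 = 0.6)

# Benjamini-Hochberg adjustment, the default adjust.method in limma's topTable().
p_adj <- p.adjust(p_raw, method = "BH")
significant <- p_adj < 0.05
print(data.frame(p_raw, p_adj, significant))
```

M3 has a raw p-value below 0.05 but does not pass after adjustment, which is why the list from `topTable()` can be shorter than a raw p-value filter would suggest.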

### Volcano Plot

```r
library(ggplot2)

all_results <- topTable(fit, coef = 2, number = Inf)
all_results$significant <-