Skill · 2.9k repo stars · updated 8d ago

bio-chipseq-differential-binding

This DiffBind-based skill performs differential binding analysis on ChIP-seq data by comparing transcription factor or histone mark occupancy between experimental conditions. It creates a sample sheet linking BAM and peak files to conditions, counts reads at consensus peak regions, establishes contrasts between groups, and produces statistically significant differentially bound regions with fold changes and p-values. Use this when analyzing ChIP-seq experiments with replicate samples across multiple conditions to identify genomic regions with significant binding differences.

Repository: OpenClaw-Medical-Skills

Install in Claude Code


git clone --depth 1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills /tmp/bio-chipseq-differential-binding && cp -r /tmp/bio-chipseq-differential-binding/skills/bio-chipseq-differential-binding ~/.claude/skills/bio-chipseq-differential-binding

Then open a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

## Version Compatibility

Reference examples tested with: DESeq2 1.42+, edgeR 4.0+

Before using code patterns, verify installed versions match. If versions differ:
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters

If code fails (for example with `could not find function` or `unused argument` errors), introspect the installed
package and adapt the example to match the actual API rather than retrying.
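You can script the version check above. The following is a minimal base-R sketch. `has_min_version()` is a hypothetical helper and is not part of DiffBind, and the minimum versions are simply the ones listed above.

```r
# Hypothetical helper: TRUE if `pkg` is installed at version >= `min_version`
has_min_version <- function(pkg, min_version) {
    # system.file() returns "" when the package is not installed
    if (!nzchar(system.file(package = pkg))) {
        return(FALSE)
    }
    # package_version objects compare correctly against version strings
    utils::packageVersion(pkg) >= min_version
}

# Minimum versions taken from the list above
required <- c(DESeq2 = '1.42.0', edgeR = '4.0.0')
for (pkg in names(required)) {
    ok <- has_min_version(pkg, required[[pkg]])
    status <- if (ok) 'OK' else paste('missing or older than', required[[pkg]])
    message(pkg, ': ', status)
}
```

The check uses `system.file()` rather than loading each package, which keeps it fast for heavy Bioconductor packages.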

# Differential Binding with DiffBind

**"Compare ChIP-seq binding between conditions"** → Identify genomic regions with statistically significant differences in transcription factor or histone mark occupancy between experimental groups.
- R: `DiffBind::dba()` → `dba.count()` → `dba.normalize()` → `dba.contrast()` → `dba.analyze()`

## Create Sample Sheet

**Goal:** Define the experimental design linking BAM files, peak files, and sample metadata for DiffBind.

**Approach:** Build a data frame (or CSV) with required columns mapping each sample to its files and conditions.

```r
# Create sample sheet as data frame or CSV
samples <- data.frame(
    SampleID = c('ctrl_1', 'ctrl_2', 'treat_1', 'treat_2'),
    Tissue = c('cell', 'cell', 'cell', 'cell'),
    Factor = c('H3K4me3', 'H3K4me3', 'H3K4me3', 'H3K4me3'),
    Condition = c('control', 'control', 'treatment', 'treatment'),
    Replicate = c(1, 2, 1, 2),
    bamReads = c('ctrl1.bam', 'ctrl2.bam', 'treat1.bam', 'treat2.bam'),
    Peaks = c('ctrl1_peaks.narrowPeak', 'ctrl2_peaks.narrowPeak',
              'treat1_peaks.narrowPeak', 'treat2_peaks.narrowPeak'),
    PeakCaller = c('narrow', 'narrow', 'narrow', 'narrow')  # 'narrow' reads narrowPeak files; 'macs' expects MACS .xls output
)

write.csv(samples, 'samples.csv', row.names = FALSE)
```
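It can help to sanity-check the sheet before loading it. The following is a minimal sketch. It assumes that the required column set mirrors the example above. DiffBind accepts further columns, such as `ControlID` and `bamControl` for input controls. `check_sample_sheet()` and `example_sheet` are hypothetical names.

```r
# Hypothetical helper: basic checks on a DiffBind-style sample sheet.
# The required columns mirror the example sheet above (an assumption,
# not DiffBind's complete specification).
check_sample_sheet <- function(samples,
                               required = c('SampleID', 'Condition', 'Replicate',
                                            'bamReads', 'Peaks'),
                               min_reps = 2) {
    missing_cols <- setdiff(required, colnames(samples))
    if (length(missing_cols) > 0) {
        stop('Missing columns: ', paste(missing_cols, collapse = ', '))
    }
    if (anyDuplicated(samples$SampleID) > 0) {
        stop('SampleID values must be unique')
    }
    # Replicates per condition; differential testing needs replication
    reps <- table(samples$Condition)
    too_few <- names(reps)[as.vector(reps) < min_reps]
    if (length(too_few) > 0) {
        stop('Fewer than ', min_reps, ' replicates for: ',
             paste(too_few, collapse = ', '))
    }
    invisible(reps)
}

# Hypothetical sheet with the same layout as the example above
example_sheet <- data.frame(
    SampleID  = c('ctrl_1', 'ctrl_2', 'treat_1', 'treat_2'),
    Condition = c('control', 'control', 'treatment', 'treatment'),
    Replicate = c(1, 2, 1, 2),
    bamReads  = c('ctrl1.bam', 'ctrl2.bam', 'treat1.bam', 'treat2.bam'),
    Peaks     = c('ctrl1_peaks.narrowPeak', 'ctrl2_peaks.narrowPeak',
                  'treat1_peaks.narrowPeak', 'treat2_peaks.narrowPeak')
)
reps <- check_sample_sheet(example_sheet)
print(reps)
```

In a real project you might also call `file.exists()` on the `bamReads` and `Peaks` paths.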

## Load Data

**Goal:** Initialize a DiffBind object from the sample sheet containing all samples and peaks.

**Approach:** Read the sample sheet CSV into a DBA object that identifies overlapping peaks across samples.

```r
library(DiffBind)

# From sample sheet
dba_obj <- dba(sampleSheet = 'samples.csv')

# View summary
dba_obj
```
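How well the replicate peak sets agree determines the consensus used later for counting. DiffBind can report this with `dba.overlap(dba_obj, mode = DBA_OLAP_RATE)`. The sketch below reproduces the idea on a made-up logical occupancy matrix, where rows are merged regions and columns are samples. It is not DiffBind's implementation.

```r
# Made-up occupancy matrix: TRUE if a sample's peak set contains the region
occupancy <- matrix(
    c(TRUE,  TRUE,  TRUE,  TRUE,
      TRUE,  TRUE,  FALSE, FALSE,
      FALSE, FALSE, TRUE,  TRUE,
      TRUE,  FALSE, FALSE, FALSE,
      TRUE,  TRUE,  TRUE,  FALSE),
    ncol = 4, byrow = TRUE,
    dimnames = list(paste0('peak', 1:5),
                    c('ctrl_1', 'ctrl_2', 'treat_1', 'treat_2'))
)

# Number of samples in which each region was called
n_samples <- rowSums(occupancy)

# Overlap rate: number of regions present in at least k samples, k = 1..4
overlap_rate <- sapply(seq_len(ncol(occupancy)), function(k) sum(n_samples >= k))
names(overlap_rate) <- seq_len(ncol(occupancy))
print(overlap_rate)

# Regions that would pass a minOverlap = 2 consensus
consensus_regions <- names(n_samples)[n_samples >= 2]
```

For this toy matrix the overlap rate is 5, 4, 2, 1. A consensus requiring at least 2 samples would therefore keep 4 of the 5 regions.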

## Count Reads in Peaks

**Goal:** Quantify read coverage at consensus peak regions across all samples.

**Approach:** Count reads in summit-centered windows using dba.count, creating a count matrix for statistical testing.

```r
# Count reads in consensus peaks
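# Defaults in DiffBind 3.0+: peaks are re-centered on summits and
# bUseSummarizeOverlaps = TRUE (see ?dba.count for the default summit width)
dba_obj <- dba.count(dba_obj)

# Alternatively, set parameters explicitly (instead of the call above)
dba_obj <- dba.count(
    dba_obj,
    summits = 250,         # Re-center on summits: summit +/- 250 bp (501 bp windows)
    minOverlap = 2         # Keep peaks called in at least 2 peak sets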
)
```
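This is the arithmetic behind `summits = 250`. Each consensus peak is replaced by a window extending 250 bp on either side of its summit, so every window is 501 bp wide. The summit positions below are made up, whereas DiffBind derives real summits from the read pileups.

```r
# Made-up summit positions (1-based) for three consensus peaks
summit_pos <- c(10500L, 52000L, 98731L)
half_width <- 250L  # the value passed as summits = 250 above

# Window extends half_width bp on each side of the summit (inclusive)
win_start <- summit_pos - half_width
win_end   <- summit_pos + half_width
win_width <- win_end - win_start + 1L

print(data.frame(summit = summit_pos, start = win_start,
                 end = win_end, width = win_width))
```

Fixed-width windows make read counts comparable across peaks whose original calls had very different lengths.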

## Normalize Data

**Goal:** Apply normalization to account for library size and composition differences between samples.

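**Approach:** Use dba.normalize, which computes normalization factors for the count matrix and stores them for the DESeq2 or edgeR analysis (see `?dba.normalize` for the default method and alternatives).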

```r
# Normalize (required before analysis)
dba_obj <- dba.normalize(dba_obj)

# Check normalization
dba.normalize(dba_obj, bRetrieve = TRUE)
```
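`dba.normalize()` hides the normalization arithmetic. As an illustration, the sketch below computes median-of-ratios size factors (also called RLE) on a made-up peaks-by-samples count matrix. This is the method DESeq2 uses by default. DiffBind's own default may differ, so treat this as an explanation of one option rather than what the call above computes.

```r
# Median-of-ratios (RLE) size factors for a peaks x samples count matrix
size_factors_rle <- function(counts) {
    log_counts <- log(counts)
    log_geo_means <- rowMeans(log_counts)   # log geometric mean per peak
    usable <- is.finite(log_geo_means)      # skip peaks with a zero in any sample
    apply(log_counts[usable, , drop = FALSE], 2, function(col) {
        exp(median(col - log_geo_means[usable]))
    })
}

# Made-up counts: sample_B was sequenced about twice as deeply
counts <- matrix(
    c(10, 20, 30, 0,
      20, 40, 60, 5),
    ncol = 2,
    dimnames = list(paste0('peak', 1:4), c('sample_A', 'sample_B'))
)
sf <- size_factors_rle(counts)
normalized <- sweep(counts, 2, sf, '/')  # divide each column by its factor
print(sf)
```

With these counts, sample_B's factor is twice sample_A's, and after division peaks 1 to 3 have equal normalized values in both samples.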

## Set Up Contrast

**Goal:** Define the comparison between experimental conditions for differential testing.

**Approach:** Specify a design formula or category-based contrast that tells DiffBind which groups to compare.

```r
# Recommended: design formula approach
dba_obj <- dba.contrast(dba_obj, design = '~ Condition')

# Or use categories for automatic contrast
dba_obj <- dba.contrast(dba_obj, categories = DBA_CONDITION)

# Legacy approach (retained for backward compatibility, not recommended)
# dba_obj <- dba.contrast(dba_obj, group1 = dba_obj$masks$control,
#                         group2 = dba_obj$masks$treatment)
```
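The formula `'~ Condition'` tells the model to fit two terms for each peak. The first is a baseline for the reference condition, and the second is a treatment effect, which is the log2 fold change being tested. Base R's `model.matrix()` shows the encoding that such a formula produces, and DESeq2 builds its design matrix from the formula in essentially the same way. The metadata frame below is hypothetical.

```r
# Hypothetical metadata shaped like the sample sheet
meta <- data.frame(
    SampleID  = c('ctrl_1', 'ctrl_2', 'treat_1', 'treat_2'),
    Condition = factor(c('control', 'control', 'treatment', 'treatment'))
)
# Make 'control' the reference level explicitly
meta$Condition <- relevel(meta$Condition, ref = 'control')

# Intercept = control baseline; Conditiontreatment = treatment effect
design <- model.matrix(~ Condition, data = meta)
print(design)
```

Setting the reference level explicitly avoids relying on alphabetical order to decide which group is the baseline.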

## Run Differential Analysis

**Goal:** Identify peaks with statistically significant binding differences between conditions.

**Approach:** Apply DESeq2 or edgeR negative binomial models to the normalized count matrix.

```r
# Analyze with DESeq2 (default)
dba_obj <- dba.analyze(dba_obj, method = DBA_DESEQ2)

# Or with edgeR
dba_obj <- dba.analyze(dba_obj, method = DBA_EDGER)
```
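The sketch below makes the reported fold changes concrete. It computes a naive log2 fold change from group means of made-up normalized counts, using a pseudocount of 1 (an arbitrary choice). DESeq2 and edgeR estimate fold changes inside a negative binomial model with dispersion estimates, so their values will differ. Only the interpretation carries over: positive means higher in the numerator group. `naive_log2fc()` is a hypothetical helper.

```r
# Hypothetical helper: log2 ratio of group means, with a pseudocount
naive_log2fc <- function(norm_counts, group, numerator, denominator,
                         pseudocount = 1) {
    m_num <- rowMeans(norm_counts[, group == numerator, drop = FALSE])
    m_den <- rowMeans(norm_counts[, group == denominator, drop = FALSE])
    log2((m_num + pseudocount) / (m_den + pseudocount))
}

# Made-up normalized counts for two peaks
norm_counts <- matrix(
    c(6,  8,  30, 32,
      50, 54, 12, 14),
    nrow = 2, byrow = TRUE,
    dimnames = list(c('peak1', 'peak2'),
                    c('ctrl_1', 'ctrl_2', 'treat_1', 'treat_2'))
)
group <- c('control', 'control', 'treatment', 'treatment')
lfc <- naive_log2fc(norm_counts, group, 'treatment', 'control')
print(lfc)
```

peak1 gains binding in treatment (log2 fold change of 2), and peak2 loses it (negative value).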

## View Results

**Goal:** Retrieve and inspect differentially bound regions with fold changes and significance values.

**Approach:** Extract results as a GRanges object with dba.report, sorted by significance.

```r
# Summary of differential peaks
dba.show(dba_obj, bContrasts = TRUE)

# Retrieve differential binding results
db_results <- dba.report(dba_obj)
db_results
```

## Filter Results

**Goal:** Subset differential peaks by significance and fold-change thresholds.

**Approach:** Apply FDR and fold-change cutoffs to dba.report output.

```r
# Get significant sites (FDR < 0.05, |log2 Fold| >= 1, i.e. at least 2-fold)
db_sig <- dba.report(dba_obj, th = 0.05, fold = 1)

# Get all results for custom filtering
db_all <- dba.report(dba_obj, th = 1)
```
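You can filter the full table (`th = 1`) yourself as a plain data frame. The following is a minimal sketch on made-up values. It assumes the `Fold` (log2 scale) and `FDR` columns that `dba.report()` output typically carries, so check `colnames()` on your own results first. Which direction counts as "gained" depends on the group order of your contrast.

```r
# Made-up results shaped like as.data.frame(dba.report(dba_obj, th = 1))
res <- data.frame(
    seqnames = c('chr1', 'chr1', 'chr2', 'chr3'),
    start    = c(1000, 5000, 200, 7000),
    end      = c(1500, 5500, 700, 7500),
    Fold     = c(2.3, -1.4, 0.4, -0.2),
    FDR      = c(0.001, 0.02, 0.01, 0.6)
)

fdr_cut <- 0.05
lfc_cut <- 1  # |log2 fold| >= 1, i.e. at least 2-fold

sig <- res[res$FDR < fdr_cut & abs(res$Fold) >= lfc_cut, ]
n_gained <- sum(sig$Fold > 0)  # higher in the first group of the contrast
n_lost   <- sum(sig$Fold < 0)
print(sig)
```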

## Export Results

```r
# To data frame
results_df <- as.data.frame(dba.report(dba_obj, th = 1))

# Export to CSV
write.csv(results_df, 'differential_binding.csv', row.names = FALSE)

# Export to BED
library(rtracklayer)
export(db_sig, 'diff_peaks.bed', format = 'BED')
```
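`export()` from rtracklayer converts GRanges coordinates (1-based, closed) to the BED convention (0-based, half-open) for you. If you instead write BED by hand from `results_df`, subtract 1 from each start. The following is a minimal sketch, and `to_bed()` and the example intervals are hypothetical.

```r
# Hypothetical helper: convert 1-based, closed intervals (the layout of
# as.data.frame() on a GRanges) to BED's 0-based, half-open coordinates
to_bed <- function(df) {
    data.frame(
        chrom      = as.character(df$seqnames),
        chromStart = df$start - 1L,  # BED starts are 0-based
        chromEnd   = df$end,         # BED ends are exclusive, so unchanged
        stringsAsFactors = FALSE
    )
}

# Made-up intervals standing in for results_df
intervals <- data.frame(seqnames = c('chr1', 'chr2'),
                        start = c(101L, 5001L), end = c(200L, 5500L))
bed <- to_bed(intervals)

# BED files are tab-separated with no header
bed_path <- file.path(tempdir(), 'diff_peaks_manual.bed')
write.table(bed, bed_path, sep = '\t', quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

Interval widths are preserved: `chromEnd - chromStart` equals the original `end - start + 1`.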

## Visualization

```r
# PCA plot
dba.plotPCA(dba_obj, DBA_CONDITION, label = DBA_ID)

# Correlation heatmap
dba.plotHeatmap(dba_obj)

# MA plot
dba.plotMA(dba_obj)

# Volcano plot
dba.plotVolcano(dba_obj)

# Heatmap of differential peaks
dba.plotHeatmap(dba_obj, contrast = 1, correlations = FALSE)
```

## Venn Diagram of Peaks

```r
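# Run these on the object returned by dba(), before dba.count()

# Overlap of peak calls among replicates within each condition
dba.plotVenn(dba_obj, dba_obj$masks$control)
dba.plotVenn(dba_obj, dba_obj$masks$treatment)

# Overlap between conditions: one consensus peakset per condition
consensus_obj <- dba.peakset(dba_obj, consensus = DBA_CONDITION, minOverlap = 2)
dba.plotVenn(consensus_obj, consensus_obj$masks$Consensus)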
```

## Profile Plots

```r
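# Compute average signal profiles around sites (requires the profileplyr package)
profiles <- dba.plotProfile(dba_obj)
# Plot the computed profiles
dba.plotProfile(profiles)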
```

## Get Consensus Peaks

```r
# Export consensus peakset
consensus <- dba.peakset(dba_obj, bRetrieve = TRUE)
export(consensus, 'consensus_peaks.bed', format = 'BED')
```

## Multi-Factor Design

```r
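# With a blocking factor (e.g., batch correction). Design formulas refer to
# DiffBind's metadata columns (Tissue, Factor, Condition, Treatment, Replicate),
# so record batch in an unused column such as Treatment
dba_obj <- dba.contrast(dba_obj, design = '~ Treatment + Condition')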
dba_obj <- dba.analyze(dba_obj)
```
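A blocking factor only helps if each batch contains samples from more than one condition. When batch and condition are perfectly confounded, the design matrix is not full rank and the model cannot separate the two effects. In that case, DESeq2 stops with an error reporting that the model matrix is not full rank. The following base-R sketch checks this on hypothetical metadata; `is_full_rank()` is a hypothetical helper.

```r
# Hypothetical layout: batch fully confounded with condition
confounded <- data.frame(
    batch     = factor(c('b1', 'b1', 'b2', 'b2')),
    condition = factor(c('control', 'control', 'treatment', 'treatment'))
)
# Hypothetical balanced layout: each batch contains both conditions
balanced <- data.frame(
    batch     = factor(c('b1', 'b2', 'b1', 'b2')),
    condition = factor(c('control', 'control', 'treatment', 'treatment'))
)

# A design is estimable only if its model matrix has full column rank
is_full_rank <- function(meta) {
    mm <- model.matrix(~ batch + condition, data = meta)
    qr(mm)$rank == ncol(mm)
}

print(is_full_rank(confounded))
print(is_full_rank(balanced))
```

In the confounded layout the batch and condition columns are identical, so the check returns FALSE. The balanced layout returns TRUE.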

## DiffBind 3.0 Notes

DiffBind 3.0+ introduced significant changes:
- `dba.normalize()` is now required before analysis
- `dba.count()` re-centers peaks on summits by default (`summits` was FALSE in older versions); check `?dba.count` for the default window width

From the same repository

aav-vector-design-agent (Skill)

adaptyv (Skill)

Cloud laboratory platform for automated protein testing and validation. Use when designing proteins and needing experimental validation including binding assays, expression testing, thermostability measurements, enzyme activity assays, or protein sequence optimization. Also use for submitting experiments via API, tracking experiment status, downloading results, optimizing protein sequences for better expression using computational tools (NetSolP, SoluProt, SolubleMPNN, ESM), or managing protein design workflows with wet-lab validation.

adhd-daily-planner (Skill)

Time-blind friendly planning, executive function support, and daily structure for ADHD brains. Specializes in realistic time estimation, dopamine-aware task design, and building systems that

aeon (Skill)

This skill should be used for time series machine learning tasks including classification, regression, clustering, forecasting, anomaly detection, segmentation, and similarity search. Use when working with temporal data, sequential patterns, or time-indexed observations requiring specialized algorithms beyond standard ML approaches. Particularly suited for univariate and multivariate time series analysis with scikit-learn compatible APIs.

agent-browser (Skill)

Browse the web for any task — research topics, read articles, interact with web apps, fill forms, take screenshots, extract data, and test web pages. Use whenever a browser would be useful, not just when the user explicitly asks.

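agentd-drug-discovery (Skill)

ai-analyzer (Skill)

AI-driven comprehensive health analysis system that integrates multi-dimensional health data, identifies abnormal patterns, predicts health risks, and provides personalized recommendations. Supports intelligent Q&A and AI-generated health reports.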

alphafold-database (Skill)

Access AlphaFold's 200M+ AI-predicted protein structures. Retrieve structures by UniProt ID, download PDB/mmCIF files, analyze confidence metrics (pLDDT, PAE), for drug discovery and structural biology.