bio-chipseq-motif-analysis

This skill performs de novo DNA motif discovery and known transcription factor motif enrichment analysis on genomic peak data using HOMER and MEME-ChIP tools. Use it when analyzing ChIP-seq, ATAC-seq, or other peak-based genomic datasets to identify enriched DNA-binding motifs, discover novel transcription factor binding sites, and test against known motif databases for biological interpretation of regulatory elements.

Install in Claude Code
git clone --depth 1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills /tmp/bio-chipseq-motif-analysis && cp -r /tmp/bio-chipseq-motif-analysis/skills/bio-chipseq-motif-analysis ~/.claude/skills/bio-chipseq-motif-analysis
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

## Version Compatibility

Reference examples tested with: BioPython 1.83+, bedtools 2.31+, matplotlib 3.8+, pandas 2.2+

Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
- CLI: `<tool> --version` then `<tool> --help` to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
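
The checks above can be scripted before running any example. The following is a minimal sketch, assuming the CLI tools are expected on `PATH` under the names used in this document; the helper name `check_environment` is hypothetical and not part of any library.

```python
import shutil
from importlib import metadata


def check_environment(cli_tools, py_packages):
    """Report which CLI tools are on PATH and which Python packages are installed."""
    report = {"cli": {}, "python": {}}
    for tool in cli_tools:
        # shutil.which returns the resolved path, or None when the tool is missing
        report["cli"][tool] = shutil.which(tool)
    for pkg in py_packages:
        try:
            report["python"][pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            report["python"][pkg] = None
    return report


if __name__ == "__main__":
    result = check_environment(
        ["findMotifsGenome.pl", "meme-chip", "bedtools"],
        ["biopython", "pandas", "matplotlib"],
    )
    for group, items in result.items():
        for name, value in items.items():
            print(f"{group:6s} {name:22s} {value or 'MISSING'}")
```

A `MISSING` entry only says the tool or package was not found; it does not confirm that a found version matches the ones listed above, so compare the reported versions by hand.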

# Motif Analysis

**"Find enriched motifs in my ChIP-seq peaks"** → Discover de novo DNA-binding motifs and test for known TF motif enrichment in peak sequences.
- CLI: `findMotifsGenome.pl peaks.bed hg38 output/` (HOMER), `meme-chip -db motif_database.meme peaks.fa` (MEME-ChIP, with a MEME-format database such as JASPAR)

Identify DNA sequence motifs enriched in ChIP-seq or ATAC-seq peaks to discover transcription factor binding sites.

## Tool Comparison

| Tool | Strengths | Use Case |
|------|-----------|----------|
| HOMER | Fast, comprehensive, built-in databases | General motif analysis |
| MEME-ChIP | Multiple algorithms, web interface | Publication-quality |
| MEME | De novo discovery only | Simple discovery |
| FIMO | Known motif scanning | Genome-wide scanning |

## HOMER

### Installation

```bash
conda install -c bioconda homer

# Configure genome (required once)
perl /path/to/homer/configureHomer.pl -install hg38
perl /path/to/homer/configureHomer.pl -install mm10
```

### De Novo Motif Discovery

**Goal:** Discover enriched DNA-binding motifs directly from ChIP-seq peak sequences.

**Approach:** Run findMotifsGenome.pl on a peak BED file with a specified region size (centered on each peak), optionally providing background regions and target motif lengths.

```bash
# Basic motif finding
findMotifsGenome.pl peaks.bed hg38 output_dir/ -size 200

# With background regions
findMotifsGenome.pl peaks.bed hg38 output_dir/ -size 200 -bg background.bed

# Specify motif lengths to search
findMotifsGenome.pl peaks.bed hg38 output_dir/ -size 200 -len 8,10,12
```

### Key Options

| Option | Description |
|--------|-------------|
| `-size <#>` | Size of the region around each peak center used for analysis (default 200) |
| `-size given` | Use actual peak sizes |
| `-bg <file>` | Background regions (BED) |
| `-len <#,#,...>` | Motif lengths to search |
| `-mask` | Mask repeats |
| `-p <#>` | Number of CPUs |
| `-S <#>` | Number of motifs to find (default 25) |
| `-mis <#>` | Mismatches allowed (default 2) |
| `-noweight` | Don't adjust for GC content |

### Output Files

```
output_dir/
├── homerResults.html      # Main results page
├── knownResults.html      # Known motif enrichment
├── homerMotifs.all.motifs # All discovered motifs
├── knownResults.txt       # Known motif statistics
└── homerResults/
    └── motif1.motif       # Individual motif files
```
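
To triage `knownResults.txt` without opening the HTML report, the table can be filtered by q-value. This is a minimal sketch, assuming the tab-delimited nine-column layout HOMER typically writes (name, consensus, p-value, log p-value, Benjamini q-value, then target and background counts and percentages); check the header of your own file before relying on the column positions. The function name `top_known_motifs` and the sample rows are made up for illustration.

```python
import csv
import io


def top_known_motifs(handle, max_q=0.05):
    """Return (name, q_value, target_pct, background_pct) rows with q <= max_q."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row
    hits = []
    for row in reader:
        if len(row) < 9:
            continue  # skip blank or malformed lines
        q_value = float(row[4])
        if q_value > max_q:
            continue
        # Percent columns look like "35.20%"; strip the sign before converting
        target_pct = float(row[6].rstrip("%"))
        background_pct = float(row[8].rstrip("%"))
        hits.append((row[0], q_value, target_pct, background_pct))
    return hits


# Made-up rows in the assumed layout, for demonstration only
SAMPLE = (
    "Motif Name\tConsensus\tP-value\tLog P-value\tq-value (Benjamini)\t"
    "# of Target Sequences with Motif(of 1000)\t% of Target Sequences with Motif\t"
    "# of Background Sequences with Motif(of 5000)\t% of Background Sequences with Motif\n"
    "MotifA\tCACGTG\t1e-20\t-4.6e+01\t0.0000\t352.0\t35.20%\t400.0\t8.00%\n"
    "MotifB\tTGACTCA\t1e-01\t-2.3e+00\t0.4500\t60.0\t6.00%\t280.0\t5.60%\n"
)

if __name__ == "__main__":
    for name, q, tgt, bg in top_known_motifs(io.StringIO(SAMPLE)):
        print(f"{name}: q={q:g}, target={tgt}%, background={bg}%")
```

For a real run, pass `open("output_dir/knownResults.txt")` in place of the `io.StringIO` sample.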

### Known Motif Enrichment Only

```bash
# Skip de novo, only check known motifs
findMotifsGenome.pl peaks.bed hg38 output_dir/ -size 200 -nomotif
```

### Scan for Specific Motifs

```bash
# Find instances of motif in peaks
annotatePeaks.pl peaks.bed hg38 -m motif.motif > annotated.txt

# Scan genome for motif occurrences (-bed writes BED-formatted output)
scanMotifGenomeWide.pl motif.motif hg38 -bed > motif_sites.bed
```

### Motif Comparison

```bash
# Compare discovered motifs to HOMER's default known-motif library
# (-known <file> takes a motif file if a different library is needed)
compareMotifs.pl motifs.motif output_dir/
```

### Create Custom Motif

```bash
# From consensus sequence: seq2profile.pl <consensus> [# mismatches] [name]
seq2profile.pl CACGTG 0 MYC > MYC.motif

# From aligned sequences
cat aligned_seqs.txt | alignAndConvert.pl - > custom.motif
```
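
Before passing a custom motif to `annotatePeaks.pl` or `findMotifsGenome.pl`, it can help to confirm that the file reads back as expected. The sketch below assumes the usual HOMER motif layout: a tab-separated header line starting with `>` (consensus, name, log-odds detection threshold) followed by one row of A/C/G/T probabilities per position. The function names and the example matrix are hypothetical.

```python
def parse_homer_motifs(text):
    """Parse HOMER-style motif text into a list of motif dicts."""
    motifs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith(">"):
            # Header: consensus, name, detection threshold, then optional fields
            fields = line[1:].split("\t")
            motifs.append({
                "consensus": fields[0],
                "name": fields[1] if len(fields) > 1 else "",
                "threshold": float(fields[2]) if len(fields) > 2 else None,
                "matrix": [],
            })
        elif motifs:
            # Matrix row: probabilities in A, C, G, T order
            motifs[-1]["matrix"].append([float(x) for x in line.split()])
    return motifs


def best_bases(matrix):
    """Return the most probable base at each position as a string."""
    return "".join("ACGT"[row.index(max(row))] for row in matrix)


# Made-up E-box motif in the assumed layout, for demonstration only
EXAMPLE = (
    ">CACGTG\texample_ebox\t6.0\n"
    "0.001\t0.997\t0.001\t0.001\n"
    "0.997\t0.001\t0.001\t0.001\n"
    "0.001\t0.997\t0.001\t0.001\n"
    "0.001\t0.001\t0.997\t0.001\n"
    "0.001\t0.001\t0.001\t0.997\n"
    "0.001\t0.001\t0.997\t0.001\n"
)

if __name__ == "__main__":
    for motif in parse_homer_motifs(EXAMPLE):
        print(motif["name"], motif["threshold"], best_bases(motif["matrix"]))
```

Rows that do not sum to roughly 1.0, or a best-base string that differs from the intended consensus, usually point to a formatting problem in the motif file.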

## MEME Suite

### Installation

```bash
conda install -c bioconda meme
```

### Extract Sequences from Peaks

```bash
# Get FASTA sequences under peaks
bedtools getfasta -fi genome.fa -bed peaks.bed -fo peaks.fa

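# Center peaks and resize: take each peak midpoint, then extend 100 bp per side
# (bedtools slop alone extends the original ends; it does not recenter)
awk 'BEGIN{OFS="\t"} {mid=int(($2+$3)/2); print $1, mid, mid+1}' peaks.bed | \
    bedtools slop -i - -g genome.sizes -b 100 | \
    bedtools getfasta -fi genome.fa -bed - -fo peaks_centered.fa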
```

### MEME (De Novo Discovery)

```bash
# Basic de novo discovery
meme peaks.fa -dna -oc meme_output -mod zoops -nmotifs 10 -minw 6 -maxw 20

# With Markov background
fasta-get-markov peaks.fa > background.model
meme peaks.fa -dna -oc meme_output -bfile background.model -mod zoops -nmotifs 10
```

### MEME Options

| Option | Description |
|--------|-------------|
| `-mod zoops` | Zero or one per sequence (default for ChIP) |
| `-mod oops` | Exactly one per sequence |
| `-mod anr` | Any number of repeats |
| `-nmotifs <#>` | Number of motifs to find |
| `-minw <#>` | Minimum motif width |
| `-maxw <#>` | Maximum motif width |
| `-revcomp` | Search both strands |
| `-bfile <file>` | Background model file |

### MEME-ChIP (Comprehensive Pipeline)

**Goal:** Run a comprehensive motif analysis pipeline combining de novo discovery, central enrichment testing, and database comparison.

**Approach:** Provide peak FASTA sequences and a motif database to MEME-ChIP, which runs MEME, DREME, CentriMo, TOMTOM, and FIMO in a single invocation.

```bash
# All-in-one ChIP-seq motif analysis
meme-chip -oc meme_chip_output -db motif_database.meme peaks.fa
```

MEME-ChIP runs:
1. MEME - De novo discovery (on the central 100 bp of each sequence by default)
2. DREME - Short motif discovery (recent MEME Suite releases use STREME for this step)
3. CentriMo - Central enrichment analysis
4. TOMTOM - Compare to known motifs
5. FIMO - Find motif instances

### DREME (Short Motifs)

```bash
# Find short enriched motifs
dreme -oc dreme_output -p peaks.fa -n background.fa
```

### CentriMo (Central Enrichment)

```bash
# Test for central enrichment of known motifs
centrimo -oc centrimo_output peaks.fa motif_database.meme
```

### TOMTOM (Motif Comparison)

```bash
# Compare discovered motifs to database
tomtom -oc tomtom_output discovered.meme database.meme
```

### FIMO (Motif Scanning)

```bash
# Scan sequences for motif matches
fimo --oc fimo_output motif.meme sequences.fa

# Scan genome
fimo --oc fimo_output --max-stored-scores 1000000 motif.meme genome.fa
```
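
FIMO's tabular output can be filtered after the scan. The following is a minimal sketch, assuming the `fimo.tsv` column names written by MEME Suite 5.x (`motif_id`, `sequence_name`, `start`, `stop`, `strand`, `q-value`, among others) and the `#` comment lines appended after the table; older releases may name the file or columns differently. The function name `read_fimo_hits` and the sample rows are made up for illustration.

```python
import csv
import io


def read_fimo_hits(handle, max_q=0.05):
    """Yield FIMO hits (as dicts) whose q-value is at most max_q."""
    # Drop blank lines and the '#' comment trailer that follows the table
    rows = (ln for ln in handle if ln.strip() and not ln.startswith("#"))
    for rec in csv.DictReader(rows, delimiter="\t"):
        q_text = rec.get("q-value", "")
        if not q_text:
            continue  # q-values can be absent if their calculation was disabled
        if float(q_text) <= max_q:
            yield {
                "motif": rec["motif_id"],
                "sequence": rec["sequence_name"],
                "start": int(rec["start"]),
                "stop": int(rec["stop"]),
                "strand": rec["strand"],
                "q_value": float(q_text),
            }


# Made-up rows in the assumed layout, for demonstration only
SAMPLE = (
    "motif_id\tmotif_alt_id\tsequence_name\tstart\tstop\tstrand\tscore\t"
    "p-value\tq-value\tmatched_sequence\n"
    "M1\tEBOX\tchr1:100-300\t50\t55\t+\t12.3\t1e-05\t0.01\tCACGTG\n"
    "M1\tEBOX\tchr2:500-700\t10\t15\t-\t6.1\t1e-03\t0.40\tCACGTT\n"
    "\n"
    "# made-up trailer comment line\n"
)

if __name__ == "__main__":
    for hit in read_fimo_hits(io.StringIO(SAMPLE)):
        print(hit)
```

For a real run, pass `open("fimo_output/fimo.tsv")` in place of the sample; the generator reads line by line, so large genome-wide scans do not need to fit in memory.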

## Motif Databases

### HOMER Built-in

```bash
# List available motif sets
ls /path/to/homer/data/knownTFs/

# Vertebrate, known motifs (default)
findMotifsGenome.pl peaks.be
```