ClaudeWave
Skill · 199 repo stars · updated 16d ago

bedtools-genomic-intervals

Genomic interval ops on BED/BAM/GFF/VCF. Find overlaps, merge intervals, compute coverage, extract FASTA, find nearest features. Core for ChIP-seq peak annotation, region filtering, genome arithmetic. Use tabix for indexed single-region queries; use deeptools for normalized bigWig coverage.

Install in Claude Code
git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/bedtools-genomic-intervals && cp -r /tmp/bedtools-genomic-intervals/skills/genomics-bioinformatics/interval-ops/bedtools-genomic-intervals ~/.claude/skills/bedtools-genomic-intervals
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# bedtools — Genomic Interval Analysis Toolkit

## Overview

bedtools is the standard toolkit for operating on genomic intervals in BED, BAM, GFF, and VCF formats. It solves the core problems of genome arithmetic: finding overlaps between feature sets, computing coverage, extracting sequences, merging adjacent regions, and annotating features with their nearest neighbors. bedtools is a compiled C++ program, and its sweep-based algorithms on coordinate-sorted input keep memory use low, which makes it practical for whole-genome analyses.

## When to Use

- Intersecting ChIP-seq peaks with gene annotations to find promoter-overlapping peaks
- Merging overlapping ATAC-seq peaks or called regions across replicates
- Computing read coverage depth over target capture regions
- Extracting FASTA sequences for motif discovery or primer design
- Finding the nearest gene to each regulatory element or variant
- Subtracting blacklist or repeat regions from peak calls
- Expanding genomic intervals by fixed distance (promoter regions)
- Use `tabix` instead for fast indexed queries of a single genomic region
- For normalized coverage bigWig tracks, use `deeptools bamCoverage` instead
- Use `mosdepth` instead for whole-genome per-base depth (typically much faster than `bedtools genomecov -d`)

## Prerequisites

- **Python packages**: None required (command-line only)
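- **Input requirements**: BED/BAM/GFF/VCF files; FASTA reference for `getfasta`; genome file (chromosome sizes) for `slop`/`flank`/`complement`, and for `genomecov` when the input is BED (`-i`) rather than BAM
- **Sorting**: `merge`, `complement`, and the `-sorted` mode require input sorted by chromosome, then start (`sort -k1,1 -k2,2n`); `genomecov -ibam` requires a position-sorted BAM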
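
Whether a BED file already meets the sorting requirement can be checked with plain `sort -c` before running bedtools. The sketch below is a minimal illustration on a made-up three-line file (`demo_unsorted.bed` and `demo_sorted.bed` are hypothetical names); it assumes tab-separated BED and byte-order chromosome sorting via `LC_ALL=C`.

```bash
# Hypothetical toy input: deliberately out of order
printf 'chr2\t50\t80\nchr1\t300\t400\nchr1\t100\t200\n' > demo_unsorted.bed

# sort -c exits non-zero if the file is not already in the requested order
if LC_ALL=C sort -c -k1,1 -k2,2n demo_unsorted.bed 2>/dev/null; then
    sorted_before=yes
else
    sorted_before=no
fi

# Sort by chromosome, then numerically by start
LC_ALL=C sort -k1,1 -k2,2n demo_unsorted.bed > demo_sorted.bed

if LC_ALL=C sort -c -k1,1 -k2,2n demo_sorted.bed 2>/dev/null; then
    sorted_after=yes
else
    sorted_after=no
fi

echo "before: $sorted_before, after: $sorted_after"
```

Running the check first avoids re-sorting large files that are already in order.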

> **Check before installing**: The tool may already be available in the current environment (e.g., inside a `pixi` / `conda` env). Run `command -v bedtools` first and skip the install commands below if it returns a path. When running inside a pixi project, invoke the tool via `pixi run bedtools` rather than bare `bedtools`.

```bash
# Bioconda (recommended)
conda install -c bioconda bedtools

# Homebrew (macOS)
brew install bedtools

# Verify
bedtools --version
# bedtools v2.31.0

# Create genome file from FASTA index
samtools faidx reference.fa
cut -f1,2 reference.fa.fai > genome.txt  # chr → size table
```
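
When `samtools` is not available, the same two-column chromosome-size table can be derived from the FASTA directly. This is a minimal sketch with plain `awk` on a made-up two-sequence FASTA (`demo.fa` and `demo.genome.txt` are hypothetical names), not a bedtools or samtools feature. It assumes ordinary, possibly line-wrapped FASTA and takes the first word of each header as the chromosome name.

```bash
# Hypothetical toy FASTA: chr1 is wrapped over two lines
printf '>chr1 test sequence\nACGTACGT\nACGT\n>chr2\nGGCC\n' > demo.fa

awk 'BEGIN { OFS = "\t" }
     /^>/ {
         if (name != "") print name, len   # flush the previous sequence
         name = substr($1, 2)              # header up to first whitespace, minus ">"
         len = 0
         next
     }
     { len += length($0) }                 # accumulate bases across wrapped lines
     END { if (name != "") print name, len }' demo.fa > demo.genome.txt

cat demo.genome.txt
```

For real references the `samtools faidx` route above is usually preferable, since the `.fai` index is also what `getfasta` relies on.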

## Quick Start

```bash
# Pair each peak with the gene(s) it overlaps
bedtools intersect -a peaks.bed -b genes.bed -wa -wb > peaks_with_genes.bed
# Merge overlapping peaks (merge needs coordinate-sorted input)
sort -k1,1 -k2,2n peaks.bed | bedtools merge -i stdin > merged_peaks.bed
# Summarize read coverage over each gene
bedtools coverage -a genes.bed -b reads.bam > gene_coverage.bed
```

## Core API

### Module 1: Interval Intersection and Overlap Analysis

Find regions that overlap between two feature sets.

```bash
# Basic intersection: output overlapping regions
bedtools intersect -a peaks.bed -b genes.bed

# Report original A and B features for each overlap
bedtools intersect -a peaks.bed -b genes.bed -wa -wb

# Count B overlaps per A feature (adds column)
bedtools intersect -a peaks.bed -b genes.bed -c
# Output: chr1  1000  2000  peak1  gene_count

# Peaks with ANY overlap (report each peak once)
bedtools intersect -a peaks.bed -b genes.bed -u

# Peaks with NO overlap in B (invert filter)
bedtools intersect -a peaks.bed -b blacklist.bed -v
```
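
To see how `-u` and `-v` differ, it can help to run them on a tiny hand-made input. The sketch below uses hypothetical files `demo_a.bed` and `demo_b.bed` and only calls bedtools if the binary is on `PATH`. BED coordinates are 0-based and half-open, so `chr1 100 200` and `chr1 150 300` share bases 150 to 199.

```bash
# Hypothetical toy inputs: peak1 overlaps geneA, peak2 overlaps nothing
printf 'chr1\t100\t200\tpeak1\nchr1\t500\t600\tpeak2\n' > demo_a.bed
printf 'chr1\t150\t300\tgeneA\n' > demo_b.bed

if command -v bedtools >/dev/null 2>&1; then
    hits=$(bedtools intersect -a demo_a.bed -b demo_b.bed -u)
    misses=$(bedtools intersect -a demo_a.bed -b demo_b.bed -v)
    echo "with overlap:    $hits"
    echo "without overlap: $misses"
else
    echo "bedtools not found; skipping demo" >&2
fi
```

With this input, `-u` should keep only the `peak1` line and `-v` only the `peak2` line.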

```bash
# Require reciprocal 50% overlap both ways
bedtools intersect -a exp1.bed -b exp2.bed -f 0.5 -F 0.5 -r

# Same-strand intersections only
bedtools intersect -a peaks.bed -b genes.bed -s

# Multiple database files with overlap counts per file
bedtools intersect -a query.bed -b enhancers.bed promoters.bed \
    -names enh prom -C

# Memory-efficient mode for pre-sorted large files
bedtools intersect -a sorted_peaks.bed -b sorted_genes.bed -sorted
```
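
The fraction thresholds are easier to reason about with the arithmetic written out. The following is an illustrative `awk` calculation, not a bedtools call, and the two intervals are made up. For half-open intervals on the same chromosome, the overlap length is `min(endA, endB) - max(startA, startB)` whenever that value is positive.

```bash
# Hypothetical intervals on the same chromosome: A = 100-200, B = 150-400
result=$(awk -v a_start=100 -v a_end=200 -v b_start=150 -v b_end=400 'BEGIN {
    lo = (a_start > b_start) ? a_start : b_start   # max of the two starts
    hi = (a_end < b_end) ? a_end : b_end           # min of the two ends
    ov = (hi > lo) ? hi - lo : 0                   # overlap length in bp
    frac_a = ov / (a_end - a_start)                # share of A that is covered
    frac_b = ov / (b_end - b_start)                # share of B that is covered
    both = (frac_a >= 0.5 && frac_b >= 0.5) ? "yes" : "no"
    printf "overlap=%d frac_a=%.2f frac_b=%.2f reciprocal_50pct=%s", ov, frac_a, frac_b, both
}')
echo "$result"
```

Here A is covered at exactly the 0.5 minimum while B is covered at only 0.2, so the pair would not satisfy a reciprocal 50% requirement.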

### Module 2: Interval Merging and Arithmetic

Combine overlapping intervals and perform set operations.

```bash
# Merge overlapping and adjacent intervals
sort -k1,1 -k2,2n peaks.bed | bedtools merge -i stdin

# Merge intervals within 500 bp of each other
bedtools merge -i peaks.bed -d 500

# Merge and count original features
bedtools merge -i peaks.bed -c 1 -o count
# Output: chr1  1000  5000  3 (3 original peaks merged)

# Merge and collapse feature names
bedtools merge -i peaks.bed -c 4 -o collapse -delim ";"
# Output: chr1  1000  5000  peak1;peak2;peak3
```
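
The merging rule itself is simple enough to spell out. The sketch below is a plain `awk` re-implementation for illustration only, not bedtools code: on sorted input, an interval joins the current block when it is on the same chromosome and starts no more than `d` bp past the block's end. The file `demo_peaks.bed` and the variable `max_gap` are hypothetical.

```bash
# Hypothetical sorted toy input
printf 'chr1\t100\t200\nchr1\t180\t250\nchr1\t700\t800\nchr2\t10\t20\n' > demo_peaks.bed
max_gap=500   # plays the role of -d 500

merged=$(awk -v d="$max_gap" 'BEGIN { OFS = "\t" }
    # Same chromosome and within d bp of the current block: extend it
    NR > 1 && $1 == chrom && $2 - block_end <= d {
        if ($3 + 0 > block_end) block_end = $3 + 0
        n++
        next
    }
    # Otherwise emit the finished block before starting a new one
    NR > 1 { print chrom, block_start, block_end, n }
    { chrom = $1; block_start = $2 + 0; block_end = $3 + 0; n = 1 }
    END { if (NR > 0) print chrom, block_start, block_end, n }' demo_peaks.bed)
echo "$merged"
```

The fourth output column is the number of input intervals in each block, analogous to `-c 1 -o count`. With `max_gap=0` only overlapping or directly adjacent intervals would be joined.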

```bash
# Subtract B from A (remove covered bases)
bedtools subtract -a peaks.bed -b blacklist.bed

# Remove entire A feature if ANY B overlap
bedtools subtract -a peaks.bed -b exclusion.bed -A

# Find genomic gaps (complement of covered regions)
bedtools complement -i merged.bed -g genome.txt
```

### Module 3: Coverage Analysis

Calculate depth and breadth of read coverage over features.

```bash
# Coverage stats per feature (read count, bases covered, feature length, fraction covered)
bedtools coverage -a target_genes.bed -b aligned.bam
# Output: chr  start  end  gene  n_overlapping_reads  bases_covered  feature_len  fraction_covered

# Per-base depth within each feature
bedtools coverage -a targets.bed -b aligned.bam -d
# Output: chr  start  end  name  position  depth

# Coverage histogram per feature
bedtools coverage -a features.bed -b aligned.bam -hist
```
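
The default `bedtools coverage` output appends its four statistics after the original columns of the A file, so the fraction covered is always the last column. A hedged sketch of a downstream filter follows, using made-up rows in place of real output (`demo_coverage.txt` and the 0.9 threshold are illustrative assumptions):

```bash
# Hypothetical rows shaped like default coverage output for a 4-column A file:
# chr  start  end  name  n_reads  bases_covered  feature_len  fraction_covered
printf 'chr1\t100\t200\tgeneA\t12\t100\t100\t1.0000000\n' > demo_coverage.txt
printf 'chr1\t500\t900\tgeneB\t3\t120\t400\t0.3000000\n' >> demo_coverage.txt

# Report the names of features with less than 90% of their bases covered
low_cov=$(awk -F '\t' '$NF < 0.9 { print $4 }' demo_coverage.txt)
echo "$low_cov"
```

Using `$NF` rather than a fixed column number keeps the filter valid when the A file carries extra columns.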

```bash
# Genome-wide BEDGRAPH (one line per run of constant depth; output goes to stdout)
bedtools genomecov -ibam aligned.bam -bg > coverage.bedgraph

# Include zero-coverage regions (for whole-genome coverage)
bedtools genomecov -ibam aligned.bam -bga > full_coverage.bedgraph

# Per-base depth for whole genome
bedtools genomecov -ibam aligned.bam -d > depth.txt

# Scaled BEDGRAPH (RPM normalization: 50M mapped reads → scale = 1e6 / 50e6 = 0.02)
bedtools genomecov -ibam aligned.bam -bg -scale 0.02 > rpm.bedgraph

# Strand-specific coverage tracks
bedtools genomecov -ibam rnaseq.bam -bg -strand + > forward.bedgraph
bedtools genomecov -ibam rnaseq.bam -bg -strand - > reverse.bedgraph
```
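
The `-scale` value for reads-per-million normalization is one million divided by the number of mapped reads. A small sketch of that arithmetic follows; `total_mapped` is a hypothetical count hard-coded for illustration, and the `samtools` command in the comment is one assumed way to obtain it.

```bash
# Hypothetical mapped-read count; in practice it could come from e.g.
#   total_mapped=$(samtools view -c -F 260 aligned.bam)   # skip unmapped + secondary
total_mapped=50000000

# RPM scale factor = 1,000,000 / mapped reads
scale=$(awk -v n="$total_mapped" 'BEGIN { printf "%.6f", 1000000 / n }')
echo "scale factor: $scale"

# The value can then be passed on, e.g.:
#   bedtools genomecov -ibam aligned.bam -bg -scale "$scale" > rpm.bedgraph
```

Computing the factor per sample, rather than hard-coding it, keeps tracks from libraries of different depth comparable.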

### Module 4: Sequence Extraction and Nearest Feature

Extract genomic sequences and annotate features with neighbors.

```bash
# Extract FASTA sequences for each BED region
bedtools getfasta -fi genome.fa -bed regions.bed -fo sequences.fasta

# Strand-aware extraction (reverse-complements minus-strand features)
bedtools getfasta -fi genome.fa -bed regions.bed -s -fo stranded.fasta

# Custom FASTA headers (name + coords)
bedtools get
```