Skill · 286 repo stars · updated 5d ago

bedtools-genomic-intervals

bedtools is a command-line toolkit for genome arithmetic on genomic intervals in BED, BAM, GFF, and VCF formats. Use it to find overlaps between feature sets, merge adjacent regions, compute read coverage depth, extract FASTA sequences, and annotate features with their nearest neighbors. It is widely used for ChIP-seq peak annotation, ATAC-seq peak merging, region filtering, and variant annotation workflows.

Repository: SciAgent-Skills

Install in Claude Code

git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/bedtools-genomic-intervals && cp -r /tmp/bedtools-genomic-intervals/skills/genomics-bioinformatics/interval-ops/bedtools-genomic-intervals ~/.claude/skills/bedtools-genomic-intervals

Then open a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# bedtools — Genomic Interval Analysis Toolkit

## Overview

bedtools is the standard toolkit for operating on genomic intervals in BED, BAM, GFF, and VCF formats. It solves the core problems of genome arithmetic: finding overlaps between feature sets, computing coverage, extracting sequences, merging adjacent regions, and annotating features with nearest neighbors. bedtools is implemented in C++ and is fast enough for whole-genome analyses.

## When to Use

- Intersecting ChIP-seq peaks with gene annotations to find promoter-overlapping peaks
- Merging overlapping ATAC-seq peaks or called regions across replicates
- Computing read coverage depth over target capture regions
- Extracting FASTA sequences for motif discovery or primer design
- Finding the nearest gene to each regulatory element or variant
- Subtracting blacklist or repeat regions from peak calls
- Expanding genomic intervals by fixed distance (promoter regions)
- Use `tabix` instead for fast indexed queries of a single genomic region
- For normalized coverage bigWig tracks, use `deeptools bamCoverage` instead
- Use `mosdepth` instead for whole-genome per-base depth (typically much faster)

## Prerequisites

- **Python packages**: None required (command-line only)
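- **Input requirements**: BED/BAM/GFF/VCF files; FASTA reference for `getfasta`; genome file (chromosome sizes) for `slop`/`flank`/`complement`, and for `genomecov` with BED input (`-i`; not needed with `-ibam`)
- **Sorting**: `merge`, `closest`, `complement`, and any command run with `-sorted` expect coordinate-sorted input (`sort -k1,1 -k2,2n` or `bedtools sort`)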
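
A quick way to confirm that a BED file is already in `sort -k1,1 -k2,2n` order before running `merge` or `-sorted` mode is to let `sort` check it. The helper name `bed_is_sorted` is hypothetical, and the sketch assumes plain tab-delimited BED with no header or `track` lines:

```bash
# Hypothetical helper: succeeds (exit 0) if a BED file is in chrom/start order.
# -c only checks order; -s disables the whole-line tie-break so equal keys pass;
# LC_ALL=C keeps the chromosome collation reproducible (chr1 < chr10 < chr2).
bed_is_sorted() {
  LC_ALL=C sort -c -s -k1,1 -k2,2n "$1" 2>/dev/null
}

# Usage sketch: sort into a new file only when needed
# bed_is_sorted peaks.bed || LC_ALL=C sort -k1,1 -k2,2n peaks.bed > peaks.sorted.bed
```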

> **Check before installing**: The tool may already be available in the current environment (e.g., inside a `pixi` / `conda` env). Run `command -v bedtools` first and skip the install commands below if it returns a path. When running inside a pixi project, invoke the tool via `pixi run bedtools` rather than bare `bedtools`.

```bash
# Bioconda (recommended)
conda install -c bioconda bedtools

# Homebrew (macOS)
brew install bedtools

# Verify
bedtools --version
# bedtools v2.31.0

# Create genome file from FASTA index
samtools faidx reference.fa
cut -f1,2 reference.fa.fai > genome.txt  # chr → size table
```
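
The genome file matters because `slop` keeps resized intervals within chromosome bounds (no start below 0, no end past the chromosome length). The awk sketch below illustrates that clipping for strand-aware promoter windows around each gene's TSS. The helper name `promoter_windows` and the BED6 input layout (strand in column 6) are assumptions; this is not bedtools' own implementation:

```bash
# Hypothetical helper: promoter window around each TSS, clipped to [0, chrom size].
# $1 = genome file (chrom<TAB>size), $2 = BED6 genes, $3 = bp upstream, $4 = bp downstream
promoter_windows() {
  awk -v up="$3" -v down="$4" '
    BEGIN { OFS = "\t"; up += 0; down += 0 }
    FILENAME == ARGV[1] { size[$1] = $2 + 0; next }     # first file: chromosome sizes
    {
      if ($6 == "-") { tss = $3 + 0; s = tss - down; e = tss + up }  # minus strand: TSS at end
      else           { tss = $2 + 0; s = tss - up;   e = tss + down }  # plus strand: TSS at start
      if (s < 0) s = 0                                    # clip at chromosome start
      if (($1 in size) && e > size[$1]) e = size[$1]      # clip at chromosome end
      print $1, s, e, $4, $5, $6
    }' "$1" "$2"
}

# Usage sketch: 2 kb upstream / 500 bp downstream promoters
# promoter_windows genome.txt genes.bed 2000 500 > promoters.bed
```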

## Quick Start

```bash
# Find peaks overlapping genes, merge overlapping peaks (merge needs sorted input),
# and compute read coverage per gene
bedtools intersect -a peaks.bed -b genes.bed -wa -wb > peaks_with_genes.bed
sort -k1,1 -k2,2n peaks.bed | bedtools merge -i stdin > merged_peaks.bed
bedtools coverage -a genes.bed -b reads.bam > gene_coverage.bed
```

## Core API

### Module 1: Interval Intersection and Overlap Analysis

Find regions that overlap between two feature sets.

```bash
# Basic intersection: output overlapping regions
bedtools intersect -a peaks.bed -b genes.bed

# Report original A and B features for each overlap
bedtools intersect -a peaks.bed -b genes.bed -wa -wb

# Count B overlaps per A feature (adds column)
bedtools intersect -a peaks.bed -b genes.bed -c
# Output: chr1  1000  2000  peak1  gene_count

# Peaks with ANY overlap (report each peak once)
bedtools intersect -a peaks.bed -b genes.bed -u

# Peaks with NO overlap in B (invert filter)
bedtools intersect -a peaks.bed -b blacklist.bed -v
```
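
BED coordinates are half-open, so two intervals on the same chromosome overlap when `a.start < b.end` and `b.start < a.end`. Book-ended intervals (one ends exactly where the other starts) therefore do not overlap. The naive awk sketch below reproduces the `-u`/`-v` filters under that rule to make the semantics concrete. It compares every A record against every B record (O(n·m)), so it is for illustration only, and the helper name `bed_overlap_filter` is hypothetical:

```bash
# Hypothetical helper: mode "u" keeps A records with >= 1 overlapping B record,
# mode "v" keeps A records with none. Usage: bed_overlap_filter u|v A.bed B.bed
bed_overlap_filter() {
  awk -v mode="$1" '
    FILENAME == ARGV[1] { nb++; bc[nb] = $1; bs[nb] = $2 + 0; be[nb] = $3 + 0; next }  # load B
    {
      hit = 0
      for (i = 1; i <= nb; i++)
        if (bc[i] == $1 && $2 + 0 < be[i] && bs[i] < $3 + 0) { hit = 1; break }  # half-open test
      if ((mode == "u" && hit) || (mode == "v" && !hit)) print
    }' "$3" "$2"
}
```

In the test data below, `chr1 500 600` and `chr1 600 700` are book-ended, so the first record counts as non-overlapping.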

```bash
# Require reciprocal 50% overlap both ways
bedtools intersect -a exp1.bed -b exp2.bed -f 0.5 -F 0.5 -r

# Same-strand intersections only
bedtools intersect -a peaks.bed -b genes.bed -s

# Multiple database files with overlap counts per file
bedtools intersect -a query.bed -b enhancers.bed promoters.bed \
    -names enh prom -C

# Memory-efficient sweep for large inputs (both files sorted in the same chromosome order)
bedtools intersect -a sorted_peaks.bed -b sorted_genes.bed -sorted
```
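
`-f` sets the minimum overlap as a fraction of each A feature, `-F` as a fraction of B, and `-r` requires the `-f` fraction in both directions. The arithmetic for a single pair can be checked with the small hypothetical helper below. It treats the threshold as inclusive, which is an assumption about bedtools' boundary behavior:

```bash
# Hypothetical helper: overlap length of A=[a0,a1) and B=[b0,b1) and whether it
# reaches fraction f of BOTH lengths (the idea behind -f f -r).
# Usage: reciprocal_overlap a_start a_end b_start b_end f
reciprocal_overlap() {
  awk -v a0="$1" -v a1="$2" -v b0="$3" -v b1="$4" -v f="$5" 'BEGIN {
    a0 += 0; a1 += 0; b0 += 0; b1 += 0; f += 0
    ov = (a1 < b1 ? a1 : b1) - (a0 > b0 ? a0 : b0)    # min(end) - max(start)
    if (ov < 0) ov = 0                                 # disjoint intervals
    ok = (ov >= f * (a1 - a0) && ov >= f * (b1 - b0))  # inclusive threshold (assumed)
    print ov, (ok ? "yes" : "no")
  }'
}
```

For A = 100-200 and B = 150-300, the 50 bp overlap is half of A but only a third of B, so the pair fails at 0.5 and passes at 0.3.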

### Module 2: Interval Merging and Arithmetic

Combine overlapping intervals and perform set operations.

```bash
# Merge overlapping and book-ended intervals (merge needs sorted input;
# the later examples assume peaks.bed is already sorted)
sort -k1,1 -k2,2n peaks.bed | bedtools merge -i stdin

# Merge intervals within 500 bp of each other
bedtools merge -i peaks.bed -d 500

# Merge and count original features
bedtools merge -i peaks.bed -c 1 -o count
# Output: chr1  1000  5000  3 (3 original peaks merged)

# Merge and collapse feature names
bedtools merge -i peaks.bed -c 4 -o collapse -delim ";"
# Output: chr1  1000  5000  peak1;peak2;peak3
```
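
Conceptually, `merge` walks sorted intervals and starts a new cluster whenever the next interval begins more than `-d` bp after the current cluster's end; with the default `-d 0`, overlapping and book-ended intervals are combined. A minimal awk sketch of that sweep, which also reports the member count in the spirit of `-c 1 -o count`, assuming plain BED input (the helper name `merge_sketch` is hypothetical):

```bash
# Hypothetical helper: sort, then merge intervals whose gap is <= d bp (default 0).
# Output columns: chrom, start, end, number of input intervals merged.
merge_sketch() {
  LC_ALL=C sort -k1,1 -k2,2n | awk -v d="${1:-0}" '
    BEGIN { OFS = "\t"; d += 0 }
    {
      s = $2 + 0; e = $3 + 0
      if (n == 0 || $1 != chr || s > ce + d) {   # new chromosome or gap too large
        if (n) print chr, cs, ce, n              # emit the finished cluster
        chr = $1; cs = s; ce = e; n = 0
      }
      if (e > ce) ce = e                         # extend the cluster end
      n++
    }
    END { if (n) print chr, cs, ce, n }'
}
```

Unlike the overlap rule used by `intersect`, this joins book-ended intervals such as `150-300` and `300-400`.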

```bash
# Subtract B from A (remove covered bases)
bedtools subtract -a peaks.bed -b blacklist.bed

# Remove entire A feature if ANY B overlap
bedtools subtract -a peaks.bed -b exclusion.bed -A

# Find genomic gaps (complement of covered regions)
bedtools complement -i merged.bed -g genome.txt
```
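
`complement` reports the stretches of each chromosome in the genome file that no input interval covers. A hypothetical awk version, assuming sorted, already-merged BED on stdin and a genome file that lists every chromosome appearing in it (this sketch also emits whole chromosomes that have no intervals):

```bash
# Hypothetical helper: complement_sketch genome.txt < merged.bed
# Prints uncovered [start, end) gaps per chromosome, in genome-file order.
complement_sketch() {
  awk '
    BEGIN { OFS = "\t" }
    FILENAME == ARGV[1] { size[$1] = $2 + 0; order[++nc] = $1; next }  # genome file
    {
      c = $1; s = $2 + 0; e = $3 + 0
      p = (c in last) ? last[c] : 0                  # end of previous interval on c
      if (s > p) gap[c] = gap[c] c OFS p OFS s ORS   # gap before this interval
      if (!(c in last) || e > last[c]) last[c] = e
    }
    END {
      for (i = 1; i <= nc; i++) {
        c = order[i]; p = (c in last) ? last[c] : 0
        if (size[c] > p) gap[c] = gap[c] c OFS p OFS size[c] ORS  # tail of chromosome
        printf "%s", gap[c]
      }
    }' "$1" -
}
```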

### Module 3: Coverage Analysis

Calculate depth and breadth of read coverage over features.

```bash
# Coverage stats per feature (read count, bases covered, feature length, fraction covered)
bedtools coverage -a target_genes.bed -b aligned.bam
# Output: chr  start  end  gene  n_overlapping_reads  bases_covered  feature_len  fraction_covered

# Per-base depth within each feature
bedtools coverage -a targets.bed -b aligned.bam -d
# Output: chr  start  end  name  position  depth

# Coverage histogram per feature
bedtools coverage -a features.bed -b aligned.bam -hist
```
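
The default `coverage` columns follow from a simple sweep: count the B intervals that touch the feature, clip them to the feature, and measure the length of their union. A minimal sketch for a single feature, assuming read intervals on the same chromosome arrive as BED on stdin. The helper name `feature_breadth` is hypothetical, and BAM input would first need converting (for example with `bedtools bamtobed`):

```bash
# Hypothetical helper: feature_breadth START END < reads.bed (same chromosome)
# Prints: n_overlapping, bases_covered, feature_length, fraction_covered
feature_breadth() {
  LC_ALL=C sort -k2,2n | awk -v fstart="$1" -v fend="$2" '
    BEGIN { fstart += 0; fend += 0 }
    {
      s = $2 + 0; e = $3 + 0
      if (e <= fstart || s >= fend) next              # read misses the feature
      n++
      if (s < fstart) s = fstart                      # clip to feature bounds
      if (e > fend) e = fend
      if (!have) { cs = s; ce = e; have = 1 }         # first covered block
      else if (s > ce) { covered += ce - cs; cs = s; ce = e }  # gap: close block
      else if (e > ce) ce = e                         # overlap: extend block
    }
    END {
      if (have) covered += ce - cs
      printf "%d\t%d\t%d\t%.4f\n", n, covered, fend - fstart, covered / (fend - fstart)
    }'
}
```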

```bash
# Genome-wide bedGraph (one line per run of constant depth; zero-depth regions omitted)
bedtools genomecov -ibam aligned.bam -bg > coverage.bedgraph

# Include zero-coverage regions (for whole-genome coverage)
bedtools genomecov -ibam aligned.bam -bga > full_coverage.bedgraph

# Per-base depth for whole genome
bedtools genomecov -ibam aligned.bam -d > depth.txt

# RPM-scaled bedGraph: scale = 1e6 / mapped reads (e.g. 50M reads → 0.02)
bedtools genomecov -ibam aligned.bam -bg -scale 0.02 > rpm.bedgraph

# Strand-specific coverage tracks
bedtools genomecov -ibam rnaseq.bam -bg -strand + > forward.bedgraph
bedtools genomecov -ibam rnaseq.bam -bg -strand - > reverse.bedgraph
```
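
RPM scaling multiplies each depth by 1,000,000 divided by the number of mapped reads, so the factor can be computed from the BAM rather than hard-coded. The helper name `rpm_scale` is hypothetical, and which reads to count (here, primary mapped reads via `samtools view -c -F 0x904`) is an analysis choice:

```bash
# Hypothetical helper: RPM scale factor = 1e6 / mapped reads (fails on 0 or non-numeric)
rpm_scale() {
  awk -v n="$1" 'BEGIN { n += 0; if (n <= 0) exit 1; printf "%.10g\n", 1e6 / n }'
}

# Usage sketch: count primary mapped reads (exclude unmapped 0x4,
# secondary 0x100 and supplementary 0x800, i.e. 0x904)
# n=$(samtools view -c -F 0x904 aligned.bam)
# bedtools genomecov -ibam aligned.bam -bg -scale "$(rpm_scale "$n")" > rpm.bedgraph
```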

### Module 4: Sequence Extraction and Nearest Feature

Extract genomic sequences and annotate features with neighbors.
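
With `-s`, `getfasta` returns the reverse complement for features on the `-` strand. The hypothetical helper below shows that transformation on a plain sequence string; as a simplification, IUPAC ambiguity codes other than `N` are left unchanged:

```bash
# Hypothetical helper: reverse complement of a DNA string (case preserved)
revcomp() {
  printf '%s\n' "$1" | tr 'ACGTNacgtn' 'TGCANtgcan' |
    awk '{ r = ""; for (i = length($0); i > 0; i--) r = r substr($0, i, 1); print r }'
}
```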

```bash
# Extract FASTA sequences for each BED region
bedtools getfasta -fi genome.fa -bed regions.bed -fo sequences.fasta

# Strand-aware extraction (reverse-complements features on the - strand)
bedtools getfasta -fi genome.fa -bed regions.bed -s -fo stranded.fasta

# Custom FASTA headers (name + coords)
bedtools get
```
