bio-alignment-files-bam-statistics
This skill generates alignment quality statistics from BAM files using samtools commands including flagstat, idxstats, stats, depth, and coverage. Use it when evaluating read alignment quality, calculating sequence coverage across genomic regions, identifying mapping issues, or producing quality control reports for next-generation sequencing datasets.
git clone --depth 1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills /tmp/bio-alignment-files-bam-statistics && cp -r /tmp/bio-alignment-files-bam-statistics/skills/bio-alignment-files-bam-statistics ~/.claude/skills/bio-alignment-files-bam-statisticsSKILL.md
<!--
# COPYRIGHT NOTICE
# This file is part of the "Universal Biomedical Skills" project.
# Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu>
# All Rights Reserved.
#
# This code is proprietary and confidential.
# Unauthorized copying of this file, via any medium is strictly prohibited.
#
# Provenance: Authenticated by MD BABU MIA
-->
---
name: bio-alignment-files-bam-statistics
description: Generate alignment statistics using samtools flagstat, stats, depth, and coverage. Use when assessing alignment quality, calculating coverage, or generating QC reports.
tool_type: cli
primary_tool: samtools
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:
- read_file
- run_shell_command
---
# BAM Statistics
Generate alignment statistics using samtools and pysam.
## Quick Summary Commands
| Command | Output | Speed |
|---------|--------|-------|
| `flagstat` | Read counts by category | Very fast |
| `idxstats` | Per-chromosome counts | Very fast (needs index) |
| `stats` | Comprehensive statistics | Moderate |
| `depth` | Per-position depth | Slow (full scan) |
| `coverage` | Per-region coverage | Fast (needs index) |
## samtools flagstat
Fast summary of alignment flags.
```bash
samtools flagstat input.bam
```
Output:
```
10000000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
50000 + 0 supplementary
0 + 0 duplicates
9800000 + 0 mapped (98.00% : N/A)
9950000 + 0 paired in sequencing
4975000 + 0 read1
4975000 + 0 read2
9700000 + 0 properly paired (97.49% : N/A)
9750000 + 0 with itself and mate mapped
100000 + 0 singletons (1.01% : N/A)
25000 + 0 with mate mapped to a different chr
10000 + 0 with mate mapped to a different chr (mapQ>=5)
```
### Multi-threaded
```bash
samtools flagstat -@ 4 input.bam
```
### Output to File
```bash
samtools flagstat input.bam > flagstat.txt
```
## samtools idxstats
Per-chromosome read counts (requires index).
```bash
samtools idxstats input.bam
```
Output format: `chrom length mapped unmapped`
```
chr1 248956422 5000000 1000
chr2 242193529 4800000 800
chrM 16569 50000 100
* 0 0 150000
```
### Parse idxstats
```bash
# Total mapped reads
samtools idxstats input.bam | awk '{sum += $3} END {print sum}'
# Mitochondrial percentage
samtools idxstats input.bam | awk '
/^chrM/ {mt = $3}
{total += $3}
END {print mt/total*100 "% mitochondrial"}'
```
## samtools stats
Comprehensive statistics including insert size, base quality, and more.
```bash
samtools stats input.bam > stats.txt
```
### View Summary Numbers
```bash
samtools stats input.bam | grep "^SN"
```
Key summary fields:
- `raw total sequences` - Total reads
- `reads mapped` - Mapped reads
- `reads mapped and paired` - Properly paired
- `insert size average` - Mean insert size
- `insert size standard deviation` - Insert size spread
- `average length` - Mean read length
- `error rate` - Mismatch rate
### Generate Plots (with plot-bamstats)
```bash
samtools stats input.bam > stats.txt
plot-bamstats -p plots/ stats.txt
```
### Stats for Specific Region
```bash
samtools stats input.bam chr1:1000000-2000000 > region_stats.txt
```
## samtools depth
Per-position read depth.
### Basic Depth
```bash
samtools depth input.bam > depth.txt
```
Output: `chrom position depth`
### Depth at Specific Positions
```bash
samtools depth -r chr1:1000-2000 input.bam
```
### Include Zero-Depth Positions
```bash
samtools depth -a input.bam > depth_with_zeros.txt
```
### Maximum Depth Cap
```bash
samtools depth -d 0 input.bam # No cap (default 8000)
```
### Depth from BED Regions
```bash
samtools depth -b regions.bed input.bam
```
### Calculate Mean Depth
```bash
samtools depth input.bam | awk '{sum += $3; n++} END {print sum/n}'
```
## samtools coverage
Per-chromosome or per-region coverage statistics (faster than depth).
```bash
samtools coverage input.bam
```
Output columns:
- `#rname` - Reference name
- `startpos` - Start position
- `endpos` - End position
- `numreads` - Number of reads
- `covbases` - Bases with coverage
- `coverage` - Percentage of bases covered
- `meandepth` - Mean depth
- `meanbaseq` - Mean base quality
- `meanmapq` - Mean mapping quality
### Coverage for Specific Region
```bash
samtools coverage -r chr1:1000000-2000000 input.bam
```
### Coverage from BED
```bash
samtools coverage -b regions.bed input.bam
```
### Histogram Output
```bash
samtools coverage -m input.bam
```
## pysam Python Alternative
### Count Reads
```python
import pysam
with pysam.AlignmentFile('input.bam', 'rb') as bam:
total = mapped = paired = proper = 0
for read in bam:
total += 1
if not read.is_unmapped:
mapped += 1
if read.is_paired:
paired += 1
if read.is_proper_pair:
proper += 1
print(f'Total: {total}')
print(f'Mapped: {mapped} ({mapped/total*100:.1f}%)')
print(f'Properly paired: {proper} ({proper/paired*100:.1f}%)')
```
### Per-Chromosome Counts
```python
import pysam
with pysam.AlignmentFile('input.bam', 'rb') as bam:
for stat in bam.get_index_statistics():
print(f'{stat.contig}: {stat.mapped} mapped, {stat.unmapped} unmapped')
```
### Calculate Depth at Position
```python
import pysam
with pysam.AlignmentFile('input.bam', 'rb') as bam:
for pileup in bam.pileup('chr1', 1000000, 1000001):
print(f'Position {pileup.pos}: depth {pileup.n}')
```
### Mean Depth in Region
```python
import pysam
def mean_depth(bam_path, chrom, start, end):
depths = []
with pysam.AlignmentFile(bam_path, 'rb') as bam:
for pileup in bam.pileup(chrom, start, end, truncate=True):
depths.append(pileup.n)
if depths:
return sum(depths) / len(depths)
return 0
depth = mean_depth('input.bam', 'chr1', 1000000, 2000000)
print(f'Mean depth: {depth:.1f}x')
```
### Coverage Statistics
```python
import pysam
def coverage_statCloud laboratory platform for automated protein testing and validation. Use when designing proteins and needing experimental validation including binding assays, expression testing, thermostability measurements, enzyme activity assays, or protein sequence optimization. Also use for submitting experiments via API, tracking experiment status, downloading results, optimizing protein sequences for better expression using computational tools (NetSolP, SoluProt, SolubleMPNN, ESM), or managing protein design workflows with wet-lab validation.
Time-blind friendly planning, executive function support, and daily structure for ADHD brains. Specializes in realistic time estimation, dopamine-aware task design, and building systems that
This skill should be used for time series machine learning tasks including classification, regression, clustering, forecasting, anomaly detection, segmentation, and similarity search. Use when working with temporal data, sequential patterns, or time-indexed observations requiring specialized algorithms beyond standard ML approaches. Particularly suited for univariate and multivariate time series analysis with scikit-learn compatible APIs.
Browse the web for any task — research topics, read articles, interact with web apps, fill forms, take screenshots, extract data, and test web pages. Use whenever a browser would be useful, not just when the user explicitly asks.
AI驱动的综合健康分析系统,整合多维度健康数据、识别异常模式、预测健康风险、提供个性化建议。支持智能问答和AI健康报告生成。
Access AlphaFold's 200M+ AI-predicted protein structures. Retrieve structures by UniProt ID, download PDB/mmCIF files, analyze confidence metrics (pLDDT, PAE), for drug discovery and structural biology.