bio-alignment-filtering
This Claude Code skill filters DNA sequence alignments from BAM files using samtools and pysam based on quality metrics, read flags, and genomic regions. Use it when subsetting aligned sequencing data, removing low-quality or duplicate reads, extracting specific read pairs, or isolating alignments to target genomic intervals for downstream analysis.
git clone --depth 1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills /tmp/bio-alignment-filtering && cp -r /tmp/bio-alignment-filtering/skills/bio-alignment-filtering ~/.claude/skills/bio-alignment-filteringSKILL.md
<!--
# COPYRIGHT NOTICE
# This file is part of the "Universal Biomedical Skills" project.
# Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu>
# All Rights Reserved.
#
# This code is proprietary and confidential.
# Unauthorized copying of this file, via any medium is strictly prohibited.
#
# Provenance: Authenticated by MD BABU MIA
-->
---
name: bio-alignment-filtering
description: Filter alignments by flags, mapping quality, and regions using samtools view and pysam. Use when extracting specific reads, removing low-quality alignments, or subsetting to target regions.
tool_type: cli
primary_tool: samtools
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:
- read_file
- run_shell_command
---
# Alignment Filtering
Filter alignments by flags, quality, and regions using samtools and pysam.
## Filter Flags
| Option | Description |
|--------|-------------|
| `-f FLAG` | Include reads with ALL bits set |
| `-F FLAG` | Exclude reads with ANY bits set |
| `-G FLAG` | Exclude reads with ALL bits set |
| `-q MAPQ` | Minimum mapping quality |
| `-L BED` | Include reads overlapping regions |
## Common FLAG Values
| Flag | Hex | Meaning |
|------|-----|---------|
| 1 | 0x1 | Paired |
| 2 | 0x2 | Proper pair |
| 4 | 0x4 | Unmapped |
| 8 | 0x8 | Mate unmapped |
| 16 | 0x10 | Reverse strand |
| 32 | 0x20 | Mate reverse strand |
| 64 | 0x40 | First in pair (read1) |
| 128 | 0x80 | Second in pair (read2) |
| 256 | 0x100 | Secondary alignment |
| 512 | 0x200 | Failed QC |
| 1024 | 0x400 | Duplicate |
| 2048 | 0x800 | Supplementary |
## Filter by FLAG
### Keep Only Mapped Reads
```bash
samtools view -F 4 -o mapped.bam input.bam
```
### Keep Only Unmapped Reads
```bash
samtools view -f 4 -o unmapped.bam input.bam
```
### Keep Only Properly Paired
```bash
samtools view -f 2 -o proper.bam input.bam
```
### Remove Duplicates
```bash
samtools view -F 1024 -o nodup.bam input.bam
```
### Remove Secondary and Supplementary
```bash
samtools view -F 2304 -o primary.bam input.bam
```
### Keep Only Primary Alignments
```bash
samtools view -F 256 -F 2048 -o primary.bam input.bam
# Or combined: -F 2304
```
### Keep Read1 Only
```bash
samtools view -f 64 -o read1.bam input.bam
```
### Keep Read2 Only
```bash
samtools view -f 128 -o read2.bam input.bam
```
### Forward Strand Only
```bash
samtools view -F 16 -o forward.bam input.bam
```
### Reverse Strand Only
```bash
samtools view -f 16 -o reverse.bam input.bam
```
## Filter by Mapping Quality
### Minimum MAPQ
```bash
samtools view -q 30 -o highqual.bam input.bam
```
### MAPQ and Mapped
```bash
samtools view -F 4 -q 30 -o filtered.bam input.bam
```
### Common MAPQ Thresholds
| MAPQ | Meaning |
|------|---------|
| 0 | Mapped to multiple locations equally well |
| 20 | ~1% chance of wrong mapping |
| 30 | ~0.1% chance of wrong mapping |
| 40 | ~0.01% chance of wrong mapping |
| 60 | Unique mapping (BWA max) |
## Filter by Region
### Single Region
```bash
samtools view -o region.bam input.bam chr1:1000000-2000000
```
### Multiple Regions
```bash
samtools view -o regions.bam input.bam chr1:1000-2000 chr2:3000-4000
```
### Regions from BED File
```bash
samtools view -L targets.bed -o targets.bam input.bam
```
### Combine Region and Quality
```bash
samtools view -q 30 -L targets.bed -o filtered.bam input.bam
```
## Combined Filters
### Standard Quality Filter
```bash
# Primary, mapped, non-duplicate, MAPQ >= 30
samtools view -F 3332 -q 30 -o filtered.bam input.bam
# 3332 = 4 (unmapped) + 256 (secondary) + 1024 (duplicate) + 2048 (supplementary)
```
### Variant Calling Prep
```bash
# Properly paired, primary, no duplicates, MAPQ >= 20
samtools view -f 2 -F 3328 -q 20 -o clean.bam input.bam
# 3328 = 256 (secondary) + 1024 (duplicate) + 2048 (supplementary)
# Note: -f 2 (proper pair) implies mapped, so -F 4 is not strictly needed
```
### ChIP-seq Filter
```bash
# Remove duplicates and low MAPQ
samtools view -F 1024 -q 30 -o filtered.bam input.bam
```
## Subsample Reads
### Random Subsample
```bash
# Keep ~10% of reads
samtools view -s 0.1 -o subset.bam input.bam
# With seed for reproducibility
samtools view -s 42.1 -o subset.bam input.bam
```
### Subsample to Target Count
```bash
# Calculate fraction needed
total=$(samtools view -c input.bam)
frac=$(echo "scale=4; 1000000 / $total" | bc)
samtools view -s "$frac" -o subset.bam input.bam
```
## pysam Python Alternative
### Basic Filtering
```python
import pysam
with pysam.AlignmentFile('input.bam', 'rb') as infile:
with pysam.AlignmentFile('filtered.bam', 'wb', header=infile.header) as outfile:
for read in infile:
if read.is_unmapped:
continue
if read.mapping_quality < 30:
continue
if read.is_duplicate:
continue
outfile.write(read)
```
### Filter with Function
```python
import pysam
def passes_filter(read):
if read.is_unmapped:
return False
if read.is_secondary or read.is_supplementary:
return False
if read.is_duplicate:
return False
if read.mapping_quality < 30:
return False
return True
with pysam.AlignmentFile('input.bam', 'rb') as infile:
with pysam.AlignmentFile('filtered.bam', 'wb', header=infile.header) as outfile:
for read in infile:
if passes_filter(read):
outfile.write(read)
```
### Filter by Region
```python
import pysam
with pysam.AlignmentFile('input.bam', 'rb') as infile:
with pysam.AlignmentFile('region.bam', 'wb', header=infile.header) as outfile:
for read in infile.fetch('chr1', 1000000, 2000000):
outfile.write(read)
```
### Filter from BED File
```python
import pysam
def read_bed(bed_path):
regions = []
with open(bed_path) as f:
for line in f:
if line.startswith('#'):
continue
parts = line.strip().split('\t')
regionsCloud laboratory platform for automated protein testing and validation. Use when designing proteins and needing experimental validation including binding assays, expression testing, thermostability measurements, enzyme activity assays, or protein sequence optimization. Also use for submitting experiments via API, tracking experiment status, downloading results, optimizing protein sequences for better expression using computational tools (NetSolP, SoluProt, SolubleMPNN, ESM), or managing protein design workflows with wet-lab validation.
Time-blind friendly planning, executive function support, and daily structure for ADHD brains. Specializes in realistic time estimation, dopamine-aware task design, and building systems that
This skill should be used for time series machine learning tasks including classification, regression, clustering, forecasting, anomaly detection, segmentation, and similarity search. Use when working with temporal data, sequential patterns, or time-indexed observations requiring specialized algorithms beyond standard ML approaches. Particularly suited for univariate and multivariate time series analysis with scikit-learn compatible APIs.
Browse the web for any task — research topics, read articles, interact with web apps, fill forms, take screenshots, extract data, and test web pages. Use whenever a browser would be useful, not just when the user explicitly asks.
AI驱动的综合健康分析系统,整合多维度健康数据、识别异常模式、预测健康风险、提供个性化建议。支持智能问答和AI健康报告生成。
Access AlphaFold's 200M+ AI-predicted protein structures. Retrieve structures by UniProt ID, download PDB/mmCIF files, analyze confidence metrics (pLDDT, PAE), for drug discovery and structural biology.