bio-consensus-sequences
The bio-consensus-sequences skill generates sample-specific consensus FASTA sequences by applying VCF variants to a reference genome using bcftools consensus. Use this when reconstructing individual sample genomes, selecting specific haplotypes, handling heterozygous sites with IUPAC codes, or masking low-coverage regions. It supports multi-sample VCFs, haplotype selection, and missing data handling through BED file masking.
git clone --depth 1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills /tmp/bio-consensus-sequences && cp -r /tmp/bio-consensus-sequences/skills/bio-consensus-sequences ~/.claude/skills/bio-consensus-sequences
## Version Compatibility
Reference examples tested with: BioPython 1.83+, bcftools 1.19+, bedtools 2.31+, minimap2 2.26+, samtools 1.19+
Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
- CLI: `<tool> --version` then `<tool> --help` to confirm flags
If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
# Consensus Sequences
**"Generate a consensus sequence from my VCF"** → Apply called variants to a reference FASTA, producing a sample-specific genome with optional haplotype selection and low-coverage masking.
- CLI: `bcftools consensus -f reference.fa input.vcf.gz`
- Python: `cyvcf2` + `Bio.SeqIO` for simple SNP-only cases
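
For the SNP-only Python route, the core operation is a per-position substitution. The sketch below uses only the standard library so that it runs anywhere; in practice the `(pos, ref, alt)` tuples would come from a VCF reader such as `cyvcf2` and the sequence from `Bio.SeqIO`. The function name `apply_snps` and the tuple layout are illustrative assumptions, not part of either library.

```python
def apply_snps(reference, snps):
    """Return a copy of `reference` with single-base substitutions applied.

    `snps` is an iterable of (pos, ref, alt) with 1-based positions, as in VCF.
    Indels are rejected because they would shift downstream coordinates.
    """
    seq = list(reference)
    for pos, ref, alt in snps:
        if len(ref) != 1 or len(alt) != 1:
            raise ValueError(f"not a SNP at position {pos}: {ref}>{alt}")
        if seq[pos - 1].upper() != ref.upper():
            raise ValueError(
                f"REF mismatch at position {pos}: expected {ref}, found {seq[pos - 1]}"
            )
        seq[pos - 1] = alt
    return "".join(seq)


consensus = apply_snps("ACGTACGT", [(2, "C", "T"), (8, "T", "A")])
print(consensus)
```

The REF check catches the most common failure mode, a VCF called against a different reference build. Anything involving indels is better left to `bcftools consensus`.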
## Basic Usage
### Generate Consensus
```bash
bcftools consensus -f reference.fa input.vcf.gz > consensus.fa
```
### Specify Sample
```bash
bcftools consensus -f reference.fa -s sample1 input.vcf.gz > sample1.fa
```
### Output to File
```bash
bcftools consensus -f reference.fa -o consensus.fa input.vcf.gz
```
## Haplotype Selection
### First Haplotype Only
```bash
bcftools consensus -f reference.fa -H 1 input.vcf.gz > haplotype1.fa
```
### Second Haplotype Only
```bash
bcftools consensus -f reference.fa -H 2 input.vcf.gz > haplotype2.fa
```
### Haplotype Options
| Option | Description |
|--------|-------------|
| `-H 1` | First haplotype |
| `-H 2` | Second haplotype |
| `-H A` | Use the ALT allele at heterozygous genotypes |
| `-H R` | Use the REF allele at heterozygous genotypes |
| `-I` | Apply IUPAC ambiguity codes (separate flag) |
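
To make the table concrete, here is a minimal sketch of how the haplotype options could be read for a single biallelic, diploid site. It is an interpretation of the option descriptions, not bcftools' actual code; `pick_allele` is a hypothetical helper, and missing genotypes and multiallelic sites are deliberately out of scope.

```python
def pick_allele(gt, ref, alt, mode):
    """Choose the allele written to the consensus for one biallelic diploid site.

    gt   -- genotype string such as "0/1" or "1|0"
    mode -- "1", "2", "R" or "A", mirroring the -H values in the table
    """
    alleles = [ref, alt]
    first, second = (int(i) for i in gt.replace("|", "/").split("/"))
    if mode in ("1", "2"):
        # Allele index within the genotype field.
        return alleles[(first, second)[int(mode) - 1]]
    if first == second:
        # Homozygous site: R and A only differ at heterozygous genotypes.
        return alleles[first]
    if mode == "R":
        return ref
    if mode == "A":
        return alt
    raise ValueError(f"unsupported mode: {mode}")


for mode in ("1", "2", "R", "A"):
    print(mode, pick_allele("0|1", "A", "G", mode))
```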
## IUPAC Codes for Heterozygous Sites
```bash
bcftools consensus -f reference.fa -I input.vcf.gz > consensus_iupac.fa
```
Heterozygous sites encoded with IUPAC ambiguity codes:
- A/G → R
- C/T → Y
- A/C → M
- G/T → K
- A/T → W
- C/G → S
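
The mapping above is small enough to express as a lookup table. The sketch below covers only the six two-base codes listed here; the helper name `iupac_code` is illustrative.

```python
# Two-base IUPAC ambiguity codes, keyed by the unordered pair of alleles.
IUPAC = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("AC"): "M",
    frozenset("GT"): "K",
    frozenset("AT"): "W",
    frozenset("CG"): "S",
}


def iupac_code(allele1, allele2):
    """Return the IUPAC symbol for a diploid genotype of two single bases."""
    a, b = allele1.upper(), allele2.upper()
    if a == b:
        return a  # homozygous: no ambiguity code needed
    return IUPAC[frozenset((a, b))]


print(iupac_code("A", "G"), iupac_code("T", "C"), iupac_code("G", "G"))
```

Using `frozenset` keys makes the lookup independent of allele order, so `A/G` and `G/A` give the same code.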
## Missing Data Handling
### Mark Missing as N
```bash
bcftools consensus -f reference.fa -M N input.vcf.gz > consensus.fa
```
### Mark Low Coverage as N
Using a mask BED file:
```bash
# Create mask from depth (-a also reports zero-coverage positions, which would otherwise be skipped)
samtools depth -a input.bam | awk '$3<10 {print $1"\t"$2-1"\t"$2}' > low_coverage.bed
# Apply mask
bcftools consensus -f reference.fa -m low_coverage.bed input.vcf.gz > consensus.fa
```
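
The `awk` one-liner emits one BED row per base. As a sketch of the same idea with adjacent positions merged into intervals, the function below takes rows shaped like `samtools depth` output (chromosome, 1-based position, depth). The name `low_coverage_bed` and the in-memory row format are assumptions for illustration.

```python
def low_coverage_bed(depth_rows, min_depth=10):
    """Merge consecutive low-depth positions into BED intervals.

    depth_rows -- iterable of (chrom, pos, depth), 1-based and sorted by position
    Returns a list of (chrom, start, end) tuples, 0-based and half-open as in BED.
    """
    intervals = []
    for chrom, pos, depth in depth_rows:
        if depth >= min_depth:
            continue
        start, end = pos - 1, pos  # convert the 1-based position to BED coordinates
        if intervals and intervals[-1][0] == chrom and intervals[-1][2] == start:
            # Directly adjacent to the previous interval: extend it.
            intervals[-1] = (chrom, intervals[-1][1], end)
        else:
            intervals.append((chrom, start, end))
    return intervals


rows = [("chr1", 1, 0), ("chr1", 2, 3), ("chr1", 3, 25), ("chr1", 4, 9), ("chr2", 1, 2)]
for chrom, start, end in low_coverage_bed(rows):
    print(chrom, start, end, sep="\t")
```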
### Mask Options
| Option | Description |
|--------|-------------|
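| `-m FILE` | Mask regions listed in a BED file (written as N by default) |
| `--mask-with CHAR` | Character written in masked regions (default N) |
| `-M CHAR` | Character written for missing genotypes (`./.`) instead of skipping them |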
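
Conceptually, masking overwrites every base inside each BED interval with the mask character. A minimal sketch of that operation on a single sequence, assuming 0-based half-open intervals as in BED (`mask_sequence` is an illustrative helper, not a bcftools function):

```python
def mask_sequence(seq, intervals, char="N"):
    """Replace bases inside 0-based, half-open (start, end) intervals with `char`."""
    masked = list(seq)
    for start, end in intervals:
        # Clamp to the sequence bounds so an oversized interval cannot fail.
        for i in range(max(start, 0), min(end, len(masked))):
            masked[i] = char
    return "".join(masked)


print(mask_sequence("ACGTACGT", [(0, 2), (5, 6)]))
```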
## Region Selection
### Specific Region
```bash
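# bcftools consensus has no -r option: extract the region with samtools faidx and pipe the FASTA in
samtools faidx reference.fa chr1:1000-2000 | bcftools consensus input.vcf.gz > region.fa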
```
### Multiple Regions
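To extract several regions, pass them all to `samtools faidx` (as arguments, or one `chr:from-to` per line in a file given with `-r`) and pipe the resulting FASTA into `bcftools consensus`.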
## Chain Files
### Generate Chain File
```bash
bcftools consensus -f reference.fa -c chain.txt input.vcf.gz > consensus.fa
```
Chain files map coordinates between reference and consensus:
- Useful for liftover of annotations
- Required when indels change sequence length
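
To see why indels make a coordinate map necessary, the sketch below shifts a reference position by the net length change of every variant upstream of it. It assumes sorted, non-overlapping VCF-style records and is not a chain-file reader; `ref_to_consensus` is a hypothetical helper.

```python
def ref_to_consensus(pos, variants):
    """Map a 1-based reference position to its position in the consensus.

    variants -- (pos, ref, alt) records that were applied to the reference
    Returns None when the position was removed by a deletion.
    """
    shift = 0
    for vpos, ref, alt in sorted(variants):
        if pos < vpos:
            break
        offset = pos - vpos
        if offset < len(ref):
            # The position lies inside this variant's REF allele.
            return vpos + shift + offset if offset < len(alt) else None
        shift += len(alt) - len(ref)  # net length change upstream of `pos`
    return pos + shift


# A 2-base insertion after position 3 and a 2-base deletion after position 6.
variants = [(3, "G", "GTT"), (6, "CGT", "C")]
for p in (2, 4, 6, 7, 9):
    print(p, ref_to_consensus(p, variants))
```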
### Chain File Format
```
chain score ref_name ref_size ref_strand ref_start ref_end query_name query_size query_strand query_start query_end id
```
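
A header line in this layout can be split into typed fields as follows. This is a sketch that handles only the header line, not the alignment block lines that follow it in a chain file; the field names follow the layout above and the example line is made up.

```python
from typing import NamedTuple


class ChainHeader(NamedTuple):
    score: int
    ref_name: str
    ref_size: int
    ref_strand: str
    ref_start: int
    ref_end: int
    query_name: str
    query_size: int
    query_strand: str
    query_start: int
    query_end: int
    chain_id: str


def parse_chain_header(line):
    """Parse one 'chain ...' header line into a ChainHeader."""
    fields = line.split()
    if len(fields) != 13 or fields[0] != "chain":
        raise ValueError(f"not a chain header: {line!r}")
    score, rn, rsize, rstrand, rstart, rend, qn, qsize, qstrand, qstart, qend, cid = fields[1:]
    return ChainHeader(
        int(score), rn, int(rsize), rstrand, int(rstart), int(rend),
        qn, int(qsize), qstrand, int(qstart), int(qend), cid,
    )


# Made-up example: a 10 bp reference sequence mapped to a 12 bp consensus.
header = parse_chain_header("chain 1000 chr1 10 + 0 10 chr1 12 + 0 12 1")
print(header.ref_name, header.ref_size, header.query_size)
```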
## Sample-Specific Consensus
### For Each Sample
```bash
for sample in $(bcftools query -l input.vcf.gz); do
bcftools consensus -f reference.fa -s "$sample" input.vcf.gz > "${sample}.fa"
done
```
### Both Haplotypes
```bash
sample="sample1"
bcftools consensus -f reference.fa -s "$sample" -H 1 input.vcf.gz > "${sample}_hap1.fa"
bcftools consensus -f reference.fa -s "$sample" -H 2 input.vcf.gz > "${sample}_hap2.fa"
```
## Filtering Before Consensus
### PASS Variants Only
```bash
# bcftools consensus needs an indexed VCF file, so filter with -i rather than piping from bcftools view
bcftools consensus -f reference.fa -i 'FILTER="PASS"' input.vcf.gz > consensus.fa
```
### High-Quality Variants Only
```bash
# Filter inside bcftools consensus; it reads an indexed VCF file rather than a pipe
bcftools consensus -f reference.fa -i 'QUAL>=30 && INFO/DP>=10' input.vcf.gz > consensus.fa
```
### SNPs Only
```bash
# Restrict to SNPs with an include expression on the indexed VCF
bcftools consensus -f reference.fa -i 'TYPE="snp"' input.vcf.gz > consensus_snps.fa
```
## Sequence Naming
### Default Naming
Output uses reference sequence names.
### Custom Prefix
```bash
bcftools consensus -f reference.fa -p "sample1_" input.vcf.gz > consensus.fa
```
Sequences named: `sample1_chr1`, `sample1_chr2`, etc.
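
The effect of the prefix on FASTA headers can be mimicked in a few lines, which is handy for FASTA files that were generated without it. `add_prefix` is an illustrative helper that operates on FASTA text already loaded into memory.

```python
def add_prefix(fasta_text, prefix):
    """Prepend `prefix` to every sequence name in a FASTA string."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out.append(">" + prefix + line[1:])  # header line: rename
        else:
            out.append(line)  # sequence line: unchanged
    return "\n".join(out) + "\n"


print(add_prefix(">chr1\nACGT\n>chr2\nGG\n", "sample1_"), end="")
```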
## Common Workflows
**Goal:** Generate consensus sequences for downstream analyses like phylogenetics, viral surveillance, or gene-level comparison.
**Approach:** Filter variants to high-quality calls, apply per-sample consensus generation, mask low-coverage regions with N, then combine for multi-sample workflows.
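
When these steps are scripted, one way to keep them reproducible is to build each command as an argument list and run it with `subprocess`. The sketch below only assembles the command. The helper name `consensus_command` and all file names are hypothetical; the flags are the ones used throughout this document.

```python
import shlex


def consensus_command(reference, vcf, sample, output, mask_bed=None, include=None):
    """Build a bcftools consensus argument list for one sample."""
    cmd = ["bcftools", "consensus", "-f", reference, "-s", sample, "-o", output]
    if include:
        cmd += ["-i", include]  # keep only variants matching the expression
    if mask_bed:
        cmd += ["-m", mask_bed]  # mask low-coverage regions
    cmd.append(vcf)  # the VCF must be bgzip-compressed and indexed
    return cmd


cmd = consensus_command(
    "reference.fa", "cohort.vcf.gz", "sample1", "consensus/sample1.fa",
    mask_bed="sample1.low_coverage.bed", include="QUAL>=30",
)
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` avoids shell quoting problems with filter expressions and sample names.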
### Phylogenetic Analysis Preparation
```bash
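# For each sample, generate a consensus with sample-prefixed sequence names
mkdir -p consensus
for sample in $(bcftools query -l cohort.vcf.gz); do
    # -s applies this sample's genotypes (the VCF must be indexed, so no piping);
    # -p keeps sequence names unique once the files are concatenated
    bcftools consensus -f reference.fa -s "$sample" -p "${sample}_" \
        cohort.vcf.gz > "consensus/${sample}.fa"
done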
# Combine for alignment
cat consensus/*.fa > all_samples.fa
```
### Viral Genome Assembly
```bash
# Apply high-quality PASS variants only; bcftools consensus reads the indexed VCF directly
bcftools consensus -f reference.fa -M N \
    -i 'QUAL>=30 && INFO/DP>=20 && FILTER="PASS"' \
    variants.vcf.gz > consensus.fa
```
### Gene-Specific Consensus
```bash
# Extract the gene region from the reference, then apply the sample's variants
samtools faidx reference.fa chr1:1000000-1010000 | \
    bcftools consensus -s sample1 variants.vcf.gz > gene.fa
```
### Masked Low-Coverage Regions
```bash
# Create mask from coverage
samtools depth -a input.bam | \
awk '$3<5 {print $1"\t"$2-1"\t"$2}' | \
bedtools merge > low_coverage.bed
# Generate consensus with mask
bcftools consensus -f reference.fa -m low_coverage.bed \
variants.vcf.gz > consensus.fa
```
## Verify Consensus
### Check Differences
```bash
# Align consensus to reference
minimap2 -a reference.fa consensus.fa | samtools view -bS > alignment.bam
# Or simple comparison
diff <(grep -v "^>" reference.fa) <(grep -v "^>" consensus.fa) | head
```
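
For SNP-only consensus sequences, where lengths are unchanged, a position-wise comparison gives exact coordinates of every difference. The sketch below assumes both FASTA files are already loaded as text; `read_fasta` and `mismatches` are illustrative helpers, and sequences with indels need the alignment route instead.

```python
def read_fasta(text):
    """Parse FASTA text into a {name: sequence} dict."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # name is the first word of the header
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def mismatches(ref_seq, cons_seq):
    """List (1-based position, ref base, consensus base) for every difference."""
    if len(ref_seq) != len(cons_seq):
        raise ValueError("lengths differ; align the sequences instead")
    return [
        (i + 1, r, c)
        for i, (r, c) in enumerate(zip(ref_seq, cons_seq))
        if r.upper() != c.upper()
    ]


ref = read_fasta(">chr1\nACGT\nACGT\n")
cons = read_fasta(">chr1 sample1\nATGTAC\nGA\n")
print(mismatches(ref["chr1"], cons["chr1"]))
```

Unlike `diff` on raw lines, this comparison is unaffected by differing FASTA line widths.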
### Count Changes
```bash
# Number of differences
bcftools view -H input.vcf.gz | wc -l
```