bio-comparative-genomics-positive-selection
This Claude Code skill detects positive selection in genes using codon substitution analysis through PAML codeml and HyPhy tools. It identifies sites and branches undergoing adaptive evolution by calculating dN/dS ratios and applying codon substitution models. Use this skill when testing whether specific genes or gene sites show evidence of adaptive selection, particularly in immune genes, reproductive proteins, or when comparing gene families across species.
git clone --depth 1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills /tmp/bio-comparative-genomics-positive-selection && cp -r /tmp/bio-comparative-genomics-positive-selection/skills/bio-comparative-genomics-positive-selection ~/.claude/skills/bio-comparative-genomics-positive-selection
<!--
# COPYRIGHT NOTICE
# This file is part of the "Universal Biomedical Skills" project.
# Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu>
# All Rights Reserved.
#
# This code is proprietary and confidential.
# Unauthorized copying of this file, via any medium is strictly prohibited.
#
# Provenance: Authenticated by MD BABU MIA
-->
---
name: bio-comparative-genomics-positive-selection
description: Detect positive selection using dN/dS (omega) tests with PAML codeml and HyPhy. Identify sites and branches under adaptive evolution through codon models and branch-site tests. Use when testing for adaptive evolution in gene families or identifying positively selected sites.
tool_type: mixed
primary_tool: PAML
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:
- read_file
- run_shell_command
---
# Positive Selection Analysis
## dN/dS Overview
```python
'''
dN/dS (omega, ω) interpretation:
- ω < 1: Purifying (negative) selection - deleterious mutations removed
- ω = 1: Neutral evolution - no selective pressure
- ω > 1: Positive (diversifying) selection - advantageous mutations favored
Most genes: ω << 1 (strong purifying selection)
Immune genes, reproduction: Often show ω > 1 at specific sites
'''
```
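The ω thresholds above can be turned into a small helper. This is a minimal sketch and not part of the original skill: the function name `classify_omega` and the `tol` band around 1 are assumptions made for illustration. A point estimate of ω does not establish selection by itself; the model comparisons in the PAML section are what actually test it.

```python
def classify_omega(omega, tol=0.05):
    '''Label a dN/dS estimate by selection regime.

    tol is a hypothetical tolerance band around 1, used only so that
    estimates very close to 1 are reported as neutral.
    '''
    if omega < 0:
        raise ValueError('omega must be non-negative')
    if omega > 1 + tol:
        return 'positive'
    if omega < 1 - tol:
        return 'purifying'
    return 'neutral'


# Label a few example estimates
for value in (0.1, 1.0, 2.5):
    print(value, classify_omega(value))
```

The tolerance is a reporting convenience only. Whether ω is significantly greater than 1 should still be decided with a likelihood ratio test between nested codon models.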
## PAML Codeml Analysis
```python
'''Run PAML codeml for selection analysis'''
import subprocess
import os
from Bio import SeqIO
from Bio.Seq import Seq
def prepare_codon_alignment(cds_fasta, output_phy):
    '''Prepare codon alignment in PHYLIP format

    Requirements:
    - CDS sequences (in-frame, no stop codons except terminal)
    - Multiple sequence alignment already performed
    - Sequence length divisible by 3
    '''
    records = list(SeqIO.parse(cds_fasta, 'fasta'))
    if not records:
        raise ValueError(f'No sequences found in {cds_fasta}')

    # Validate codon alignment
    seq_len = len(records[0].seq)
    for rec in records:
        if len(rec.seq) % 3 != 0:
            print(f'Warning: {rec.id} length not divisible by 3')
        if len(rec.seq) != seq_len:
            print(f'Warning: {rec.id} length differs from {records[0].id}')

    # Write PHYLIP format
    n_seq = len(records)
    with open(output_phy, 'w') as f:
        f.write(f' {n_seq} {seq_len}\n')
        for rec in records:
            # PHYLIP names: 10 characters, padded. PAML expects at least
            # two spaces between the name and the sequence.
            name = rec.id[:10].ljust(10)
            f.write(f'{name}  {str(rec.seq)}\n')
    return output_phy
def create_codeml_control(alignment_file, tree_file, output_dir, model='M8'):
    '''Create codeml control file

    Site models for detecting positive selection:
    - M0: One ratio (single ω for all sites)
    - M1a: Nearly neutral (ω0 < 1, ω1 = 1)
    - M2a: Positive selection (ω0 < 1, ω1 = 1, ω2 > 1)
    - M7: Beta (ω from beta distribution, 0 < ω < 1)
    - M8: Beta + ω > 1 (allows positive selection)
    - M8a: Beta + ω = 1 (null for M8 comparison)

    Standard comparison: M8 vs M7 or M8 vs M8a
    '''
    model_params = {
        'M0': {'NSsites': 0, 'model': 0},
        'M1a': {'NSsites': 1, 'model': 0},
        'M2a': {'NSsites': 2, 'model': 0},
        'M7': {'NSsites': 7, 'model': 0},
        'M8': {'NSsites': 8, 'model': 0},
        # M8a is M8 with the extra ω class fixed at 1
        'M8a': {'NSsites': 8, 'model': 0, 'fix_omega': 1, 'omega': 1},
    }
    # Unknown model names fall back to M8
    params = model_params.get(model, model_params['M8'])

    # Make sure the output directory exists before writing into it
    os.makedirs(output_dir, exist_ok=True)

    ctl_content = f'''
seqfile = {alignment_file}
treefile = {tree_file}
outfile = {output_dir}/mlc
noisy = 9
verbose = 1
runmode = 0
seqtype = 1
CodonFreq = 2
model = {params.get('model', 0)}
NSsites = {params.get('NSsites', 8)}
icode = 0
fix_kappa = 0
kappa = 2
fix_omega = {params.get('fix_omega', 0)}
omega = {params.get('omega', 1)}
'''
    ctl_file = f'{output_dir}/codeml_{model}.ctl'
    with open(ctl_file, 'w') as f:
        f.write(ctl_content)
    return ctl_file
def run_codeml(ctl_file):
    '''Run PAML codeml'''
    # Pass arguments as a list so paths with spaces are handled safely
    cmd = ['codeml', ctl_file]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        print(f'Codeml error: {result.stderr}')
    return result
def parse_codeml_output(mlc_file):
    '''Parse codeml output for likelihood and parameters'''
    results = {'lnL': None, 'omega': None, 'kappa': None, 'sites': []}
    with open(mlc_file) as f:
        content = f.read()

    for line in content.split('\n'):
        # Extract log-likelihood. The line has the form
        # 'lnL(ntime: 8  np: 13):  -906.017440  +0.000000',
        # so the value is the first token after the closing '):'.
        if line.startswith('lnL') and 'np' in line:
            try:
                results['lnL'] = float(line.split('):')[1].split()[0])
            except (ValueError, IndexError):
                pass
        # Extract kappa (transition/transversion ratio)
        if line.startswith('kappa') and '=' in line:
            try:
                results['kappa'] = float(line.split('=')[-1].split()[0])
            except (ValueError, IndexError):
                pass
        # Extract omega values
        if 'omega' in line.lower() and '=' in line:
            parts = line.split('=')
            if len(parts) >= 2:
                try:
                    results['omega'] = float(parts[-1].strip().split()[0])
                except (ValueError, IndexError):
                    pass

    # Extract positively selected sites (BEB analysis)
    if 'Bayes Empirical Bayes' in content:
        beb_section = content.split('Bayes Empirical Bayes')[1]
        for line in beb_section.split('\n'):
            parts = line.split()
            if len(parts) >= 5:
                try:
                    site = int(parts[0])
                    aa = parts[1]
                    # codeml appends '*' or '**' to significant
                    # probabilities; strip them before converting
                    prob = float(parts[2].rstrip('*'))
                    # Sites with P > 0.95 considered significant
                    # Sites with P > 0.99 highly significant
                    if prob > 0.95:
                        results['sites'].append({
                            'position': site,
                            'amino_acid': aa,
                            'probability': prob,
                            'significance': '**' if prob > 0.99 else '*'
                        })
                except (ValueError, IndexError):
                    continue
    return results
def likelihood_ratio_test(lnL_null, lnL_alt, df=2):
    '''Perform likelihood ratio test

    For M8 vs M7: df = 2
    For M2a vs M1a: df = 2
    For branch-site test: df = 1

    NOTE: the source is truncated after this point. The body below is a
    standard chi-square LRT sketch, and the 0.05 cutoff is an assumption,
    not the original implementation.
    '''
    from scipy.stats import chi2  # assumed dependency, imported locally

    # Test statistic: twice the log-likelihood gain of the alternative model
    lrt = max(0.0, 2 * (lnL_alt - lnL_null))
    p_value = chi2.sf(lrt, df)
    return {'LRT': lrt, 'df': df, 'p_value': p_value,
            'significant': p_value < 0.05}
```