Skip to main content
ClaudeWave
Skill2.7k estrellas del repoactualizado 2mo ago

bio-comparative-genomics-positive-selection

This Claude Code skill detects positive selection in genes using codon substitution analysis through PAML codeml and HyPhy tools. It identifies sites and branches undergoing adaptive evolution by calculating dN/dS ratios and applying codon substitution models. Use this skill when testing whether specific genes or gene sites show evidence of adaptive selection, particularly in immune genes, reproductive proteins, or when comparing gene families across species.

Instalar en Claude Code
Copiar
git clone --depth 1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills /tmp/bio-comparative-genomics-positive-selection && cp -r /tmp/bio-comparative-genomics-positive-selection/skills/bio-comparative-genomics-positive-selection ~/.claude/skills/bio-comparative-genomics-positive-selection
Después abre una sesión nueva de Claude Code; el skill carga automáticamente.

SKILL.md

<!--
# COPYRIGHT NOTICE
# This file is part of the "Universal Biomedical Skills" project.
# Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu>
# All Rights Reserved.
#
# This code is proprietary and confidential.
# Unauthorized copying of this file, via any medium is strictly prohibited.
#
# Provenance: Authenticated by MD BABU MIA

-->

---
name: bio-comparative-genomics-positive-selection
description: Detect positive selection using dN/dS (omega) tests with PAML codeml and HyPhy. Identify sites and branches under adaptive evolution through codon models and branch-site tests. Use when testing for adaptive evolution in gene families or identifying positively selected sites.
tool_type: mixed
primary_tool: PAML
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:
  - read_file
  - run_shell_command
---

# Positive Selection Analysis

## dN/dS Overview

```python
'''
dN/dS (omega, ω) interpretation:
- ω < 1: Purifying (negative) selection - deleterious mutations removed
- ω = 1: Neutral evolution - no selective pressure
- ω > 1: Positive (diversifying) selection - advantageous mutations favored

Most genes: ω << 1 (strong purifying selection)
Immune genes, reproduction: Often show ω > 1 at specific sites
'''
```

## PAML Codeml Analysis

```python
'''Run PAML codeml for selection analysis'''

import subprocess
import os
from Bio import SeqIO
from Bio.Seq import Seq


def prepare_codon_alignment(cds_fasta, output_phy):
    '''Prepare codon alignment in PHYLIP format

    Requirements:
    - CDS sequences (in-frame, no stop codons except terminal)
    - Multiple sequence alignment already performed
    - Sequence length divisible by 3
    '''
    records = list(SeqIO.parse(cds_fasta, 'fasta'))

    # Validate codon alignment
    for rec in records:
        if len(rec.seq) % 3 != 0:
            print(f'Warning: {rec.id} length not divisible by 3')

    # Write PHYLIP format
    n_seq = len(records)
    seq_len = len(records[0].seq)

    with open(output_phy, 'w') as f:
        f.write(f' {n_seq} {seq_len}\n')
        for rec in records:
            # PHYLIP names: 10 characters, padded
            name = rec.id[:10].ljust(10)
            f.write(f'{name}{str(rec.seq)}\n')

    return output_phy


def create_codeml_control(alignment_file, tree_file, output_dir, model='M8'):
    '''Create codeml control file

    Site models for detecting positive selection:
    - M0: One ratio (single ω for all sites)
    - M1a: Nearly neutral (ω0 < 1, ω1 = 1)
    - M2a: Positive selection (ω0 < 1, ω1 = 1, ω2 > 1)
    - M7: Beta (ω from beta distribution, 0 < ω < 1)
    - M8: Beta + ω > 1 (allows positive selection)
    - M8a: Beta + ω = 1 (null for M8 comparison)

    Standard comparison: M8 vs M7 or M8 vs M8a
    '''
    model_params = {
        'M0': {'NSsites': 0, 'model': 0},
        'M1a': {'NSsites': 1, 'model': 0},
        'M2a': {'NSsites': 2, 'model': 0},
        'M7': {'NSsites': 7, 'model': 0},
        'M8': {'NSsites': 8, 'model': 0},
        'M8a': {'NSsites': 8, 'model': 0, 'fix_omega': 1, 'omega': 1},
    }

    params = model_params.get(model, model_params['M8'])

    ctl_content = f'''
      seqfile = {alignment_file}
     treefile = {tree_file}
      outfile = {output_dir}/mlc

        noisy = 9
      verbose = 1
      runmode = 0
      seqtype = 1
    CodonFreq = 2
        model = {params.get('model', 0)}
      NSsites = {params.get('NSsites', 8)}
        icode = 0
    fix_kappa = 0
        kappa = 2
    fix_omega = {params.get('fix_omega', 0)}
        omega = {params.get('omega', 1)}
    '''

    ctl_file = f'{output_dir}/codeml_{model}.ctl'
    with open(ctl_file, 'w') as f:
        f.write(ctl_content)

    return ctl_file


def run_codeml(ctl_file):
    '''Run PAML codeml'''
    cmd = f'codeml {ctl_file}'
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)

    if result.returncode != 0:
        print(f'Codeml error: {result.stderr}')

    return result


def parse_codeml_output(mlc_file):
    '''Parse codeml output for likelihood and parameters'''
    results = {'lnL': None, 'omega': None, 'kappa': None, 'sites': []}

    with open(mlc_file) as f:
        content = f.read()

    # Extract log-likelihood
    for line in content.split('\n'):
        if 'lnL' in line and 'np' in line:
            parts = line.split()
            for i, p in enumerate(parts):
                if p == 'lnL':
                    results['lnL'] = float(parts[i + 2])
                    break

        # Extract omega values
        if 'omega' in line.lower() and '=' in line:
            parts = line.split('=')
            if len(parts) >= 2:
                try:
                    results['omega'] = float(parts[-1].strip().split()[0])
                except ValueError:
                    pass

    # Extract positively selected sites (BEB analysis)
    if 'Bayes Empirical Bayes' in content:
        beb_section = content.split('Bayes Empirical Bayes')[1]
        for line in beb_section.split('\n'):
            parts = line.split()
            if len(parts) >= 5:
                try:
                    site = int(parts[0])
                    aa = parts[1]
                    prob = float(parts[2])
                    # Sites with P > 0.95 considered significant
                    # Sites with P > 0.99 highly significant
                    if prob > 0.95:
                        results['sites'].append({
                            'position': site,
                            'amino_acid': aa,
                            'probability': prob,
                            'significance': '**' if prob > 0.99 else '*'
                        })
                except (ValueError, IndexError):
                    continue

    return results


def likelihood_ratio_test(lnL_null, lnL_alt, df=2):
    '''Perform likelihood ratio test

    For M8 vs M7: df = 2
    For M2a vs M1a: df = 2
    For branch-site test: df = 1

    Signific
aav-vector-design-agentSkill
adaptyvSkill

Cloud laboratory platform for automated protein testing and validation. Use when designing proteins and needing experimental validation including binding assays, expression testing, thermostability measurements, enzyme activity assays, or protein sequence optimization. Also use for submitting experiments via API, tracking experiment status, downloading results, optimizing protein sequences for better expression using computational tools (NetSolP, SoluProt, SolubleMPNN, ESM), or managing protein design workflows with wet-lab validation.

adhd-daily-plannerSkill

Time-blind friendly planning, executive function support, and daily structure for ADHD brains. Specializes in realistic time estimation, dopamine-aware task design, and building systems that

aeonSkill

This skill should be used for time series machine learning tasks including classification, regression, clustering, forecasting, anomaly detection, segmentation, and similarity search. Use when working with temporal data, sequential patterns, or time-indexed observations requiring specialized algorithms beyond standard ML approaches. Particularly suited for univariate and multivariate time series analysis with scikit-learn compatible APIs.

agent-browserSkill

Browse the web for any task — research topics, read articles, interact with web apps, fill forms, take screenshots, extract data, and test web pages. Use whenever a browser would be useful, not just when the user explicitly asks.

agentd-drug-discoverySkill
ai-analyzerSkill

AI驱动的综合健康分析系统,整合多维度健康数据、识别异常模式、预测健康风险、提供个性化建议。支持智能问答和AI健康报告生成。

alphafold-databaseSkill

Access AlphaFold's 200M+ AI-predicted protein structures. Retrieve structures by UniProt ID, download PDB/mmCIF files, analyze confidence metrics (pLDDT, PAE), for drug discovery and structural biology.