Skill2.9k repo starsupdated 7d ago

bio-codon-usage

bio-codon-usage analyzes codon usage patterns in DNA sequences and calculates the Codon Adaptation Index (CAI) to measure how well a sequence's codon composition matches a reference set of highly expressed genes. Use this skill when optimizing coding sequences for heterologous protein expression in different organisms or when studying evolutionary patterns of codon selection bias across genes and species.

View source Repository: OpenClaw-Medical-Skills

Install in Claude Code

Copy

git clone --depth 1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills /tmp/bio-codon-usage && cp -r /tmp/bio-codon-usage/skills/bio-codon-usage ~/.claude/skills/bio-codon-usage

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

<!--
# COPYRIGHT NOTICE
# This file is part of the "Universal Biomedical Skills" project.
# Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu>
# All Rights Reserved.
#
# This code is proprietary and confidential.
# Unauthorized copying of this file, via any medium is strictly prohibited.
#
# Provenance: Authenticated by MD BABU MIA

-->

---
name: bio-codon-usage
description: Analyze codon usage, calculate CAI (Codon Adaptation Index), and examine synonymous codon bias using Biopython. Use when analyzing coding sequences for expression optimization or evolutionary analysis.
tool_type: python
primary_tool: Bio.SeqUtils.CodonUsage
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:
  - read_file
  - run_shell_command
---

# Codon Usage

Analyze codon usage patterns and calculate codon adaptation metrics using Biopython.

## Required Imports

```python
from Bio.Seq import Seq
from Bio.SeqUtils import GC123
from Bio.SeqUtils.CodonUsage import CodonAdaptationIndex
from Bio.Data import CodonTable
from collections import Counter
```

## Basic Codon Counting

### Count Codons in Sequence

```python
from collections import Counter

def count_codons(seq):
    seq_str = str(seq).upper()
    codons = [seq_str[i:i+3] for i in range(0, len(seq_str) - 2, 3)]
    return Counter(codons)

seq = Seq('ATGCGATCGATCGATCGTAA')
codon_counts = count_codons(seq)
```

### Codon Frequencies (Relative)

```python
def codon_frequencies(seq):
    counts = count_codons(seq)
    total = sum(counts.values())
    return {codon: count / total for codon, count in counts.items()}
```

## Codon Adaptation Index (CAI)

### Using CodonUsage Module

```python
from Bio.SeqUtils.CodonUsage import CodonAdaptationIndex

# Create CAI calculator with reference set
cai = CodonAdaptationIndex()

# Generate index from highly expressed genes
cai.generate_index('highly_expressed_genes.fasta')

# Calculate CAI for a sequence
seq = Seq('ATGCGATCGATCGATCGTAA')
cai_value = cai.cai_for_gene(str(seq))
print(f'CAI: {cai_value:.3f}')  # Range 0-1, higher = better adapted
```

### CAI with Custom Codon Index

```python
from Bio.SeqUtils.CodonUsage import CodonAdaptationIndex

cai = CodonAdaptationIndex()

# Set custom index (relative adaptiveness for each codon)
custom_index = {
    'TTT': 0.5, 'TTC': 1.0,  # Phe
    'TTA': 0.1, 'TTG': 0.5, 'CTT': 0.3, 'CTC': 1.0, 'CTA': 0.1, 'CTG': 1.0,  # Leu
    # ... define all 64 codons
}
cai.set_cai_index(custom_index)
```

## Synonymous Codon Usage

### RSCU (Relative Synonymous Codon Usage)

RSCU = (observed codon frequency) / (expected frequency if all synonymous codons were used equally)

```python
from Bio.Data import CodonTable

def calculate_rscu(seq, table_id=1):
    codon_table = CodonTable.unambiguous_dna_by_id[table_id]
    counts = count_codons(seq)

    # Group codons by amino acid
    aa_to_codons = {}
    for codon in counts:
        if codon in codon_table.stop_codons:
            continue
        try:
            aa = codon_table.forward_table[codon]
            aa_to_codons.setdefault(aa, []).append(codon)
        except KeyError:
            continue

    # Calculate RSCU for each codon
    rscu = {}
    for aa, codons in aa_to_codons.items():
        total = sum(counts.get(c, 0) for c in codons)
        n_synonymous = len(codons)
        expected = total / n_synonymous if n_synonymous > 0 else 0
        for codon in codons:
            observed = counts.get(codon, 0)
            rscu[codon] = observed / expected if expected > 0 else 0
    return rscu
```

### Identify Rare Codons

```python
def find_rare_codons(seq, threshold=0.1):
    freq = codon_frequencies(seq)
    return {codon: f for codon, f in freq.items() if f < threshold}
```

### Codon Bias by Position (GC123)

```python
from Bio.SeqUtils import GC123

seq = Seq('ATGCGATCGATCGATCGATCGATCGATCGTAA')
gc_total, gc_pos1, gc_pos2, gc_pos3 = GC123(seq)

print(f'Total GC: {gc_total:.1f}%')
print(f'1st position GC: {gc_pos1:.1f}%')
print(f'2nd position GC: {gc_pos2:.1f}%')
print(f'3rd position GC: {gc_pos3:.1f}% (wobble position)')
```

## Codon Tables

### Access Codon Tables

```python
from Bio.Data import CodonTable

# Get standard table
std_table = CodonTable.unambiguous_dna_by_id[1]

# List all available tables
for id, table in CodonTable.unambiguous_dna_by_id.items():
    print(f'{id}: {table.names[0]}')
```

### Common Codon Tables

| ID | Name | Organism |
|----|------|----------|
| 1 | Standard | Most organisms |
| 2 | Vertebrate Mitochondrial | Human, mouse mito |
| 4 | Mold Mitochondrial | Fungi, protozoa mito |
| 5 | Invertebrate Mitochondrial | Insects, worms mito |
| 11 | Bacterial/Plastid | E. coli, chloroplasts |

### Codon Table Properties

```python
table = CodonTable.unambiguous_dna_by_id[1]

print(f'Start codons: {table.start_codons}')
print(f'Stop codons: {table.stop_codons}')

# Forward table: codon -> amino acid
print(table.forward_table['ATG'])  # 'M'

# Back table: amino acid -> list of codons
back_table = {}
for codon, aa in table.forward_table.items():
    back_table.setdefault(aa, []).append(codon)
print(f'Leucine codons: {back_table["L"]}')
```

## Code Patterns

### Full Codon Usage Report

```python
def codon_usage_report(seq, table_id=1):
    from Bio.Data import CodonTable

    table = CodonTable.unambiguous_dna_by_id[table_id]
    counts = count_codons(seq)
    total = sum(counts.values())

    # Group by amino acid
    aa_groups = {}
    for codon, aa in table.forward_table.items():
        aa_groups.setdefault(aa, []).append(codon)

    report = {}
    for aa, codons in sorted(aa_groups.items()):
        aa_total = sum(counts.get(c, 0) for c in codons)
        report[aa] = {
            'total': aa_total,
            'codons': {c: {'count': counts.get(c, 0),
                          'freq': counts.get(c, 0) / aa_total if aa_total > 0 else 0}
                      for c in codons}
        }
    return report
```

### Compare Codon Usage Be

More from this repository

aav-vector-design-agentSkill

adaptyvSkill

Cloud laboratory platform for automated protein testing and validation. Use when designing proteins and needing experimental validation including binding assays, expression testing, thermostability measurements, enzyme activity assays, or protein sequence optimization. Also use for submitting experiments via API, tracking experiment status, downloading results, optimizing protein sequences for better expression using computational tools (NetSolP, SoluProt, SolubleMPNN, ESM), or managing protein design workflows with wet-lab validation.

adhd-daily-plannerSkill

Time-blind friendly planning, executive function support, and daily structure for ADHD brains. Specializes in realistic time estimation, dopamine-aware task design, and building systems that

aeonSkill

This skill should be used for time series machine learning tasks including classification, regression, clustering, forecasting, anomaly detection, segmentation, and similarity search. Use when working with temporal data, sequential patterns, or time-indexed observations requiring specialized algorithms beyond standard ML approaches. Particularly suited for univariate and multivariate time series analysis with scikit-learn compatible APIs.

agent-browserSkill

Browse the web for any task — research topics, read articles, interact with web apps, fill forms, take screenshots, extract data, and test web pages. Use whenever a browser would be useful, not just when the user explicitly asks.

agentd-drug-discoverySkill

ai-analyzerSkill

AI驱动的综合健康分析系统，整合多维度健康数据、识别异常模式、预测健康风险、提供个性化建议。支持智能问答和AI健康报告生成。

alphafold-databaseSkill

Access AlphaFold's 200M+ AI-predicted protein structures. Retrieve structures by UniProt ID, download PDB/mmCIF files, analyze confidence metrics (pLDDT, PAE), for drug discovery and structural biology.