bio-alignment-io

bio-alignment-io is a Claude Code skill that provides functions for reading, writing, and converting multiple sequence alignment files across formats including Clustal, PHYLIP, Stockholm, FASTA, Nexus, and MAF using Biopython's Bio.AlignIO module. Use this skill when working with phylogenetic or conservation analysis that requires parsing alignment files, converting between different alignment formats, or programmatically constructing and writing alignment data.

Repository: OpenClaw-Medical-Skills

Install in Claude Code

git clone --depth 1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills /tmp/bio-alignment-io && cp -r /tmp/bio-alignment-io/skills/bio-alignment-io ~/.claude/skills/bio-alignment-io

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

## Version Compatibility

Reference examples tested with: Biopython 1.83+

Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
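
The introspection step described above could look roughly like the sketch below. It is an illustration rather than part of the skill: it assumes only that Biopython is importable, and `convert_params` and `has_molecule_type` are names chosen for this example. The `molecule_type` keyword is the one used by the conversion example later in this document.

```python
import inspect

import Bio
from Bio import AlignIO

# Report the installed Biopython version (the examples here target 1.83+).
print(f'Biopython version: {Bio.__version__}')

# Inspect the real signature instead of assuming it matches the examples.
convert_params = list(inspect.signature(AlignIO.convert).parameters)
print(f'AlignIO.convert parameters: {convert_params}')

# Flag the one keyword that the conversion example in this document relies on.
has_molecule_type = 'molecule_type' in convert_params
print(f'molecule_type supported: {has_molecule_type}')
```

If `molecule_type` is reported as unsupported, the installed release probably predates the examples, and the NEXUS conversion pattern needs adapting.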

# Alignment File I/O

Read, write, and convert multiple sequence alignment files in various formats.

## Required Import

**Goal:** Load modules for reading, writing, and manipulating multiple sequence alignments.

**Approach:** Import AlignIO for file I/O and supporting classes for programmatic alignment construction.

```python
from Bio import AlignIO
from Bio.Align import MultipleSeqAlignment
from Bio.SeqRecord import SeqRecord
from Bio.Seq import Seq
```

## Supported Formats

| Format | Extension | Read | Write | Description |
|--------|-----------|------|-------|-------------|
| `clustal` | .aln | Yes | Yes | Clustal W/X output |
| `fasta` | .fasta, .fa | Yes | Yes | Aligned FASTA |
| `phylip` | .phy | Yes | Yes | Interleaved PHYLIP |
| `phylip-sequential` | .phy | Yes | Yes | Sequential PHYLIP |
| `phylip-relaxed` | .phy | Yes | Yes | PHYLIP with long names |
| `stockholm` | .sto, .stk | Yes | Yes | Pfam/Rfam annotated |
| `nexus` | .nex | Yes | Yes | NEXUS format |
| `emboss` | .txt | Yes | No | EMBOSS tools output |
| `fasta-m10` | .txt | Yes | No | FASTA -m 10 output |
| `maf` | .maf | Yes | Yes | Multiple Alignment Format |
| `mauve` | .xmfa | Yes | Yes | progressiveMauve output |
| `msf` | .msf | Yes | No | GCG MSF format |
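
The table can be turned into a small extension lookup, sketched below. `EXTENSION_TO_FORMAT` and `guess_alignment_format` are hypothetical names, not part of Biopython, and the mapping carries two assumptions: `.phy` defaults to interleaved `phylip`, and `.txt` is left out because it is shared by `emboss` and `fasta-m10`.

```python
from pathlib import Path

# Hypothetical lookup built from the table above. Ambiguous extensions
# cannot be resolved by name alone, so .phy falls back to interleaved
# 'phylip' and .txt is deliberately omitted.
EXTENSION_TO_FORMAT = {
    '.aln': 'clustal',
    '.fasta': 'fasta',
    '.fa': 'fasta',
    '.phy': 'phylip',
    '.sto': 'stockholm',
    '.stk': 'stockholm',
    '.nex': 'nexus',
    '.maf': 'maf',
    '.xmfa': 'mauve',
    '.msf': 'msf',
}


def guess_alignment_format(path):
    """Return an AlignIO format name for path, or raise ValueError."""
    suffix = Path(path).suffix.lower()
    try:
        return EXTENSION_TO_FORMAT[suffix]
    except KeyError:
        raise ValueError(f'Cannot infer alignment format from {suffix!r}') from None


print(guess_alignment_format('family.STO'))  # stockholm
print(guess_alignment_format('genes.phy'))   # phylip
```

A guess like this is only a starting point; when a PHYLIP file fails to parse as `phylip`, trying `phylip-sequential` or `phylip-relaxed` is the usual next step.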

## Reading Alignments

**"Read an alignment file"** → Parse an alignment file into an alignment object with sequences and metadata accessible.

**Goal:** Load alignment data from files in various formats (Clustal, PHYLIP, Stockholm, FASTA).

**Approach:** Use `AlignIO.read()` for single-alignment files or `AlignIO.parse()` for files containing multiple alignments.

### Single Alignment File
```python
from Bio import AlignIO

alignment = AlignIO.read('alignment.aln', 'clustal')
print(f'Alignment length: {alignment.get_alignment_length()}')
print(f'Number of sequences: {len(alignment)}')
```

### Multiple Alignments in One File
```python
for alignment in AlignIO.parse('multi_alignment.sto', 'stockholm'):
    print(f'Alignment with {len(alignment)} sequences, length {alignment.get_alignment_length()}')
```

### Read as List
```python
alignments = list(AlignIO.parse('alignments.phy', 'phylip'))
print(f'Read {len(alignments)} alignments')
```
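
The difference between `AlignIO.read()` and `AlignIO.parse()` shows up when a handle holds more than one alignment. The sketch below builds two small alignments in memory so that it runs without input files; `make_alignment` and the sequence data are made up for illustration.

```python
from io import StringIO

from Bio import AlignIO
from Bio.Align import MultipleSeqAlignment
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def make_alignment(prefix, rows):
    """Build a small alignment from plain strings (illustrative helper)."""
    records = [
        SeqRecord(Seq(row), id=f'{prefix}{i}')
        for i, row in enumerate(rows, 1)
    ]
    return MultipleSeqAlignment(records)


# Write two independent alignments back to back into one PHYLIP handle.
buffer = StringIO()
AlignIO.write(
    [
        make_alignment('a', ['ACTG-ACTG', 'ACTGGACTG']),
        make_alignment('b', ['AAAA', 'AA-A', 'AAAT']),
    ],
    buffer,
    'phylip',
)
phylip_text = buffer.getvalue()

# parse() yields every alignment found in the handle.
alignments = list(AlignIO.parse(StringIO(phylip_text), 'phylip'))
print([(len(a), a.get_alignment_length()) for a in alignments])

# read() expects exactly one alignment and raises ValueError otherwise.
try:
    AlignIO.read(StringIO(phylip_text), 'phylip')
    read_error = None
except ValueError as exc:
    read_error = str(exc)
print(f'read() failed: {read_error}')
```

When the number of alignments in a file is unknown, `parse()` is the safer default.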

## Writing Alignments

**Goal:** Save alignment data to files in standard formats for downstream tools or archival.

**Approach:** Use `AlignIO.write()` with the target format name. It accepts a single alignment or an iterable of alignments, writes to a filename or an open handle, and returns the number of alignments written.

### Write Single Alignment
```python
AlignIO.write(alignment, 'output.fasta', 'fasta')
```

### Write Multiple Alignments
```python
alignments = [alignment1, alignment2, alignment3]
count = AlignIO.write(alignments, 'output.sto', 'stockholm')
print(f'Wrote {count} alignments')
```

### Write to Handle
```python
with open('output.aln', 'w') as handle:
    AlignIO.write(alignment, handle, 'clustal')
```
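
Because `AlignIO.write()` accepts any text-mode handle, an in-memory `StringIO` can stand in for a file when the output needs inspecting or testing. The following is a minimal sketch with made-up sequences; it writes Clustal text and reads it back to check the round trip.

```python
from io import StringIO

from Bio import AlignIO
from Bio.Align import MultipleSeqAlignment
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Example data only; any equal-length gapped sequences would do.
alignment = MultipleSeqAlignment([
    SeqRecord(Seq('ACTGACTGACTG'), id='seq1', description=''),
    SeqRecord(Seq('ACTGACT-ACTG'), id='seq2', description=''),
])

# StringIO behaves like an open text file, so nothing touches the disk.
handle = StringIO()
count = AlignIO.write(alignment, handle, 'clustal')
clustal_text = handle.getvalue()
print(f'Wrote {count} alignment(s)')
print(clustal_text)

# Read the text back to confirm that the sequences survived the round trip.
round_trip = AlignIO.read(StringIO(clustal_text), 'clustal')
round_trip_rows = [str(record.seq) for record in round_trip]
print(round_trip_rows)
```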

## Format Conversion

**"Convert alignment format"** → Transform an alignment file from one format to another (e.g., Clustal to PHYLIP).

**Goal:** Convert alignment files between formats for compatibility with different analysis tools.

**Approach:** Use `AlignIO.convert()` for direct one-step conversion, or read-modify-write for cases requiring intermediate manipulation.

### Direct Conversion (Most Efficient)
```python
AlignIO.convert('input.aln', 'clustal', 'output.phy', 'phylip')
```

### With Molecule Type Specification
```python
AlignIO.convert('input.sto', 'stockholm', 'output.nex', 'nexus', molecule_type='DNA')
```

### Manual Conversion (When Modification Needed)
```python
alignment = AlignIO.read('input.aln', 'clustal')
# ... modify alignment ...
AlignIO.write(alignment, 'output.fasta', 'fasta')
```
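
`AlignIO.convert()` also accepts open handles, which makes the conversion patterns above easy to try without files. The sketch below uses made-up FASTA text. To the best of my knowledge the NEXUS writer needs to know the molecule type, which aligned FASTA does not carry, so the example passes `molecule_type='DNA'` explicitly.

```python
from io import StringIO

from Bio import AlignIO

# Made-up aligned FASTA input used only for this illustration.
fasta_text = '>seq1\nACTGACTGACTG\n>seq2\nACTGACT-ACTG\n'

# FASTA to NEXUS: the molecule type is supplied because FASTA has none.
nexus_out = StringIO()
count = AlignIO.convert(
    StringIO(fasta_text), 'fasta', nexus_out, 'nexus', molecule_type='DNA'
)
nexus_text = nexus_out.getvalue()
print(f'Converted {count} alignment(s)')
print(nexus_text)

# FASTA to PHYLIP: no molecule type is needed for this target format.
phylip_out = StringIO()
AlignIO.convert(StringIO(fasta_text), 'fasta', phylip_out, 'phylip')
phylip_text = phylip_out.getvalue()
print(phylip_text)
```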

## Accessing Alignment Data

**Goal:** Navigate and extract data from alignment objects including sequences, columns, and slices.

**Approach:** Use iteration, indexing, and column slicing on the alignment object.

```python
alignment = AlignIO.read('alignment.aln', 'clustal')

# Iterate over sequences
for record in alignment:
    print(f'{record.id}: {record.seq}')

# Access by index
first_seq = alignment[0]
last_seq = alignment[-1]

# Slice columns
column_slice = alignment[:, 10:20]  # Columns 10-19

# Get specific column
column = alignment[:, 5]  # Column 5 as string
```
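
The return types of these indexing forms differ, which the small self-contained sketch below makes concrete. The three sequences are illustrative data, and all indices are zero-based.

```python
from Bio.Align import MultipleSeqAlignment
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

alignment = MultipleSeqAlignment([
    SeqRecord(Seq('ACTGACTGACTG'), id='seq1'),
    SeqRecord(Seq('ACTGACT-ACTG'), id='seq2'),
    SeqRecord(Seq('ACTG-CTGACTG'), id='seq3'),
])

# A single integer index returns a SeqRecord.
first_id = alignment[0].id
print(first_id)  # seq1

# A single column returns a plain string, one character per row.
column = alignment[:, 7]
print(column)  # G-G

# A column range returns a new MultipleSeqAlignment.
block = alignment[:, 4:8]
block_rows = [str(record.seq) for record in block]
print(block_rows)  # ['ACTG', 'ACT-', '-CTG']

# A (row, column) pair returns one character.
single_char = alignment[2, 4]
print(single_char)  # -
```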

## Working with Alignment Objects

### Get Alignment Properties
```python
alignment = AlignIO.read('alignment.aln', 'clustal')

length = alignment.get_alignment_length()
num_seqs = len(alignment)
seq_ids = [record.id for record in alignment]
```

### Slice Alignments
```python
# Get subset of sequences
subset = alignment[0:5]  # First 5 sequences

# Get subset of columns
trimmed = alignment[:, 50:150]  # Columns 50-149

# Combine slicing
region = alignment[0:5, 50:150]  # 5 sequences, columns 50-149
```
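
Column access combines naturally with filtering. As one possible use, the sketch below drops every column that contains a gap and then takes a combined row and column slice. `drop_gapped_columns` is a hypothetical helper written for this example, not a Biopython function, and it treats only `-` as a gap by default.

```python
from Bio.Align import MultipleSeqAlignment
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def drop_gapped_columns(alignment, gap='-'):
    """Return a new alignment without columns that contain a gap."""
    keep = [
        i for i in range(alignment.get_alignment_length())
        if gap not in alignment[:, i]
    ]
    trimmed = []
    for record in alignment:
        text = str(record.seq)
        trimmed.append(
            SeqRecord(
                Seq(''.join(text[i] for i in keep)),
                id=record.id,
                description=record.description,
            )
        )
    return MultipleSeqAlignment(trimmed)


alignment = MultipleSeqAlignment([
    SeqRecord(Seq('ACTGACTGACTG'), id='seq1'),
    SeqRecord(Seq('ACTGACT-ACTG'), id='seq2'),
    SeqRecord(Seq('ACTG-CTGACTG'), id='seq3'),
])

# Columns 4 and 7 each contain a gap, so 10 of the 12 columns remain.
ungapped = drop_gapped_columns(alignment)
ungapped_rows = [str(record.seq) for record in ungapped]
print(ungapped.get_alignment_length(), ungapped_rows)

# Combined slicing: first two sequences, columns 4-7.
region_rows = [str(record.seq) for record in alignment[0:2, 4:8]]
print(region_rows)  # ['ACTG', 'ACT-']
```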

## Creating Alignments Programmatically

**Goal:** Build an alignment object from sequences defined in code rather than read from a file.

**Approach:** Construct SeqRecord objects with gap characters and wrap them in a MultipleSeqAlignment.

```python
from Bio import AlignIO
from Bio.Align import MultipleSeqAlignment
from Bio.SeqRecord import SeqRecord
from Bio.Seq import Seq

records = [
    SeqRecord(Seq('ACTGACTGACTG'), id='seq1'),
    SeqRecord(Seq('ACTGACT-ACTG'), id='seq2'),
    SeqRecord(Seq('ACTG-CTGACTG'), id='seq3'),
]
alignment = MultipleSeqAlignment(records)
AlignIO.write(alignment, 'new_alignment.fasta', 'fasta')
```
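
Every row in a `MultipleSeqAlignment` must have the same length, so the gap characters need to be in place before a record is added. The sketch below, using made-up sequences, grows an alignment with `append()` and shows the `ValueError` raised for a record of the wrong length.

```python
from Bio.Align import MultipleSeqAlignment
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

alignment = MultipleSeqAlignment([
    SeqRecord(Seq('ACTGACTGACTG'), id='seq1'),
    SeqRecord(Seq('ACTGACT-ACTG'), id='seq2'),
])

# A record of matching length (12 columns) can be appended.
alignment.append(SeqRecord(Seq('ACTG-CTGACTG'), id='seq3'))
print(f'Rows after append: {len(alignment)}')

# A record of a different length is rejected with ValueError.
try:
    alignment.append(SeqRecord(Seq('ACTG'), id='too_short'))
    length_error = None
except ValueError as exc:
    length_error = str(exc)
print(f'Rejected: {length_error}')
```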

## Format-Specific Notes

### PHYLIP Format
```python
# Standard PHYLIP (10 char names, interleaved)
alignment = AlignIO.read('file.phy', 'phylip')

# Sequential PHYLIP
alignment = AlignIO.read('file.phy', 'phylip-sequential')
```
