Skill286 estrellas del repoactualizado 5d ago

biopython-molecular-biology

This Claude Code skill provides Biopython, a Python library for computational molecular biology that handles sequence manipulation, file format conversion (FASTA, GenBank, PDB), NCBI database queries, BLAST automation, sequence alignment, protein structure analysis, and phylogenetic tree construction. Use it for batch processing biological sequences, custom bioinformatics pipelines, format conversions, and complex analyses requiring programmatic control over multiple molecular biology tools.

Ver fuente Repositorio: SciAgent-Skills

Instalar en Claude Code

Copiar

git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/biopython-molecular-biology && cp -r /tmp/biopython-molecular-biology/skills/genomics-bioinformatics/biopython-molecular-biology ~/.claude/skills/biopython-molecular-biology

Después abre una sesión nueva de Claude Code; el skill carga automáticamente.

Definición

SKILL.md

# Biopython: Computational Molecular Biology Toolkit

## Overview

Biopython is the standard open-source Python library for computational molecular biology, providing modular APIs for sequence handling, biological file parsing, NCBI database access, BLAST searches, protein structure analysis, and phylogenetics. It supports Python 3 and requires NumPy.

## When to Use

- Parse and convert biological file formats (FASTA, GenBank, FASTQ, PDB, mmCIF, PHYLIP)
- Fetch sequences or publications from NCBI databases (GenBank, PubMed, Protein) programmatically
- Run and parse BLAST searches (remote NCBI or local BLAST+)
- Perform pairwise or multiple sequence alignments with custom scoring
- Analyze 3D protein structures — distances, angles, DSSP, superimposition
- Build and visualize phylogenetic trees from sequence alignments
- Calculate sequence statistics (GC content, molecular weight, melting temperature)
- Batch-process thousands of sequences with custom filtering logic
- Use `pysam` instead for reading SAM/BAM/CRAM alignment files and working with mapped reads; use `scikit-bio` instead for advanced ecological diversity metrics

## Prerequisites

- **Python packages**: `biopython`, `numpy`, `matplotlib` (for tree visualization)
- **Data requirements**: Sequence files (FASTA, GenBank, FASTQ) or accession IDs for NCBI access
- **Environment**: Python 3.8+; NCBI Entrez requires email registration

```bash
pip install biopython numpy matplotlib
```

## Quick Start

```python
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqUtils import gc_fraction

# Parse a FASTA file and compute basic statistics
records = list(SeqIO.parse("sequences.fasta", "fasta"))
print(f"Sequences loaded: {len(records)}")

seq = records[0].seq
print(f"ID: {records[0].id}")
print(f"Length: {len(seq)} bp")
print(f"GC content: {gc_fraction(seq)*100:.1f}%")
print(f"Reverse complement: {seq.reverse_complement()[:30]}...")
print(f"Protein translation: {seq.translate()[:10]}...")
```

## Core API

### Module 1: Sequence Objects (Bio.Seq)

Create and manipulate DNA, RNA, and protein sequences.

```python
from Bio.Seq import Seq

# Create sequence and perform standard operations
dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
print(f"Length: {len(dna)} bp")
print(f"Complement: {dna.complement()}")
print(f"Reverse complement: {dna.reverse_complement()}")
print(f"Transcription: {dna.transcribe()}")
print(f"Translation: {dna.translate()}")
print(f"Translation (to stop): {dna.translate(to_stop=True)}")
# Length: 39 bp
# Translation: MAIVMGR*KGAR*
# Translation (to stop): MAIVMGR
```

```python
from Bio.Seq import Seq

# Alternative genetic codes (e.g., mitochondrial)
mito_dna = Seq("ATGGCCATTGTAATGGGCCGCTGA")
std_protein = mito_dna.translate(table=1)      # Standard
mito_protein = mito_dna.translate(table=2)     # Vertebrate mitochondrial
print(f"Standard:      {std_protein}")
print(f"Mitochondrial: {mito_protein}")

# Find all start codons
coding_dna = Seq("ATGAAACCCATGGGGTTTAAATAG")
positions = [i for i in range(len(coding_dna) - 2) if coding_dna[i:i+3] == "ATG"]
print(f"ATG positions: {positions}")
# ATG positions: [0, 9]
```

### Module 2: Sequence I/O (Bio.SeqIO)

Read, write, and convert biological file formats.

```python
from Bio import SeqIO

# Parse FASTA file — returns SeqRecord iterator
records = list(SeqIO.parse("sequences.fasta", "fasta"))
print(f"Loaded {len(records)} sequences")
for rec in records[:3]:
    print(f"  {rec.id}: {len(rec.seq)} bp — {rec.description}")

# Parse GenBank — rich annotation access
for rec in SeqIO.parse("genome.gb", "genbank"):
    print(f"{rec.id}: {len(rec.features)} features, {len(rec.seq)} bp")
    for feat in rec.features[:5]:
        print(f"  {feat.type}: {feat.location}")

# Convert between formats
count = SeqIO.convert("input.gb", "genbank", "output.fasta", "fasta")
print(f"Converted {count} records: GenBank → FASTA")
```

```python
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord
from Bio.Seq import Seq

# Write sequences to file
records = [
    SeqRecord(Seq("ATCGATCG"), id="seq1", description="test sequence 1"),
    SeqRecord(Seq("GCTAGCTA"), id="seq2", description="test sequence 2"),
]
count = SeqIO.write(records, "output.fasta", "fasta")
print(f"Wrote {count} records to output.fasta")

# Filter sequences by length (streaming — memory efficient)
long_seqs = (rec for rec in SeqIO.parse("large_file.fasta", "fasta") if len(rec.seq) >= 200)
count = SeqIO.write(long_seqs, "filtered.fasta", "fasta")
print(f"Kept {count} sequences >= 200 bp")

# Index large FASTA for random access
idx = SeqIO.index("large_file.fasta", "fasta")
print(f"Indexed {len(idx)} sequences")
rec = idx["target_sequence_id"]
print(f"Retrieved: {rec.id}, {len(rec.seq)} bp")
```

### Module 3: NCBI Database Access (Bio.Entrez)

Programmatic search and download from NCBI databases.

```python
from Bio import Entrez, SeqIO

Entrez.email = "your.email@example.com"
# Entrez.api_key = "your_key"  # Optional: 10 req/s instead of 3 req/s

# Search PubMed
handle = Entrez.esearch(db="pubmed", term="CRISPR Cas9 2024", retmax=5)
results = Entrez.read(handle)
handle.close()
print(f"Found {results['Count']} articles, retrieved {len(results['IdList'])} IDs")
print(f"IDs: {results['IdList']}")

# Fetch GenBank record by accession
handle = Entrez.efetch(db="nucleotide", id="EU490707", rettype="gb", retmode="text")
record = SeqIO.read(handle, "genbank")
handle.close()
print(f"{record.id}: {record.description}")
print(f"Length: {len(record.seq)} bp, Features: {len(record.features)}")
```

```python
from Bio import Entrez
import time

Entrez.email = "your.email@example.com"

# Batch download with rate limiting
handle = Entrez.esearch(db="protein", term="insulin[Protein Name] AND human[Organism]", retmax=20)
results = Entrez.read(handle)
handle.close()

# Fetch summaries in batch
ids = results["IdList"][:10]
handle = Entrez.esummary(db="protein", id=",".join(ids))
summaries = Entrez.read(handle)
handle.close()

for

Del mismo repositorio

sciagent-skill-creatorSkill

opentrons-integrationSkill

Opentrons Protocol API v2 for OT-2/Flex: Python protocols for pipetting, serial dilutions, PCR, plate replication; control thermocycler, heater-shaker, magnetic, temperature modules. Use pylabrobot for multi-vendor.

plotly-interactive-visualizationSkill

Interactive visualization with Plotly. 40+ chart types (scatter, line, heatmap, 3D, geographic) with hover, zoom, pan. Two APIs: Plotly Express (DataFrame) and Graph Objects (fine control). For static publication figures use matplotlib; for statistical grammar use seaborn.

seaborn-statistical-visualizationSkill

Statistical visualization on matplotlib + pandas. Distributions (histplot, kdeplot, violin, box), relational (scatter, line), categorical, regression, correlation heatmaps. Auto aggregation/CIs. Use plotly for interactive; matplotlib for low-level.

single-cell-annotationSkill

Best practices for single-cell RNA-seq cell type annotation including marker-based, reference-based, and automated classification approaches.

pymc-bayesian-modelingSkill

Bayesian modeling with PyMC 5: priors, likelihood, NUTS/ADVI sampling, diagnostics (R-hat, ESS), LOO/WAIC comparison, prediction. Hierarchical, logistic, GP variants; predictive checks.

scikit-survival-analysisSkill

Time-to-event modeling with scikit-survival: Cox PH (elastic net), Random Survival Forests, Boosting, SVMs for censored data. C-index, Brier, time-dependent AUC; Kaplan-Meier, Nelson-Aalen, competing risks. Pipeline/GridSearchCV compatible. Use statsmodels for frequentist, pymc for Bayesian, lifelines for parametric.

statistical-analysisSkill