Skill · 2.9k repo stars · updated 8d ago

bio-blast-searches

Bio-blast-searches executes BLAST (Basic Local Alignment Search Tool) queries against NCBI databases using Biopython to identify unknown DNA, RNA, or protein sequences and find homologous matches. Use this skill when comparing biological sequences to public sequence databases for similarity searches, sequence identification, or finding related sequences across nucleotide or protein repositories.

View source · Repository: OpenClaw-Medical-Skills

Install in Claude Code

git clone --depth 1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills /tmp/bio-blast-searches && cp -r /tmp/bio-blast-searches/skills/bio-blast-searches ~/.claude/skills/bio-blast-searches

Then open a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

<!--
# COPYRIGHT NOTICE
# This file is part of the "Universal Biomedical Skills" project.
# Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu>
# All Rights Reserved.
#
# This code is proprietary and confidential.
# Unauthorized copying of this file, via any medium is strictly prohibited.
#
# Provenance: Authenticated by MD BABU MIA

-->

---
name: bio-blast-searches
description: Run remote BLAST searches against NCBI databases using Biopython Bio.Blast. Use when identifying unknown sequences, finding homologs, or searching for sequence similarity against NCBI's nr/nt databases.
tool_type: python
primary_tool: Bio.Blast.NCBIWWW
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:
  - read_file
  - run_shell_command
---

# BLAST Searches

Run BLAST searches against NCBI databases using Biopython's Bio.Blast module.

## Required Import

```python
from Bio.Blast import NCBIWWW, NCBIXML
from Bio import SeqIO
```

## BLAST Programs

| Program | Query | Database | Use Case |
|---------|-------|----------|----------|
| `blastn` | Nucleotide | Nucleotide | DNA/RNA sequence similarity |
| `blastp` | Protein | Protein | Protein sequence similarity |
| `blastx` | Nucleotide | Protein | Find protein matches for a translated DNA/RNA query |
| `tblastn` | Protein | Nucleotide | Find nucleotide sequences encoding proteins similar to the query |
| `tblastx` | Nucleotide | Nucleotide | Translated query vs. translated database |
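
As a rough illustration of the table above, the hypothetical helper `choose_program` (not a Biopython function) maps query and database molecule types to the program name that `NCBIWWW.qblast()` takes as its first argument. It is a sketch covering only the five programs listed; the `translate_both` flag is an assumption used to tell `blastn` apart from `tblastx`.

```python
# Hypothetical helper, not part of Biopython: map the molecule types from the
# program table to the program name passed as qblast()'s first argument.
PAIR_TO_PROGRAM = {
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",   # query is translated
    ("protein", "nucleotide"): "tblastn",  # database is translated
}


def choose_program(query_type, db_type, translate_both=False):
    """Return the BLAST program name for a query/database molecule-type pair."""
    key = (query_type.lower(), db_type.lower())
    if key == ("nucleotide", "nucleotide"):
        # Direct nucleotide comparison unless both sides should be translated.
        return "tblastx" if translate_both else "blastn"
    if key not in PAIR_TO_PROGRAM:
        raise ValueError(f"unsupported pair: {query_type!r} vs {db_type!r}")
    return PAIR_TO_PROGRAM[key]


print(choose_program("nucleotide", "protein"))  # blastx
```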

## Core Function

### NCBIWWW.qblast()

Submit a BLAST query to NCBI servers.

```python
from Bio.Blast import NCBIWWW

# Simple BLASTN search (replace with your own query sequence)
sequence = 'ATGAAAGCAATTTTCGTACTGAAAGGTTGGTGGCGCACTTCCTGA'
result_handle = NCBIWWW.qblast('blastn', 'nt', sequence)
```

**Key Parameters:**
| Parameter | Description | Example |
|-----------|-------------|---------|
| `program` | BLAST program | `'blastn'`, `'blastp'` |
| `database` | Target database | `'nr'`, `'nt'`, `'refseq_rna'` |
| `sequence` | Query sequence | Sequence string, FASTA-formatted string, or accession/GI number |
| `entrez_query` | Limit by Entrez query | `'Homo sapiens[organism]'` |
| `hitlist_size` | Max hits to return | `50` |
| `expect` | E-value threshold | `0.001` |
| `word_size` | Word size | `11` for blastn |
| `gapcosts` | Gap penalties | `'5 2'` (open, extend) |
| `format_type` | Output format | `'XML'` (default), `'Text'` |
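
A minimal sketch, assuming you want to assemble these options programmatically: the hypothetical `qblast_options` helper keeps only the parameters you set and turns an `(open, extend)` tuple into the space-separated `gapcosts` string shown in the table. Unset options are left out so `qblast()` falls back to its own defaults.

```python
# Hypothetical helper: collect optional qblast() keyword arguments from the
# parameter table, dropping unset ones and formatting gapcosts as "open extend".
def qblast_options(expect=None, hitlist_size=None, word_size=None,
                   gapcosts=None, entrez_query=None, format_type=None):
    options = {
        "expect": expect,
        "hitlist_size": hitlist_size,
        "word_size": word_size,
        "entrez_query": entrez_query,
        "format_type": format_type,
    }
    if gapcosts is not None:
        gap_open, gap_extend = gapcosts
        options["gapcosts"] = f"{gap_open} {gap_extend}"
    # Omit anything not set so qblast() uses its own defaults for it.
    return {key: value for key, value in options.items() if value is not None}


opts = qblast_options(expect=0.001, hitlist_size=50, gapcosts=(5, 2))
print(opts)
# The dict can then be splatted into a search, e.g.
# NCBIWWW.qblast('blastn', 'nt', sequence, **opts)
```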

## Common Databases

**Nucleotide:**
| Database | Description |
|----------|-------------|
| `nt` | All GenBank + EMBL + DDBJ |
| `refseq_rna` | RefSeq RNA sequences |
| `refseq_genomic` | RefSeq genomic sequences |

**Protein:**
| Database | Description |
|----------|-------------|
| `nr` | Non-redundant protein |
| `refseq_protein` | RefSeq proteins |
| `swissprot` | SwissProt (curated) |
| `pdb` | Protein structures |
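
Each program searches a database of one molecule type (for example, `blastp` and `blastx` search protein databases). The sketch below encodes only the databases listed above in hypothetical `DATABASE_TYPE` and `PROGRAM_DB_TYPE` tables, so a mismatched program/database pair fails locally before a slow remote request is submitted. NCBI offers many more databases than these, so treat the tables as illustrative.

```python
# Hypothetical lookup tables built from the program and database tables above.
DATABASE_TYPE = {
    "nt": "nucleotide", "refseq_rna": "nucleotide", "refseq_genomic": "nucleotide",
    "nr": "protein", "refseq_protein": "protein", "swissprot": "protein", "pdb": "protein",
}
# Molecule type of the *database* each program searches.
PROGRAM_DB_TYPE = {
    "blastn": "nucleotide", "tblastn": "nucleotide", "tblastx": "nucleotide",
    "blastp": "protein", "blastx": "protein",
}


def check_database(program, database):
    """Raise ValueError if the database type does not match the program."""
    expected = PROGRAM_DB_TYPE.get(program)
    if expected is None:
        raise ValueError(f"unknown program {program!r}")
    actual = DATABASE_TYPE.get(database)
    if actual is None:
        raise ValueError(f"unknown database {database!r}")
    if actual != expected:
        raise ValueError(
            f"{program} searches {expected} databases, but {database!r} is {actual}"
        )


check_database("blastx", "swissprot")  # passes: blastx searches protein databases
```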

## Parsing Results

### NCBIXML Parser

```python
from Bio.Blast import NCBIWWW, NCBIXML

# Run BLAST (sequence is a nucleotide query string)
sequence = 'ATGAAAGCAATTTTCGTACTGAAAGGTTGGTGGCGCACTTCCTGA'
result_handle = NCBIWWW.qblast('blastn', 'nt', sequence)

# Parse XML results
blast_record = NCBIXML.read(result_handle)
result_handle.close()

# Iterate hits
for alignment in blast_record.alignments:
    print(f"Hit: {alignment.title}")
    for hsp in alignment.hsps:
        print(f"  E-value: {hsp.expect}")
        print(f"  Score: {hsp.score}")
        print(f"  Identity: {hsp.identities}/{hsp.align_length}")
```
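
Building on the parsing loop above, here is a minimal sketch of filtering hits by E-value and percent identity. `significant_hits` and its default cutoffs are illustrative assumptions, not recommended thresholds; it keeps the lowest-E-value HSP of each alignment. The `SimpleNamespace` objects (and the accessions `HIT1`/`HIT2`) are made-up stand-ins that mimic the attribute names of NCBIXML records, so the function can be tried offline.

```python
from types import SimpleNamespace


def significant_hits(blast_record, max_evalue=1e-5, min_identity=90.0):
    """Return (accession, E-value, percent identity) for hits passing both cutoffs.

    Uses the best (lowest E-value) HSP of each alignment.
    """
    results = []
    for alignment in blast_record.alignments:
        best = min(alignment.hsps, key=lambda hsp: hsp.expect)
        identity = 100.0 * best.identities / best.align_length
        if best.expect <= max_evalue and identity >= min_identity:
            results.append((alignment.accession, best.expect, round(identity, 1)))
    return results


# Stand-in objects with the same attribute names as NCBIXML records.
record = SimpleNamespace(alignments=[
    SimpleNamespace(accession="HIT1",
                    hsps=[SimpleNamespace(expect=1e-30, identities=98, align_length=100)]),
    SimpleNamespace(accession="HIT2",
                    hsps=[SimpleNamespace(expect=0.5, identities=40, align_length=100)]),
])
print(significant_hits(record))
```

With a real search, pass the `blast_record` returned by `NCBIXML.read()` instead of the stand-in.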

### Alignment/HSP Attributes

```python
# Alignment (hit) attributes
alignment.title          # Hit description
alignment.accession      # Accession number
alignment.length         # Subject sequence length
alignment.hsps           # List of HSPs

# HSP (High-scoring Segment Pair) attributes
hsp.score               # Raw score
hsp.bits                # Bit score
hsp.expect              # E-value
hsp.identities          # Number of identical positions
hsp.positives           # Number of positive-scoring positions
hsp.gaps                # Number of gaps
hsp.align_length        # Alignment length
hsp.query               # Aligned query sequence
hsp.match               # Match line (| for identity)
hsp.sbjct               # Aligned subject sequence
hsp.query_start         # Query start position
hsp.query_end           # Query end position
hsp.sbjct_start         # Subject start position
hsp.sbjct_end           # Subject end position
hsp.strand              # Strand (blastn)
hsp.frame               # Reading frame (blastx/tblastn)
```
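
The coordinate attributes can be combined into a query-coverage figure. This hypothetical `query_coverage` helper assumes the caller supplies the query length (for example `len(sequence)` of the submitted query) and uses `min`/`max` so it still works if start and end come back in reverse order.

```python
from types import SimpleNamespace


def query_coverage(hsp, query_length):
    """Percent of the query covered by one HSP.

    min/max keep this correct even when start > end.
    """
    start = min(hsp.query_start, hsp.query_end)
    end = max(hsp.query_start, hsp.query_end)
    return 100.0 * (end - start + 1) / query_length


# Stand-in HSP with NCBIXML-style attribute names.
hsp = SimpleNamespace(query_start=11, query_end=60)
print(query_coverage(hsp, query_length=100))  # 50.0
```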

## Code Patterns

### Basic BLASTN

```python
from Bio.Blast import NCBIWWW, NCBIXML

sequence = '''ATGAAAGCAATTTTCGTACTGAAAGGTTGGTGGCGCACTTCCTGA'''

print("Running BLASTN (this may take a minute)...")
result_handle = NCBIWWW.qblast('blastn', 'nt', sequence)

blast_record = NCBIXML.read(result_handle)
result_handle.close()

print(f"\nFound {len(blast_record.alignments)} hits")
for alignment in blast_record.alignments[:5]:
    hsp = alignment.hsps[0]
    print(f"\n{alignment.title[:70]}...")
    print(f"  E-value: {hsp.expect:.2e}")
    print(f"  Identity: {hsp.identities}/{hsp.align_length} ({100*hsp.identities/hsp.align_length:.1f}%)")
```
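
Sequences copied from GenBank flat-file views or FASTA files often carry position numbers, spaces, line breaks, or a header line. A small cleanup step, sketched here as the hypothetical `clean_sequence` helper, normalizes such a paste into a plain uppercase string before it is passed to `qblast()`. It removes every digit and whitespace character, so it is only meant for raw sequence text.

```python
import re


def clean_sequence(raw):
    """Strip whitespace, digits, and a leading FASTA header from a pasted sequence."""
    lines = raw.strip().splitlines()
    if lines and lines[0].startswith(">"):
        lines = lines[1:]  # drop the header line
    return re.sub(r"[\s\d]", "", "".join(lines)).upper()


# GenBank-style paste with position numbers and spacing.
pasted = """
        1 atgaaagcaa ttttcgtact
       21 gaaaggttgg
"""
print(clean_sequence(pasted))  # ATGAAAGCAATTTTCGTACTGAAAGGTTGG
```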

### BLASTP with Organism Filter

```python
from Bio.Blast import NCBIWWW, NCBIXML

protein_seq = '''MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH'''

result_handle = NCBIWWW.qblast(
    'blastp',
    'nr',
    protein_seq,
    entrez_query='Mammalia[organism]',
    hitlist_size=20,
    expect=0.001
)

blast_record = NCBIXML.read(result_handle)
result_handle.close()

for alignment in blast_record.alignments[:10]:
    hsp = alignment.hsps[0]
    print(f"{alignment.accession}: E={hsp.expect:.2e} - {alignment.title[:50]}...")
```
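
The `entrez_query` value uses Entrez search syntax with `[organism]` field tags, as in the example above. A minimal sketch of composing it from lists of names follows; `organism_query` is a hypothetical helper, and combining terms with `OR`/`NOT` assumes standard Entrez Boolean syntax.

```python
def organism_query(include, exclude=()):
    """Build an Entrez organism filter, e.g.
    '(Homo sapiens[organism] OR Mus musculus[organism]) NOT Rattus[organism]'."""
    if not include:
        raise ValueError("need at least one organism to include")
    query = " OR ".join(f"{name}[organism]" for name in include)
    if len(include) > 1:
        query = f"({query})"  # group the OR terms before any NOT
    for name in exclude:
        query += f" NOT {name}[organism]"
    return query


print(organism_query(["Mammalia"], exclude=["Homo sapiens"]))
# Mammalia[organism] NOT Homo sapiens[organism]
```

The resulting string is passed unchanged as `entrez_query=` to `qblast()`.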

### BLAST from FASTA File

```python
from Bio import SeqIO
from Bio.Blast import NCBIWWW, NCBIXML

record = SeqIO.read('query.fasta', 'fasta')

result_handle = NCBIWWW.qblast('blastn', 'nt', record.seq)
blast_record = NCBIXML.read(result_handle)
result_handle.close()

for alignment in blast_record.alignments[:5]:
    print(f"{alignment.accession}: {alignment.title[:60]}...")
```
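
`SeqIO.read()` expects exactly one record and raises a `ValueError` otherwise, so a multi-record `query.fasta` needs a different approach; in Biopython that is usually `SeqIO.parse()`. As a dependency-free sketch, the hypothetical `split_fasta` helper below breaks multi-record FASTA text into one FASTA string per record. Since `qblast()` accepts a FASTA-formatted string as its `sequence` argument, each piece can be submitted in turn.

```python
from typing import List


def split_fasta(text) -> List[str]:
    """Split multi-record FASTA text into one FASTA string per record."""
    records, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">") and current:
            # A new header closes the previous record.
            records.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        records.append("\n".join(current))
    return records


example = """>seq1 first query
ATGAAAGCAATTTT
CGTACTGAAA
>seq2 second query
GGTTGGTGGCGCACTTCCTGA
"""
queries = split_fasta(example)
print(len(queries))  # 2
# Each string can then be searched, e.g. NCBIWWW.qblast('blastn', 'nt', queries[0])
```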

### Save Results to File

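```python
from Bio.Blast import NCBIWWW, NCBIXML

sequence = 'ATGAAAGCAATTTTCGTACTGAAAGGTTGGTGGCGCACTTCCTGA'
result_handle = NCBIWWW.qblast('blastn', 'nt', sequence)

# Save XML for later parsing
with open('blast_results.xml', 'w') as out:
    out.write(result_handle.read())
result_handle.close()

# Later: parse the saved file instead of re-running the search
with open('blast_results.xml') as saved:
    blast_record = NCBIXML.read(saved)
```
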
From the same repository

aav-vector-design-agent · Skill

adaptyv · Skill

Cloud laboratory platform for automated protein testing and validation. Use when designing proteins and needing experimental validation including binding assays, expression testing, thermostability measurements, enzyme activity assays, or protein sequence optimization. Also use for submitting experiments via API, tracking experiment status, downloading results, optimizing protein sequences for better expression using computational tools (NetSolP, SoluProt, SolubleMPNN, ESM), or managing protein design workflows with wet-lab validation.

adhd-daily-planner · Skill

Time-blind friendly planning, executive function support, and daily structure for ADHD brains. Specializes in realistic time estimation, dopamine-aware task design, and building systems that

aeon · Skill

This skill should be used for time series machine learning tasks including classification, regression, clustering, forecasting, anomaly detection, segmentation, and similarity search. Use when working with temporal data, sequential patterns, or time-indexed observations requiring specialized algorithms beyond standard ML approaches. Particularly suited for univariate and multivariate time series analysis with scikit-learn compatible APIs.

agent-browser · Skill

Browse the web for any task: research topics, read articles, interact with web apps, fill forms, take screenshots, extract data, and test web pages. Use whenever a browser would be useful, not just when the user explicitly asks.

agentd-drug-discovery · Skill

ai-analyzer · Skill

AI-driven comprehensive health analysis system that integrates multi-dimensional health data, identifies abnormal patterns, predicts health risks, and provides personalized recommendations. Supports intelligent Q&A and AI-generated health reports.

alphafold-database · Skill

Access AlphaFold's 200M+ AI-predicted protein structures. Retrieve structures by UniProt ID, download PDB/mmCIF files, analyze confidence metrics (pLDDT, PAE), for drug discovery and structural biology.