
bio-basecalling

This skill provides command-line patterns and configuration guidance for converting raw Nanopore sequencing data (FAST5/POD5 formats) into nucleotide sequences using the Dorado basecaller. It covers model selection across speed-accuracy tradeoffs (fast, hac, sup), GPU acceleration options, modified base detection, output format choices (BAM or FASTQ), batch size tuning, and specialized modes like duplex calling and demultiplexing. Use this when processing raw Nanopore electrical signal data into sequence reads before downstream alignment or analysis steps.

Install in Claude Code
git clone --depth 1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills /tmp/bio-basecalling && cp -r /tmp/bio-basecalling/skills/bio-basecalling ~/.claude/skills/bio-basecalling
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

## Version Compatibility

Reference examples tested with: samtools 1.19+

Before using code patterns, verify installed versions match. If versions differ:
- CLI: `<tool> --version` then `<tool> --help` to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
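
The flag check described above can be scripted. This is a minimal sketch rather than part of the original skill: `has_flag` is a hypothetical helper name, and it assumes the tool (or subcommand) prints its options when called with `--help`.

```bash
# Hypothetical helper: succeed if a command's --help output mentions a flag.
# Usage: has_flag FLAG CMD [SUBCOMMAND...]
# Assumes `CMD [SUBCOMMAND...] --help` prints the option list (stdout or stderr).
has_flag() {
    local flag="$1"
    shift
    # -F: literal match, -w: whole word, -e: allow patterns starting with "-"
    "$@" --help 2>&1 | grep -qwF -e "$flag"
}

# Example (requires dorado on PATH):
# has_flag --emit-fastq dorado basecaller || echo "flag not found, check --help" >&2
```

Passing the subcommand as part of the command matters here because Dorado documents most options per subcommand rather than at the top level.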

# Nanopore Basecalling

**"Basecall my Nanopore data"** → Convert raw electrical signal (FAST5/POD5) into nucleotide sequences with quality scores, optionally detecting modified bases.
- CLI: `dorado basecaller sup pod5/ > calls.bam` (recommended), `dorado basecaller sup,5mCG_5hmCG pod5/` (with modifications)

Convert raw electrical signal from Nanopore sequencing into nucleotide sequences.

## Dorado (Recommended)

Dorado is the current production basecaller from Oxford Nanopore Technologies (ONT) and replaces Guppy, with better accuracy and speed than its predecessor.

### Basic Basecalling

```bash
dorado basecaller sup pod5_dir/ > calls.bam
```

### Choose Model

```bash
# Pick one tier; each command below writes to calls.bam
dorado basecaller fast pod5_dir/ > calls.bam
dorado basecaller hac pod5_dir/ > calls.bam
dorado basecaller sup pod5_dir/ > calls.bam
```

### Model Speed vs Accuracy

| Model | Speed | Accuracy | Use Case |
|-------|-------|----------|----------|
| fast | Fastest | Lower | Quick preview |
| hac | Medium | High | General use |
| sup | Slowest | Highest | Publication quality |
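
For scripted pipelines, the table above can be encoded as a small lookup. The following is only a sketch: `pick_model_tier` and the use-case labels `preview`, `general`, and `publication` are hypothetical names chosen to mirror the table, not anything Dorado defines.

```bash
# Hypothetical helper: map an intended use to a Dorado model tier,
# following the speed/accuracy table (labels are illustrative only).
pick_model_tier() {
    case "$1" in
        preview)     echo fast ;;
        general)     echo hac ;;
        publication) echo sup ;;
        *)
            echo "unknown use case: $1" >&2
            return 1
            ;;
    esac
}

# Example (requires dorado on PATH):
# dorado basecaller "$(pick_model_tier general)" pod5_dir/ > calls.bam
```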

### Specific Model Version

```bash
dorado download --model dna_r10.4.1_e8.2_400bps_sup@v5.1.0
dorado basecaller dna_r10.4.1_e8.2_400bps_sup@v5.1.0 pod5_dir/ > calls.bam
```

### List Available Models

```bash
dorado download --list
```

### Output FASTQ Instead of BAM

```bash
dorado basecaller sup pod5_dir/ --emit-fastq > calls.fastq
```

### Modified Base Detection

```bash
# 5mC and 5hmC at CpG sites
dorado basecaller sup,5mCG_5hmCG pod5_dir/ > calls_mods.bam
# 5mC at CpG sites only
dorado basecaller sup,5mCG pod5_dir/ > calls_5mc.bam
# N6-methyladenine
dorado basecaller sup,6mA pod5_dir/ > calls_6ma.bam
```

### GPU Selection

```bash
# Single GPU
dorado basecaller sup pod5_dir/ --device cuda:0 > calls.bam
# Two GPUs
dorado basecaller sup pod5_dir/ --device cuda:0,1 > calls.bam
# CPU only (expect much slower basecalling)
dorado basecaller sup pod5_dir/ --device cpu > calls.bam
```
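
When the same script runs on machines with and without GPUs, the `--device` value can be chosen at run time. This sketch assumes `nvidia-smi` is a reliable way to detect CUDA GPUs on the host, and that the installed Dorado accepts `cuda:all` (confirm with `dorado basecaller --help`). `pick_device` is a hypothetical helper name.

```bash
# Hypothetical helper: print a value suitable for --device.
# Assumption: a working `nvidia-smi -L` means at least one CUDA GPU is usable.
pick_device() {
    if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1; then
        echo "cuda:all"
    else
        echo "cpu"
    fi
}

# Example (requires dorado on PATH):
# dorado basecaller sup pod5_dir/ --device "$(pick_device)" > calls.bam
```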

### Batch Size for Memory

```bash
dorado basecaller sup pod5_dir/ --batchsize 64 > calls.bam
```
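
If a run fails because the batch does not fit in GPU memory, one option is to retry with progressively smaller batch sizes. The wrapper below is a sketch under the assumption that an out-of-memory failure makes the command exit non-zero. `run_with_batch_fallback` and its argument order are hypothetical.

```bash
# Hypothetical wrapper: run a command with --batchsize N, halving N after
# each failed attempt until it succeeds or N drops below a minimum.
# Usage: run_with_batch_fallback OUT START MIN CMD [ARGS...]
# The command's stdout is written to OUT; the batch size that worked is printed.
run_with_batch_fallback() {
    local out="$1" size="$2" min="$3"
    shift 3
    [ "$min" -ge 1 ] || min=1   # guard against looping forever at size 0
    while [ "$size" -ge "$min" ]; do
        if "$@" --batchsize "$size" > "$out"; then
            echo "$size"
            return 0
        fi
        echo "batchsize $size failed, trying $((size / 2))" >&2
        size=$((size / 2))
    done
    return 1
}

# Example (requires dorado on PATH):
# run_with_batch_fallback calls.bam 256 16 dorado basecaller sup pod5_dir/
```

Each retry starts basecalling from scratch and overwrites `OUT`, so this trades wall-clock time for not having to guess a safe batch size up front.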

### Duplex Calling

```bash
dorado duplex sup pod5_dir/ > duplex.bam
```

### Demultiplexing During Basecalling

```bash
# Classify barcodes during basecalling
dorado basecaller sup pod5_dir/ --kit-name SQK-NBD114-24 > calls.bam
# Split the already-classified reads into per-barcode files
dorado demux calls.bam --output-dir demuxed/ --no-classify
```

### Trim Adapters

```bash
# Trim adapters only
dorado basecaller sup pod5_dir/ --trim adapters > calls.bam
# Disable all trimming
dorado basecaller sup pod5_dir/ --no-trim > calls_untrimmed.bam
```

### Resume Interrupted Run

```bash
# calls.bam is the interrupted output; always write to a different file
dorado basecaller sup pod5_dir/ --resume-from calls.bam > calls_complete.bam
```
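
The resume step can be wrapped so that a script starts fresh on the first attempt and resumes on later ones. This is a sketch, not Dorado functionality: `basecall_or_resume`, the `.partial.bam` naming, and the `DORADO` override variable are all hypothetical. It moves the interrupted file aside first because the file passed to `--resume-from` must not be the file being written.

```bash
# Hypothetical wrapper: start a run, or resume it if a partial BAM exists.
# Usage: basecall_or_resume MODEL INPUT OUT.bam
# DORADO may be overridden, e.g. DORADO=echo for a dry run.
basecall_or_resume() {
    local model="$1" input="$2" out="$3"
    local partial="${out%.bam}.partial.bam"
    if [ -s "$out" ]; then
        # Keep the interrupted output under a different name, then resume from it
        mv "$out" "$partial"
        "${DORADO:-dorado}" basecaller "$model" "$input" \
            --resume-from "$partial" > "$out"
    else
        "${DORADO:-dorado}" basecaller "$model" "$input" > "$out"
    fi
}

# Example (requires dorado on PATH):
# basecall_or_resume sup pod5_dir/ calls.bam
```

The partial file is left in place after a successful run so it can be inspected or deleted deliberately.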

## Guppy (Deprecated - Legacy Only)

Guppy is deprecated and no longer receiving updates. Use Dorado for all new analyses. Guppy examples below are only for maintaining legacy pipelines.

### Basic Basecalling

```bash
guppy_basecaller \
    -i fast5_dir/ \
    -s output_dir/ \
    -c dna_r10.4.1_e8.2_400bps_sup.cfg \
    --device cuda:0
```

### CPU Mode

```bash
guppy_basecaller \
    -i fast5_dir/ \
    -s output_dir/ \
    -c dna_r10.4.1_e8.2_400bps_fast.cfg \
    --num_callers 8 \
    --cpu_threads_per_caller 4
```

### High Accuracy Model

```bash
guppy_basecaller \
    -i fast5_dir/ \
    -s output_dir/ \
    -c dna_r10.4.1_e8.2_400bps_hac.cfg \
    --device cuda:0
```

### Super Accuracy Model

```bash
guppy_basecaller \
    -i fast5_dir/ \
    -s output_dir/ \
    -c dna_r10.4.1_e8.2_400bps_sup.cfg \
    --device cuda:0
```

### List Available Configs

```bash
guppy_basecaller --print_workflows
ls /opt/ont/guppy/data/*.cfg
```

### Modified Base Calling

```bash
guppy_basecaller \
    -i fast5_dir/ \
    -s output_dir/ \
    -c dna_r10.4.1_e8.2_400bps_modbases_5mc_cg_sup.cfg \
    --device cuda:0
```

### Barcoding During Basecalling

```bash
guppy_basecaller \
    -i fast5_dir/ \
    -s output_dir/ \
    -c dna_r10.4.1_e8.2_400bps_sup.cfg \
    --device cuda:0 \
    --barcode_kits SQK-NBD114-24
```

### Output BAM

```bash
guppy_basecaller \
    -i fast5_dir/ \
    -s output_dir/ \
    -c dna_r10.4.1_e8.2_400bps_sup.cfg \
    --device cuda:0 \
    --bam_out \
    --index
```

## POD5 File Handling

POD5 is ONT's current raw-signal file format and replaces FAST5.

### Convert FAST5 to POD5

```bash
pod5 convert fast5 fast5_dir/*.fast5 --output pod5_dir/
```

### Merge POD5 Files

```bash
pod5 merge pod5_dir/*.pod5 --output merged.pod5
```

### Inspect POD5

```bash
pod5 inspect reads input.pod5
pod5 inspect summary input.pod5
```

### Subset POD5

```bash
# pod5 filter keeps only the reads listed in a text file (one read ID per line)
pod5 filter input.pod5 --output subset.pod5 --ids read_ids.txt
```

## Quality Filtering

### Filter with Chopper (After Basecalling)

```bash
gunzip -c calls.fastq.gz | chopper -q 10 -l 500 | gzip > filtered.fastq.gz
```

### Filter by Quality Score

```bash
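# Compute each read's mean quality from its Phred+33 quality string
# (averaging error probabilities) instead of parsing the header.
gunzip -c calls.fastq.gz | \
    awk 'BEGIN{OFS="\n"; for(i=33;i<127;i++) ord[sprintf("%c",i)]=i}
         {h=$0; getline seq; getline plus; getline qual; n=length(qual); p=0;
          for(i=1;i<=n;i++) p+=10^(-(ord[substr(qual,i,1)]-33)/10);
          if(n>0 && -10*log(p/n)/log(10) >= 10) print h, seq, plus, qual}' | \
    gzip > q10_filtered.fastq.gz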
```

### NanoFilt (Alternative)

```bash
gunzip -c calls.fastq.gz | NanoFilt -q 10 -l 500 | gzip > filtered.fastq.gz
```

## Basecalling QC

### NanoPlot

```bash
NanoPlot --fastq calls.fastq.gz -o qc_report/ --plots hex dot
# Dorado writes unaligned BAM, so use --ubam (--bam expects aligned, sorted BAM)
NanoPlot --ubam calls.bam -o qc_report/
```

### pycoQC (From Sequencing Summary)

```bash
pycoQC -f sequencing_summary.txt -o pycoqc_report.html
```

### Basic Stats

```bash
seqkit stats calls.fastq.gz

gunzip -c calls.fastq.gz | awk 'NR%4==2 {sum+=length($0); count++} END {if (count) print "Reads:", count, "Mean length:", sum/count}'
```
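
Read-length N50 is another common summary for Nanopore runs and can be computed with the same tools. A minimal sketch, assuming 4-line FASTQ records (no wrapped sequence lines); `fastq_n50` is a hypothetical helper name.

```bash
# Hypothetical helper: print the read-length N50 of FASTQ read from stdin.
# N50 here is the length L such that reads of length >= L contain at least
# half of all bases.
fastq_n50() {
    awk 'NR % 4 == 2 { print length($0) }' |
        sort -rn |
        awk '{ len[NR] = $1; total += $1 }
             END {
                 for (i = 1; i <= NR; i++) {
                     cum += len[i]
                     if (cum >= total / 2) { print len[i]; exit }
                 }
             }'
}

# Example:
# gunzip -c calls.fastq.gz | fastq_n50
```

The second `awk` holds every read length in memory, which is fine for typical runs. For very large files, `seqkit stats -a` also reports N50.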

## Model Selection Guide

### R10.4.1 Chemistry (Current)

| Model | Use |
|-------|-----|
| dna_r10.4.1_e8.2_400bps_fast | Quick analysis |
| dna_r10.4.1_e8.2_400bps_hac | Routine work |
| dna_r10.4.1_e8.2_400bps_sup | High accuracy |

### R9.4.1 Chemistry (Legacy)

| Model | Use |
|-------