pacsomatic

Pacsomatic is an operator toolkit for executing nf-core/pacsomatic matched tumor-normal somatic variant calling workflows from BAM file inputs. Use it when validating run prerequisites, generating pacsomatic-compliant samplesheets, preparing reproducible Nextflow launch artifacts, executing locally or submitting to job schedulers (LSF, Slurm, PBS, SGE), and diagnosing pipeline startup or scheduler submission failures.

Repository: scientific-agent-skills

Install in Claude Code

git clone --depth 1 https://github.com/K-Dense-AI/scientific-agent-skills /tmp/pacsomatic && mkdir -p ~/.claude/skills && cp -r /tmp/pacsomatic/skills/pacsomatic ~/.claude/skills/pacsomatic

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# pacsomatic

## Overview

This skill provides a reproducible execution workflow for nf-core/pacsomatic, centered on a single helper entrypoint that handles validation, artifact generation, and optional execution.

Primary entrypoint:
- `scripts/run_pacsomatic.py`

The helper script:
- validates required identifiers, files, reference mode, and runtime prerequisites
- writes a pacsomatic-compatible samplesheet (`patient,sample,status,bam,pbi`)
- generates a params YAML and launch script for reproducible reruns
- supports dry-run validation and run/submit execution paths
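
As an illustration of the samplesheet layout listed above, here is a minimal sketch that renders the five-column CSV. It is not the code in `scripts/run_pacsomatic.py`; the function name `render_samplesheet`, the `tumor`/`normal` status strings, and the empty `pbi` values are assumptions made for the example, so check the pipeline's samplesheet schema for the accepted status encoding.

```python
import csv
import io

# Column order as given in the skill description.
SAMPLESHEET_COLUMNS = ["patient", "sample", "status", "bam", "pbi"]


def render_samplesheet(rows):
    """Render rows (dicts keyed by column name) as CSV text with a fixed header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=SAMPLESHEET_COLUMNS, lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        # Fail early instead of silently writing blank cells for absent keys.
        missing = [col for col in SAMPLESHEET_COLUMNS if col not in row]
        if missing:
            raise ValueError(f"samplesheet row is missing columns: {missing}")
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical values; the status strings in particular are an assumption.
example_rows = [
    {"patient": "P001", "sample": "P001_T", "status": "tumor",
     "bam": "/path/to/tumor.bam", "pbi": ""},
    {"patient": "P001", "sample": "P001_N", "status": "normal",
     "bam": "/path/to/normal.bam", "pbi": ""},
]
samplesheet_text = render_samplesheet(example_rows)
print(samplesheet_text)
```

Rendering to a string first keeps the layout easy to inspect in a dry run before anything is written to disk.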

Use this skill as the default path for pacsomatic operations. Do not bypass it with manually assembled `nextflow run nf-core/pacsomatic` commands unless the user explicitly asks for manual command construction.

## When to Use This Skill

Invoke this skill when the user asks to:
- run matched tumor-normal analysis from BAM files
- generate or fix pacsomatic samplesheet and launch artifacts
- execute locally or submit to schedulers (LSF/Slurm/PBS/SGE)
- perform dry-run validation before execution
- troubleshoot launch failures or summarize run outputs

Do not use this skill for:
- deep biological interpretation beyond run-level sanity checks
- editing pipeline internals unless explicitly requested

Typical trigger phrases:
- "run nf-core/pacsomatic for this tumor-normal pair"
- "prepare pacsomatic samplesheet and launch script"
- "do a dry run first and tell me what is missing"
- "submit pacsomatic to slurm/lsf and return the job id"
- "why did pacsomatic submission fail"

## Routing and Execution Rules

1. Always collect required run inputs first.
2. Always route through `scripts/run_pacsomatic.py` for validation and artifact generation.
3. Default to `--dry-run` when the user asks for checks/validation only.
4. Use `--run` only when the user asks to execute/submit.
5. For scheduler modes, include executor-specific resource arguments and return the detected job ID when available.
6. If execution fails, report the first failure point and the next triage target (`.nextflow.log`, `pipeline_info`, failing task logs).
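
Rule 5 asks for the scheduler job ID when it can be detected. The sketch below shows one way to pull an ID out of the text a submit command prints. It assumes the usual default messages of `sbatch`, `bsub`, and SGE/PBS `qsub`; the helper name `detect_job_id` is hypothetical, and how the real script detects IDs is not documented here.

```python
import re

# Patterns for common default submission messages (assumed formats;
# sites can customise scheduler output, so treat a miss as "not detected").
JOB_ID_PATTERNS = {
    "slurm": re.compile(r"Submitted batch job (\d+)"),
    "lsf": re.compile(r"Job <(\d+)> is submitted"),
    "sge": re.compile(r"Your job(?:-array)? (\d+)"),
    # PBS qsub normally prints only the job identifier, e.g. "5551.server".
    "pbs": re.compile(r"^(\d+\S*)\s*$", re.MULTILINE),
}


def detect_job_id(executor, submit_output):
    """Return the job ID found in submit_output, or None if none is detected."""
    pattern = JOB_ID_PATTERNS.get(executor.lower())
    if pattern is None:
        return None
    match = pattern.search(submit_output)
    return match.group(1) if match else None


print(detect_job_id("slurm", "Submitted batch job 123456\n"))
print(detect_job_id("lsf", "Job <98765> is submitted to queue <normal>.\n"))
```

Returning `None` rather than raising matches the "when available" wording: a missing ID is reported, not treated as a failure.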

## Inputs Required

Required:
- tumor BAM path
- normal BAM path
- patient ID
- tumor sample ID
- normal sample ID
- output directory
- exactly one reference mode: `--fasta` or `--genome`

Optional:
- profile, resources, scheduler account/queue
- pipeline version (`-r`)
- params file, resume/report/dag flags
- `--dry-run` and/or `--run`
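
The "exactly one reference mode" requirement maps naturally onto a required mutually exclusive argument group. The following is a minimal sketch using the flag names that appear in this document; it is not the helper's actual parser, and the program name `run_pacsomatic_sketch` is made up for the example.

```python
import argparse


def build_parser():
    """Sketch of a parser that enforces the required inputs listed above."""
    parser = argparse.ArgumentParser(prog="run_pacsomatic_sketch")
    for flag in ("--tumor-bam", "--normal-bam", "--patient-id",
                 "--tumor-sample-id", "--normal-sample-id", "--outdir"):
        parser.add_argument(flag, required=True)

    # Exactly one of --fasta / --genome: argparse rejects both and neither.
    reference = parser.add_mutually_exclusive_group(required=True)
    reference.add_argument("--fasta")
    reference.add_argument("--genome")

    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--run", action="store_true")
    return parser


base_args = [
    "--tumor-bam", "/path/to/tumor.bam",
    "--normal-bam", "/path/to/normal.bam",
    "--patient-id", "P001",
    "--tumor-sample-id", "P001_T",
    "--normal-sample-id", "P001_N",
    "--outdir", "/path/to/output",
]
args = build_parser().parse_args(base_args + ["--genome", "GRCh38", "--dry-run"])
print(args.genome, args.fasta, args.dry_run)
```

Passing both `--fasta` and `--genome`, or neither, makes `parse_args` exit with a usage error before any file checks run.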

## Workflow

1. Validate identity and input constraints.
2. Validate required local paths (BAM, optional PBI, optional FASTA).
3. Resolve runtime and dependency checks.
4. Build samplesheet and generated params YAML.
5. Generate launch script for selected executor.
6. If `--dry-run` and not `--run`, stop after artifact generation.
7. If `--run`, execute locally or submit to scheduler.
8. Return command/script path, validation status, and job ID (if detected).
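
The interplay of `--dry-run` and `--run` in steps 6 and 7 can be sketched as a small stage planner. The stage names are hypothetical labels for the steps above, and treating "neither flag given" as artifact generation only is an assumption of this sketch, not documented behaviour.

```python
def plan_stages(dry_run, run):
    """Return the ordered stages for a given flag combination."""
    stages = [
        "validate_identity_and_inputs",   # step 1
        "validate_local_paths",           # step 2
        "check_runtime_dependencies",     # step 3
        "write_samplesheet_and_params",   # step 4
        "write_launch_script",            # step 5
    ]
    # Steps 6-7: only --run triggers execution; --dry-run alone stops here.
    # dry_run is accepted for symmetry with the CLI but does not add stages.
    if run:
        stages.append("execute_or_submit")
    stages.append("report_results")        # step 8
    return stages


print(plan_stages(dry_run=True, run=False))
print(plan_stages(dry_run=True, run=True))
```

Under this reading, passing both flags validates and generates artifacts first, then executes, which matches step 7 taking effect whenever `--run` is present.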

## Agent Response Contract

Every response after invocation should include:
- exact command used or generated script path
- confirmation that validation checks ran
- run type (`dry-run` vs `run`)
- scheduler job ID when available
- one concrete next step for validation/triage
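
One way to keep responses consistent with this contract is to collect the five items in a small record and render them in a fixed order. The `RunReport` class, its field names, and the launch script path below are hypothetical; they only illustrate the checklist.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunReport:
    """Hypothetical container whose fields mirror the response contract."""
    command_or_script: str
    validation_ran: bool
    run_type: str                 # "dry-run" or "run"
    next_step: str
    job_id: Optional[str] = None  # only set when a scheduler ID was detected

    def render(self) -> str:
        if self.run_type not in ("dry-run", "run"):
            raise ValueError(f"unexpected run type: {self.run_type!r}")
        lines = [
            f"Command/script: {self.command_or_script}",
            f"Validation checks ran: {'yes' if self.validation_ran else 'no'}",
            f"Run type: {self.run_type}",
            f"Scheduler job ID: {self.job_id or 'not available'}",
            f"Next step: {self.next_step}",
        ]
        return "\n".join(lines)


report = RunReport(
    command_or_script="/path/to/output/launch.sh",  # hypothetical path
    validation_ran=True,
    run_type="dry-run",
    next_step="Review the generated samplesheet, then rerun with --run.",
)
print(report.render())
```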

## Quick Start

Dry run:

```bash
python scripts/run_pacsomatic.py \
  --tumor-bam /path/to/tumor.bam \
  --normal-bam /path/to/normal.bam \
  --patient-id P001 \
  --tumor-sample-id P001_T \
  --normal-sample-id P001_N \
  --outdir /path/to/output \
  --genome GRCh38 \
  --profile singularity,sanger \
  --dry-run
```

Scheduler execution example (Slurm):

## Configuration

Use `config.yaml` as the baseline for profile/executor/runtime defaults. Override at invocation time when user requirements differ.
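
The "baseline plus invocation-time override" rule can be sketched as a simple precedence merge. The keys shown (`profile`, `executor`, `queue`) are hypothetical stand-ins, since the actual schema of `config.yaml` is not listed here, and the defaults are written as a plain dict to keep the example free of a YAML dependency.

```python
def resolve_settings(config_defaults, cli_overrides):
    """Merge settings: explicit invocation values win over config defaults."""
    resolved = dict(config_defaults)
    for key, value in cli_overrides.items():
        # None means "flag not given", so the config default stays in place.
        if value is not None:
            resolved[key] = value
    return resolved


# Hypothetical defaults as they might be loaded from config.yaml.
config_defaults = {"profile": "singularity", "executor": "local", "queue": None}
# Hypothetical values parsed from the command line.
cli_overrides = {"profile": None, "executor": "slurm", "queue": "long"}

settings = resolve_settings(config_defaults, cli_overrides)
print(settings)
```

Copying the defaults before merging leaves the loaded configuration untouched, so the same baseline can be reused across several invocations.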

## Testing

Run the unit tests from the skill root:

```bash
python -m unittest discover -s tests -v
```

## References

- `references/agent-playbook.md`
- `references/config-and-output.md`
- `references/pacsomatic_guide.md`
- `scripts/run_pacsomatic.py`
