sciagent-skill-creator

Install in Claude Code
git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/sciagent-skill-creator && cp -r /tmp/sciagent-skill-creator/.claude/skills/sciagent-skill-creator ~/.claude/skills/sciagent-skill-creator
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# SciAgent Skill Creator

Repo-local scaffolder for `skills/` entries. Mechanizes the boilerplate from `CLAUDE.md` Steps 1, 2, 4, 5, 6 so authoring effort stays on *content* (When to Use, Workflow, Recipes, References) and not on field plumbing.

## When to invoke this skill

- User asks for a new SciAgent skill entry on a specific tool, library, database, or guide topic
- User invokes `/sciagent-skill-creator` directly
- The agent is about to hand-edit `registry.yaml` and create a `skills/<cat>/<name>/SKILL.md` from scratch — use this instead

Do **not** invoke for:
- Editing an existing entry's content (just edit the file)
- Migrating an existing entry (read `CLAUDE.md` "Migrating from Existing Entries" first — the scaffolder generates a skeleton, but migration requires content judgment)
- Updating `registry.yaml` only (use a normal edit)

## What you need to collect from the user

Before calling the scaffold script, gather these — in conversation, not via flags hidden from the user:

1. **Topic** — concrete tool/library/concept name. Reject vague topics ("ML stuff") with a clarifying question.
2. **Sub-type** — `pipeline` | `toolkit` | `database` | `guide`. Use the decision rule from CLAUDE.md Step 1b. If unsure, ask the user.
3. **Category** — primary category directory. List the table from `CLAUDE.md` Step 2 if the user is unsure.
4. **Entry name** — kebab-case slug. Convention: `{tool-name}-{purpose}` (e.g., `pydeseq2-differential-expression`). Confirm with the user.
5. **License** — underlying tool's license. Default to `CC-BY-4.0` for original prose-only content.
6. **Description** — 1-2 sentences, max 1024 chars. Lead with tool/domain keyword in the first 120 chars. Anti-patterns are in CLAUDE.md Step 5 "Description writing rules".
7. **Tags** (optional) — only if the entry meaningfully spans multiple categories (e.g., literature DB stored under `scientific-writing`, tag with `["databases", "literature"]`).
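
As a rough illustration of tracking these values during the conversation, the sketch below uses a hypothetical `SkillRequest` dataclass (not part of the repo) whose `missing()` method lists the required fields the agent still has to ask about. The field names mirror the list above; the class, its method, and the example values are assumptions.

```python
from dataclasses import dataclass, field

# The four sub-types named in the list above.
SUB_TYPES = ("pipeline", "toolkit", "database", "guide")


@dataclass
class SkillRequest:
    """Hypothetical holder for the values gathered from the user."""

    topic: str = ""
    sub_type: str = ""
    category: str = ""
    name: str = ""
    license: str = "CC-BY-4.0"  # default for original prose-only content
    description: str = ""
    tags: list[str] = field(default_factory=list)  # optional, may stay empty

    def missing(self) -> list[str]:
        """Return required fields that are empty or not a valid choice yet."""
        gaps = [
            key
            for key in ("topic", "category", "name", "description")
            if not getattr(self, key).strip()
        ]
        if self.sub_type not in SUB_TYPES:
            gaps.append("sub_type")
        return gaps


# Example: only the topic and sub-type are known so far.
request = SkillRequest(topic="PyDESeq2", sub_type="pipeline")
print(request.missing())  # fields the agent should still ask the user about
```

Keeping the gaps explicit makes it harder to call the scaffolder with a guessed value the user never confirmed.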

## Duplicate check before scaffolding

Before calling the scaffold script, search the registry and `legacy/` for similar names:

```bash
grep -i "<topic-keyword>" registry.yaml
ls legacy/ | grep -i "<topic-keyword>"
```

If a near-duplicate exists, surface it to the user before continuing. Authoring a parallel entry usually means the existing one needs updating, not duplication.
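
The same check can be sketched in Python when the results need to be handled programmatically. This is a minimal sketch, assuming `registry.yaml` and `legacy/` sit at the repository root; the function names are hypothetical, and the match is a plain case-insensitive substring test rather than the regular-expression match that `grep` performs.

```python
from pathlib import Path


def find_near_duplicates(keyword, registry_text, legacy_names):
    """Case-insensitive substring search over registry lines and legacy names."""
    needle = keyword.lower()
    return {
        "registry": [
            line.strip()
            for line in registry_text.splitlines()
            if needle in line.lower()
        ],
        "legacy": [name for name in legacy_names if needle in name.lower()],
    }


def check_repo(keyword, repo_root="."):
    """Read registry.yaml and legacy/ under repo_root, if they exist."""
    root = Path(repo_root)
    registry = root / "registry.yaml"
    legacy = root / "legacy"
    registry_text = registry.read_text(encoding="utf-8") if registry.is_file() else ""
    legacy_names = sorted(p.name for p in legacy.iterdir()) if legacy.is_dir() else []
    return find_near_duplicates(keyword, registry_text, legacy_names)
```

Any non-empty list in the result is a candidate to show the user before scaffolding.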

## How to run the scaffolder

Call `scripts/scaffold.py` with explicit arguments. The script is **non-interactive** — the agent provides all values:

```bash
python .claude/skills/sciagent-skill-creator/scripts/scaffold.py \
  --sub-type pipeline \
  --category genomics-bioinformatics \
  --name my-tool-purpose \
  --description "MyTool short-form description starting with the tool name. Brief on inputs, outputs, when to pick this over alternatives." \
  --license MIT \
  --tags databases,literature   # optional, comma-separated
```
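
When the agent launches the script from Python rather than a shell, passing the arguments as a list avoids quoting problems in the description. The sketch below only uses the flags shown in the command above; the helper names `scaffold_command` and `run_scaffold` are hypothetical, and the call is assumed to run from the repository root.

```python
import subprocess
import sys

SCAFFOLD = ".claude/skills/sciagent-skill-creator/scripts/scaffold.py"


def scaffold_command(sub_type, category, name, description, license_id, tags=None):
    """Build the argument list for scaffold.py; --tags is added only if given."""
    cmd = [
        sys.executable, SCAFFOLD,
        "--sub-type", sub_type,
        "--category", category,
        "--name", name,
        "--description", description,
        "--license", license_id,
    ]
    if tags:
        cmd += ["--tags", ",".join(tags)]  # comma-separated, as in the example above
    return cmd


def run_scaffold(cmd):
    """Run the scaffolder and return its exit code (non-zero is assumed to mean failure)."""
    return subprocess.run(cmd, check=False).returncode


cmd = scaffold_command(
    "pipeline",
    "genomics-bioinformatics",
    "my-tool-purpose",
    "MyTool short-form description starting with the tool name.",
    "MIT",
    tags=["databases", "literature"],
)
print(cmd)  # inspect before calling run_scaffold(cmd)
```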

Behavior:

1. Validates name (kebab-case, not already in `registry.yaml`, not in `legacy/`)
2. Validates category exists as a directory under `skills/`
3. Validates description with `validate_description.py` (length + first-120-char keyword lead)
4. Validates tags (kebab-case if provided)
5. Creates `skills/{category}/{name}/SKILL.md` from the matching template, substituting frontmatter fields
6. Appends a new entry to `registry.yaml` with `date_added` = today (UTC)
7. Runs `pixi run validate` to confirm the registry is still well-formed
8. Prints next steps (fill in Overview, Workflow, Recipes, References)

On any validation failure, the script aborts without writing anything. Fix the offending value and re-run.
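
The validate-first, write-later order can be illustrated as follows. This is a sketch under stated assumptions, not the code in `scaffold.py`: the kebab-case pattern, the way the lead keyword is supplied, and all function names are hypothetical, and the real `validate_description.py` may apply different rules. The category check is left out because it needs the `skills/` directory.

```python
import re
from datetime import datetime, timezone

KEBAB = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")  # assumed kebab-case rule
MAX_DESCRIPTION_CHARS = 1024
LEAD_WINDOW = 120


def is_kebab_case(value):
    return KEBAB.fullmatch(value) is not None


def description_problems(description, keyword):
    """Length limit plus keyword lead within the first 120 characters."""
    problems = []
    if len(description) > MAX_DESCRIPTION_CHARS:
        problems.append(f"description exceeds {MAX_DESCRIPTION_CHARS} chars")
    if keyword.lower() not in description[:LEAD_WINDOW].lower():
        problems.append(f"'{keyword}' missing from first {LEAD_WINDOW} chars")
    return problems


def collect_errors(name, description, keyword, tags, existing_names):
    """Gather every problem up front so nothing is written on failure."""
    errors = []
    if not is_kebab_case(name):
        errors.append(f"name '{name}' is not kebab-case")
    if name in existing_names:
        errors.append(f"name '{name}' already exists")
    errors += description_problems(description, keyword)
    errors += [f"tag '{t}' is not kebab-case" for t in tags if not is_kebab_case(t)]
    return errors


def date_added_utc():
    """Today's date in UTC as an ISO string, for the date_added field."""
    return datetime.now(timezone.utc).date().isoformat()


errors = collect_errors(
    "my-tool-purpose", "MyTool does X for Y.", "MyTool", ["databases"], set()
)
if errors:
    print("aborting, nothing written:", errors)
else:
    print("would write SKILL.md and a registry entry dated", date_added_utc())
```

Collecting all errors before touching the filesystem is what makes a failed run safe to repeat after fixing the offending value.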

## After scaffolding

The generated SKILL.md is a **skeleton with placeholders**. The agent's remaining job:

1. Fill `Overview`, `When to Use`, `Prerequisites`, `Workflow` / `Core API` / `Key Concepts`, `Common Recipes`, `Troubleshooting`, `References`
2. Match the section structure required by the sub-type (see `CLAUDE.md` Step 4 format rules)
3. Run `pixi run test` — full suite, not just `validate` — to catch sub-type-specific structural failures (code block counts, table row counts, section presence)

The scaffold script does not pretend to write content. Content stays with the agent and the source material.

## Content authoring rules (what NOT to bake into a SKILL.md)

Skills document a tool's *analysis surface*, not the consumer's *house style*. A SKILL.md is read by many agents for many downstream tasks — visual choices that fit one analysis brief leak into every future invocation. Strip the following before committing:

- **Color palettes, cmaps, themes** — no hex codes (`#08306b`), no `LinearSegmentedColormap.from_list(...)`, no `ListedColormap([...])`, no prescribed `cmap=` arguments unless the cmap *is* the tool's API (e.g., a tool that ships its own palette). Let matplotlib pick defaults; the consumer overrides downstream.
- **Per-replicate / per-condition color dicts** — e.g., `colors = {"rep1": "#1f77b4", ...}`. Matplotlib auto-cycles colors.
- **Font choices, dpi presets, figure sizes tuned for one report** — `figsize=(8, 4)` for a routine line plot is fine; `figsize=(12, 4)` chosen to fit a slide deck is not.
- **One-shot user-brief specifics** — if the user asked for "blue for low, red for high" in *their* analysis, that belongs in their code, not the skill. The skill teaches *how to compute* phi/psi density; *how to color it* is consumer choice.
- **Hardcoded paths beyond the tool's defaults** — `"figures/"`, `"results/"`, `f"{pdb_id}_protein.pdb"` are fine as illustrative outputs; `"/Users/me/proj42/output"` is not.

What to keep: the analysis logic, the data shape, the units, the parameter semantics, the expected output *structure* (columns, axes, units), and any visual choice the tool itself enforces.

Rule of thumb: if a downstream consumer would *override* the choice, don't ship the choice in the skill.
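
A draft can be screened for some of these leaks with a simple pattern scan. The sketch below is a heuristic only: the rule names and regular expressions are assumptions chosen to match the examples above, and every hit still needs human judgment, since a `cmap=` argument or a palette may legitimately belong to the tool's own API.

```python
import re

# Hypothetical rules, one per leak type described above.
RULES = [
    ("hex-color", re.compile(r"#[0-9a-fA-F]{6}\b")),
    ("custom-colormap", re.compile(r"\b(?:LinearSegmentedColormap|ListedColormap)\b")),
    ("cmap-argument", re.compile(r"\bcmap\s*=")),
    ("user-specific-path", re.compile(r"/Users/|/home/|[A-Za-z]:\\Users\\")),
]


def lint_skill_text(text):
    """Return (line_number, rule_name, line) for every pattern hit."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for rule_name, pattern in RULES:
            if pattern.search(line):
                findings.append((number, rule_name, line.strip()))
    return findings


sample = "\n".join([
    "plt.plot(x, y)",
    'colors = {"rep1": "#1f77b4"}',
    'ax.imshow(z, cmap="viridis")',
    'out = "/Users/me/proj42/output"',
    'out = "figures/"',
])
for number, rule_name, line in lint_skill_text(sample):
    print(f"line {number}: {rule_name}: {line}")
```

Relative paths such as `"figures/"` pass untouched, which matches the distinction drawn above between illustrative outputs and machine-specific locations.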

## Writing style: be succinct

A SKILL.md is reference material for agents, not a tutorial. Token cost matters — every line is paid for