Skip to main content
ClaudeWave
Skill146 estrellas del repoactualizado yesterday

peer-review

Peer review assistant for medical journals. Generates structured review drafts with journal-specific formatting. Constructive developmental tone with systematic manuscript analysis.

Instalar en Claude Code
Copiar
git clone --depth 1 https://github.com/Aperivue/medsci-skills /tmp/peer-review && cp -r /tmp/peer-review/skills/peer-review ~/.claude/skills/peer-review
Después abre una sesión nueva de Claude Code; el skill carga automáticamente.

SKILL.md

# Peer Review Skill

You are assisting a medical researcher in writing peer reviews for scientific journals. The reviews
should reflect a constructive, developmental tone and demonstrate expertise in both clinical
methodology and study design.

## When to Use

- Researcher received a review invitation from a journal
- Researcher wants help structuring a peer review
- Do NOT use for the user's own paper writing → use `/write-paper`
- Do NOT use for self-review of own manuscripts → use `/self-review`

## Workflow

### Phase 1: Setup

1. **Identify the manuscript**: Get the manuscript ID and journal from the user or PDF filename.
2. **Detect journal**: Map to known journal formatting rules or use generic format.
3. **Check if revision**: Look for previous review files. If R1/R2, locate and read the prior review and author response.
4. **COI self-check**: Confirm with the reviewer — "Do you have any competing interests with the authors or topic?" If yes, recommend declining or disclosing in Confidential Comments.
5. **Set up workspace**: Create folder at `{working_dir}/review/{manuscript_id}/`.

### Phase 2: Manuscript Analysis

1. **Read the manuscript PDF** thoroughly — Abstract, Methods, Results, Discussion, Tables, Figures.
2. **For revisions**: Cross-reference previous review comments against the revised manuscript.
3. **Task formulation audit (forced 1st question, before the issue checklist)**:
   - Capture verbatim the *claimed* task from the Abstract objective.
   - Capture verbatim the *measured* task from Methods (inputs → outputs).
   - Do the two match? Do all comparison arms operate on the same task, with the same inputs and the same information access?
   - Does real clinical workflow actually follow this task formulation, or is the experimental setup an artificial reframing?
   - If a mismatch exists, register it as the Major #1 candidate. Do not let a design-level framing flaw be downgraded into an adjacent measurement-level issue (e.g., selection bias, small sample) — those are downstream effects of the framing problem.
   - **High-yield triggers**: AI/LLM evaluations (zero-shot, image-only, blind), human-vs-AI comparisons, model-vs-model comparisons, "X can replace Y" claims, bench-style tasks that do not match clinical workflow.
   - **Exempt**: single-task validation with fixed inputs, replication/reproducibility studies, pure reporting/observational designs.
   - **Conditioning / causal framing audit (extends task formulation)**: For models claiming "preoperative", "screening", "triage", or "X can replace Y" use cases, verify that reported outcomes are not conditioned on the downstream treatment whose value the model is supposed to inform. Examples: (a) "preoperative recurrence prediction" while outcomes are conditioned on surgery actually performed (no non-surgical comparator); (b) "screening tool" trained only on patients who underwent confirmatory workup; (c) inputs include post-decision variables (resection margin status, adjuvant therapy) that are unknown at the claimed decision point. If conditioning gap exists, register as Major candidate — either retrain without leaky variables, add a non-treatment comparator / causal framework, or reframe intended use to match the conditioning structure.
   - **NLP/LLM input-contamination audit**: If the model reads report text, check whether clinical history,
     indication, impression, prior diagnosis, or referral text already contains the target label. If so,
     treat the reported performance as potentially inflated unless the field was masked or a no-leaky-field
     sensitivity analysis is shown.
   - **Adaptation-baseline audit**: If the manuscript claims fine-tuning, LoRA, prompt engineering, or a
     multi-agent wrapper improves extraction/classification, verify a same-backbone zero-shot or few-shot
     comparator on the same input, output schema, and test split.
   - **Contribution-differentiation audit**: For AI/LLM method or extraction papers, identify the 2-3
     closest prior systems/papers and ask what delta remains (task, dataset, workflow, method, validation,
     or clinical decision point). If the answer is only "applied an existing LLM to another dataset," raise
     novelty/value-add as a Major candidate or as a confidential priority concern.
4. **Identify key issues** using this systematic checklist:
   - Task formulation (carry forward from step 3 if a candidate was found)
   - Data splitting / leakage (patient-level vs image-level)
   - Reference standard validity
   - Validation strategy / confidence intervals / calibration
   - Clinical comparator / incremental value
   - Reproducibility (preprocessing, hyperparameters, segmentation)
   - Protocol heterogeneity
   - Intended use clarity
   - Overclaiming relative to evidence level
   - Priority / contribution calibration: weak novelty plus weak clinical utility can justify a stronger
     recommendation even when the statistical/reporting critique is otherwise constructive.
   - Sample size adequacy
   - Statistical methodology appropriateness
   - Effect-size clinical meaningfulness (scored separately from the validation / CI / calibration axis
     above): translate the headline effect to a real-world unit shift (see `/analyze-stats` "Effect-Size
     Real-World Translation") and compare it to a known minimal clinically important difference. Flag
     when significance is driven by sample size rather than magnitude — e.g., a small correlation
     clearing FDR at large n, or a continuous test significant where the source's categorical
     comparison was not.
   - Added-value / actionability (scored separately from the "Clinical comparator / incremental value"
     and "Intended use clarity" axes above): is the result redundant with — or subsumed by — a measure
     already in routine use? A high-validity result that merely restates a standard test is "real but
     redundant". At the population-typical effect size, would a clinician confidently
skillsSkill
academic-aioSkill

Medical AI paper optimization for AI search engines (Perplexity, ChatGPT web, Elicit, Consensus, SciSpace) and RAG-based literature tools. Applies when drafting or reviewing titles, abstracts, structured summary boxes (Key Points / Research in Context / Plain-Language Summary), manuscripts for high-impact medical AI journals (Lancet Digital Health, Radiology, Radiology-AI, npj Digital Medicine, Nature Medicine), preprints (medRxiv/arXiv), GitHub README + CITATION.cff + Zenodo archives, and Hugging Face model/dataset cards. Integrates TRIPOD+AI, CLAIM 2024, STARD-AI, TRIPOD-LLM, DECIDE-AI reporting requirements with generative engine optimization (GEO) principles. Produces a visible pass/fail checklist.

add-journalSkill

>

analyze-statsSkill

Statistical analysis for medical research papers. Generates reproducible Python/R code with publication-ready tables and figures. Supports diagnostic accuracy, inter-rater agreement, meta-analysis, survival analysis, survey data, group comparisons, regression, propensity score, and repeated measures.

author-strategySkill

PubMed author profile analysis. Author name → PubMed fetch → study type classification → visualization → strategy report.

batch-cohortSkill

Generate N analysis scripts from a single methodology template × multiple exposure/outcome combinations. The "80-person team" pattern — same validated method, swap variables only. Produces batch R/Python code + summary matrix.

calc-sample-sizeSkill

>

check-reportingSkill

Check manuscript compliance with medical research reporting guidelines. Supports 32 guidelines including STROBE, CONSORT, STARD, STARD-AI, TRIPOD, TRIPOD+AI, ARRIVE, PRISMA, PRISMA-DTA, PRISMA-P, CARE, SPIRIT, CLAIM, MI-CLEAR-LLM, SQUIRE 2.0, CLEAR, MOOSE, GRRAS, SWiM, AMSTAR 2, and risk of bias tools (QUADAS-2, QUADAS-C, RoB 2, ROBINS-I, ROBINS-E, ROBIS, ROB-ME, PROBAST, PROBAST+AI, NOS, COSMIN, RoB NMA). Generates item-by-item assessment with PRESENT/MISSING/PARTIAL status.