author-strategy

The author-strategy skill analyzes a researcher's PubMed publication portfolio to identify their research patterns and strategic approach. Use it when studying a specific author's career trajectory: publication preferences, study type distribution, author position trends, and research focus areas. It produces a dataset, visualizations, and a detailed strategy report for comparison or replication planning.

Repository: medsci-skills

Install in Claude Code

git clone --depth 1 https://github.com/Aperivue/medsci-skills /tmp/author-strategy && cp -r /tmp/author-strategy/skills/author-strategy ~/.claude/skills/author-strategy

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# /author-strategy — PubMed Author Strategy Analysis

## Purpose

Analyze a researcher's PubMed publication portfolio to reverse-engineer their research strategy. Produces a CSV dataset, 7 visualizations, and a strategy report.

## Prerequisites

- Python 3.10+ with `biopython`, `pandas`, `matplotlib`, `seaborn`, and `pyyaml` (PyYAML is required by the archetype classifier and the rubric renderer)
- Scripts: `${CLAUDE_SKILL_DIR}/fetch_pubmed.py`, `${CLAUDE_SKILL_DIR}/analyze_patterns.py`, `${CLAUDE_SKILL_DIR}/pubmed_parse.py` (stdlib parser), `${CLAUDE_SKILL_DIR}/classify_archetypes.py`, `${CLAUDE_SKILL_DIR}/render_archetype_doc.py`
- Rubric: `${CLAUDE_SKILL_DIR}/references/trajectory_archetypes.yaml` (canonical) and `${CLAUDE_SKILL_DIR}/references/trajectory_archetypes.md` (generated)

## Workflow

### Step 1: Gather Input

Ask the user for:
1. **Author name** (PubMed format, e.g., "Kim DK" or "Lee KS")
2. **Last name** for position classification (auto-detected if ambiguous)
3. **Output directory** (default: `~/.local/cache/author-strategy/{AuthorName}/`)
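
The inputs above can be normalized before any script runs. The following is a minimal sketch, not the skill's own code: the helper names `guess_last_name` and `default_output_dir` are hypothetical, and stripping non-alphanumeric characters to form `{AuthorName}` is an assumption about how the placeholder is filled.

```python
from pathlib import Path


def guess_last_name(author_name: str) -> str:
    """Return the first token of a PubMed-style name such as 'Kim DK'."""
    parts = author_name.split()
    if not parts:
        raise ValueError("author name is empty")
    # PubMed-style names put the surname first, followed by initials.
    return parts[0]


def default_output_dir(author_name: str) -> Path:
    """Build ~/.local/cache/author-strategy/{AuthorName}/ for the given name."""
    # Assumption: {AuthorName} is the name with spaces and punctuation removed.
    safe = "".join(ch for ch in author_name if ch.isalnum())
    return Path.home() / ".local" / "cache" / "author-strategy" / safe


print(guess_last_name("Kim DK"))
print(default_output_dir("Kim DK"))
```

Compound surnames (for example, names with particles) would need the explicit last-name input rather than this first-token guess.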

### Step 2: Fetch PubMed Data

```bash
python "${CLAUDE_SKILL_DIR}/fetch_pubmed.py" "{Author Name}" \
  --last-name "{LastName}" \
  --output "{output_dir}/data/{name}_publications.csv" \
  --email "{user_email}"
```

Review the console summary (total count, study type distribution, author position).
If count is 0, suggest alternative name formats (e.g., "Yon DK" vs "Yon D" vs "Yon Dong Keon").
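
When a search returns nothing, the alternative formats to suggest can be derived mechanically from the name the user gave. This is a hedged sketch: `name_variants` is a hypothetical helper, not part of `fetch_pubmed.py`, and it cannot recover a full given name from initials alone.

```python
def name_variants(author_name: str) -> list[str]:
    """Suggest alternative PubMed name formats to retry, most specific first."""
    parts = author_name.split()
    if len(parts) < 2:
        return [author_name]
    last, given = parts[0], parts[1:]
    if len(given) == 1 and given[0].isupper():
        initials = given[0]  # already initials, e.g. "DK"
    else:
        # Full given names, e.g. "Dong Keon" -> "DK"
        initials = "".join(g[0].upper() for g in given)
    candidates = [author_name, f"{last} {initials}", f"{last} {initials[0]}"]
    # dict.fromkeys removes duplicates while keeping the original order.
    return list(dict.fromkeys(candidates))


print(name_variants("Yon Dong Keon"))
```

Each variant would be passed as the first argument to `fetch_pubmed.py` in turn; broader variants such as a single initial are more likely to pull in other authors, so they call for closer review.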

### Step 3: Generate Visualizations and Report

```bash
python "${CLAUDE_SKILL_DIR}/analyze_patterns.py" "{output_dir}/data/{name}_publications.csv" \
  --output-dir "{output_dir}/report/" \
  --author-name "{Author Name}"
```

This produces:
- 7 PNG charts (01-07)
- `analysis_report.md` with strategy breakdown
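
Before moving on to interpretation, it can help to confirm that the expected files were actually written. The sketch below assumes the chart file names begin with the two-digit prefixes `01` to `07` and end in `.png`; the exact file names are not stated here, and `missing_outputs` is a hypothetical helper.

```python
from pathlib import Path


def missing_outputs(report_dir: str) -> list[str]:
    """List expected report artifacts that are absent from report_dir."""
    root = Path(report_dir)
    missing = []
    for i in range(1, 8):
        prefix = f"{i:02d}"
        # Assumption: each chart is a PNG whose name starts with its number.
        if not any(root.glob(f"{prefix}*.png")):
            missing.append(f"{prefix}*.png")
    if not (root / "analysis_report.md").is_file():
        missing.append("analysis_report.md")
    return missing
```

An empty list means all seven charts and the report are present; anything else is worth reporting to the user before presenting results.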

### Step 4: Interpret and Present

Read `analysis_report.md` and present to the user:

1. **Executive summary**: total publications, growth trajectory, high-tier rate
2. **Primary strategy**: what study type dominates and why
3. **Author position analysis**: first/last positional rate vs middle (positional heuristic only — not leadership or corresponding-author metadata, which are unavailable here)
4. **Topic clusters**: research focus areas
5. **ROI quadrant**: which strategies yield high-tier + leadership vs. volume only
6. **Replication opportunities**: which patterns are replicable with Claude Code + public databases
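
The positional heuristic in item 3 can be illustrated as follows. This is only a sketch under stated assumptions: authors are given as "Surname Initials" strings in byline order, the label names are hypothetical, and the actual logic in `fetch_pubmed.py` may differ.

```python
def classify_position(authors: list[str], last_name: str) -> str:
    """Label where last_name appears in the byline: first, last, middle, or sole."""
    target = last_name.lower()
    # Indices of authors whose first token (the surname) matches the target.
    hits = [
        i for i, author in enumerate(authors)
        if author.split() and author.split()[0].lower() == target
    ]
    if not hits:
        return "not found"
    if len(authors) == 1:
        return "sole"
    if hits[0] == 0:
        return "first"
    if hits[-1] == len(authors) - 1:
        return "last"
    return "middle"


byline = ["Kim DK", "Lee KS", "Park J"]
print(classify_position(byline, "Lee"))
```

Because the match is by surname only, two co-authors sharing a surname are indistinguishable here, which is one reason the result is a positional heuristic rather than evidence of leadership.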

### Step 5: Optional — MA Gap Identification

If the user asks "what MA topics are feasible with this professor?":
- Cross-reference topic clusters with existing MA plans in memory
- Identify gaps where the professor has domain expertise but has not published a meta-analysis (MA)
- Output a prioritized list of MA proposals

## Optional: Trajectory-Archetype Classification

A second, opt-in capability that classifies the author's trajectory into abstract
career archetypes (A1–A6 + a composite) as an **explainable, multi-label,
confidence-scored heuristic — not an objective verdict**. The rubric is the canonical
`references/trajectory_archetypes.yaml`. This path is **gated**: a surname alone does not
resolve an author, so the corpus must pass an explicit disambiguation review before it
can be classified.

### Step 6: Disambiguation Gate (required before classification)

Pass disambiguators so the target author is uniquely attributed (a surname alone is never
sufficient):

```bash
python "${CLAUDE_SKILL_DIR}/fetch_pubmed.py" "{Author Name}" \
  --initials "{Initials}" --orcid "{ORCID}" \
  --affiliation "{Institution}" --year-from "{YYYY}" --year-to "{YYYY}" \
  --output "{output_dir}/data/{name}_publications.csv" --email "{user_email}"
```

This writes the CSV, a `candidates.json` of affiliation/year candidate clusters, and a
`corpus_manifest.json` with `review_status: pending`. **Present the candidate clusters to
the user for review.** The user decides include/exclude. Only after the user has reviewed
the clusters do you finalize and approve the corpus (the `--approve` flag is a human gate
— never set it without explicit user review/approval):

```bash
python "${CLAUDE_SKILL_DIR}/fetch_pubmed.py" "{Author Name}" \
  --initials "{Initials}" --affiliation "{Institution}" \
  --include-pmids "{included.txt}" --exclude-pmids "{excluded.txt}" --approve \
  --output "{output_dir}/data/{name}_publications.csv" --email "{user_email}"
```

The manifest is cryptographically bound to the CSV (`csv_sha256` + `pmid_set_hash`); the
classifier refuses to run on an unapproved or mismatched corpus.
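
The kind of check the classifier performs can be sketched as below. Several details are assumptions rather than the skill's documented behavior: the approved status value `"approved"`, a CSV column named `pmid`, and the definition of `pmid_set_hash` as the SHA-256 of the sorted, newline-joined unique PMIDs. Only the manifest field names `review_status`, `csv_sha256`, and `pmid_set_hash` come from the text above.

```python
import csv
import hashlib
import json


def sha256_of_file(path: str) -> str:
    """Hex SHA-256 of a file's raw bytes."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def pmid_set_hash(pmids: list[str]) -> str:
    """Order-independent hash of a PMID set (assumed definition)."""
    joined = "\n".join(sorted(set(pmids)))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def corpus_is_usable(csv_path: str, manifest_path: str) -> bool:
    """Return True only if the manifest is approved and matches the CSV."""
    with open(manifest_path, encoding="utf-8") as fh:
        manifest = json.load(fh)
    if manifest.get("review_status") != "approved":  # assumed status value
        return False
    if manifest.get("csv_sha256") != sha256_of_file(csv_path):
        return False
    with open(csv_path, newline="", encoding="utf-8") as fh:
        # Assumption: the CSV has a column literally named "pmid".
        pmids = [row["pmid"] for row in csv.DictReader(fh)]
    return manifest.get("pmid_set_hash") == pmid_set_hash(pmids)
```

Hashing both the file bytes and the PMID set means that editing the CSV after approval, even without changing which papers it lists, invalidates the manifest and forces a fresh review.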

### Step 7: Run the Classifier and Present

```bash
python "${CLAUDE_SKILL_DIR}/classify_archetypes.py" \
  "{output_dir}/data/{name}_publications.csv" \
  --manifest "{output_dir}/data/corpus_manifest.json" \
  --rubric "${CLAUDE_SKILL_DIR}/references/trajectory_archetypes.yaml" \
  --output-dir "{output_dir}/report/"
```

Read `archetype_report.md` and present it to the user, **stating up front that the labels
are explainable heuristics, not objective classifications**. For each surfaced archetype,
show the score, confidence band, and the author's own evidence PMIDs. Honor the `[VERIFY]`
markers (h-index/citation/venue-tier are unavailable) and the A5 participation flag. List
the `insufficient evidence` archetypes too.

To retune the rubric, edit only the YAML and regenerate the narrative doc:

```bash
python "${CLAUDE_SKILL_DIR}/render_archetype_doc.py"        # regenerate the .md
python "${CLAUDE_SKILL_DIR}/render_archetype_doc.py" --check # CI/test sync gate
```

## Study Type Classifier

The classifier is tuned for Korean epidemiology and public health researchers. Categories:

| Type | Detection Pattern |
|------|------------------|
| GBD | "global burden" or "gbd" in title/abstract |
| SR/MA | "systematic review" or "meta-analysis" |
| NHIS/Claims | "national health insurance", "nhis", "claims database", "nationwide cohort" |
| Cross-national | Country pairs or "cross-national"/"binational" |
| National survey | "knhanes", "nhanes", "kchs", "national su
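
Keyword detection of this kind can be sketched as a first-match rule list. The example covers only the first three rows of the table, whose patterns are fully stated; the remaining rows are omitted because their patterns are incomplete here. The rule order, the `"Other"` fallback, and the function name `classify_study_type` are assumptions, not the skill's actual implementation.

```python
# (label, keywords) pairs, checked in table order; first match wins (assumed).
RULES = [
    ("GBD", ("global burden", "gbd")),
    ("SR/MA", ("systematic review", "meta-analysis")),
    ("NHIS/Claims", (
        "national health insurance",
        "nhis",
        "claims database",
        "nationwide cohort",
    )),
]


def classify_study_type(title: str, abstract: str = "") -> str:
    """Return the first study type whose keyword occurs in the title or abstract."""
    text = f"{title} {abstract}".lower()
    for label, keywords in RULES:
        if any(keyword in text for keyword in keywords):
            return label
    return "Other"  # hypothetical fallback label


print(classify_study_type("Global burden of asthma, 1990-2019"))
```

Plain substring matching is simple to audit but can over-match short tokens such as "gbd" inside unrelated words, so a stricter variant might match on word boundaries instead.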
