find-cohort-gap

The find-cohort-gap skill systematically identifies novel, publishable research questions by analyzing variables within a cohort database and matching them against literature gaps and investigator expertise. Use this skill when you have access to a structured patient cohort with defined variables, endpoints, and follow-up data, and need to discover unexplored research directions that align with available data strengths and a specific researcher's publication record.

Repository: medsci-skills

Install in Claude Code

git clone --depth 1 https://github.com/Aperivue/medsci-skills /tmp/find-cohort-gap && cp -r /tmp/find-cohort-gap/skills/find-cohort-gap ~/.claude/skills/find-cohort-gap

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Find-Cohort-Gap Skill

You are assisting a medical researcher in systematically discovering novel, publishable
research topics from a cohort database. Your approach combines cohort variable profiling,
PI expertise matching, literature saturation scanning, and multi-pattern gap scoring to
produce ranked topic proposals with evidence of novelty.

This skill covers a direction that existing approaches do not: **DB variables -> literature gap
-> research question**. Existing frameworks and tools (PICO, FINER, SciSpace, Elicit) work from
the literature toward gaps. This skill works from the data outward.

## Communication Rules

- Communicate with the user in their preferred language.
- Keep all literature citations, variable names, and medical terminology in English.
- Be direct about weak topics — kill early, save time.

## Key Directories

- **Output**: User-specified directory (default: current working directory)
- **References**: `${CLAUDE_SKILL_DIR}/references/` for templates and rubrics

---

## Phase 0: Cohort Intake

Collect cohort metadata. Use the template at `${CLAUDE_SKILL_DIR}/references/cohort_profile_template.md`.

Required information:
1. **Cohort name and setting** (institution, country, population type)
2. **Sample size** (N at baseline, N with follow-up)
3. **Time span** (enrollment period, follow-up duration, measurement intervals)
4. **Variable categories** (demographics, labs, imaging, questionnaires, medications, procedures)
5. **Endpoints available** (mortality, cancer incidence, cardiovascular events, hospitalization)
6. **Special strengths** (serial measurements, linkage to national registries, unique population)
7. **Known limitations** (healthy volunteer bias, attrition, missing data patterns)
8. **Existing publications** from this cohort (if known — to avoid duplication)

If the user provides a data dictionary file (Excel/CSV), read it to extract variable
categories and construct the variable cluster map automatically.
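
As a rough sketch of the automatic extraction described above, the following groups data-dictionary rows into a variable cluster map. The column names `variable` and `category` and the function `build_cluster_map` are assumptions for illustration only; a real data dictionary will likely use different headers, and an Excel file would first need to be read with a spreadsheet library.

```python
import csv
import io
from collections import defaultdict


def build_cluster_map(csv_text, name_col="variable", category_col="category"):
    """Group variable names by category from data-dictionary CSV text.

    The column names are hypothetical; adjust them to the actual dictionary.
    """
    clusters = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = (row.get(name_col) or "").strip()
        category = (row.get(category_col) or "").strip().lower() or "uncategorized"
        if name:
            clusters[category].append(name)
    return dict(clusters)


# Made-up example dictionary, not real cohort data.
example = """variable,category
age,Demographics
sex,Demographics
hba1c,Labs
cac_score,Imaging
"""

cluster_map = build_cluster_map(example)
for category, names in cluster_map.items():
    print(f"{category}: {', '.join(names)}")
```

Variables with an empty category land in an `uncategorized` cluster so they can be reviewed by hand before the Phase 0 gate.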

**Gate:** Present the cohort profile summary. Confirm before proceeding.

---

## Phase 1: PI/CA Profiling

Profile the intended PI or corresponding author to find topic-expertise alignment.

1. **Search PubMed** for the PI's recent publications (last 5 years).
   - Use `/search-lit` E-utilities: `bash "$EUTILS" search "AuthorLastName AuthorFirstInitial[Author]" 30`
   - Extract top keyword clusters from titles/abstracts.
2. **Identify specialty signals**:
   - Academic society positions (president, board member, editor)
   - Subspecialty focus areas
   - Preferred journal tiers
3. **Build a PI keyword map**: 5-10 keyword clusters ranked by publication frequency.

If no PI is specified, skip this phase and use variable clusters alone in Phase 2.

**Output:** PI profile card (name, affiliation, top keywords, society roles, preferred journals).
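
One possible way to approximate the keyword map is a simple frequency count over publication titles. This is a minimal sketch, not the skill's own method: `keyword_map`, the tiny stopword list, and the example titles are all hypothetical, and it ranks single words rather than true keyword clusters.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real run would need a fuller one.
STOPWORDS = {"a", "an", "and", "in", "of", "the", "with", "for", "on", "to", "by", "among"}


def keyword_map(titles, top_n=10):
    """Rank title words by the number of titles in which they appear."""
    counts = Counter()
    for title in titles:
        # Use a set so a word repeated within one title counts once.
        words = set(re.findall(r"[a-z][a-z0-9-]+", title.lower()))
        counts.update(words - STOPWORDS)
    return counts.most_common(top_n)


# Made-up titles for illustration only.
titles = [
    "Coronary artery calcium and mortality in asymptomatic adults",
    "Coronary calcium progression on serial CT",
    "Liver fat on CT and incident diabetes",
]

for word, n_titles in keyword_map(titles, top_n=5):
    print(f"{word}: {n_titles}")
```

Grouping related words into the 5-10 clusters called for above (for example, merging "coronary" and "calcium") still needs judgment that a raw count does not provide.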

---

## Phase 2: Intersection Matrix

Cross cohort variable clusters with PI expertise to generate candidate topics.

### Method

Create a matrix: rows = DB variable clusters, columns = PI keyword clusters.
Score each cell 0-3:
- **3**: PI has published in this exact intersection (direct match)
- **2**: PI's subspecialty covers this area (strong relevance)
- **1**: Tangential connection (possible but needs framing)
- **0**: No connection
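
The matrix can be held in a plain dictionary keyed by (DB cluster, PI cluster). The sketch below is one way to do that under stated assumptions: `build_matrix`, `render`, and the cluster names are hypothetical, and the scores themselves still come from manual judgment against the rubric above.

```python
VALID_SCORES = {0, 1, 2, 3}


def build_matrix(db_clusters, pi_clusters, scores):
    """Return {(db_cluster, pi_cluster): score}; unscored cells default to 0."""
    matrix = {}
    for db in db_clusters:
        for pi in pi_clusters:
            score = scores.get((db, pi), 0)
            if score not in VALID_SCORES:
                raise ValueError(f"score for {(db, pi)} must be 0-3, got {score}")
            matrix[(db, pi)] = score
    return matrix


def render(matrix, db_clusters, pi_clusters):
    """Format the matrix as a markdown table (rows = DB, columns = PI)."""
    lines = [
        "| DB cluster | " + " | ".join(pi_clusters) + " |",
        "|---" * (len(pi_clusters) + 1) + "|",
    ]
    for db in db_clusters:
        cells = " | ".join(str(matrix[(db, pi)]) for pi in pi_clusters)
        lines.append(f"| {db} | {cells} |")
    return "\n".join(lines)


# Made-up clusters and scores for illustration only.
db_clusters = ["imaging", "labs"]
pi_clusters = ["cardiac CT", "lipids"]
scores = {("imaging", "cardiac CT"): 3, ("labs", "lipids"): 2, ("labs", "cardiac CT"): 1}

matrix = build_matrix(db_clusters, pi_clusters, scores)
table = render(matrix, db_clusters, pi_clusters)
print(table)
```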

### Candidate Generation

1. Extract all cells scoring 2-3 as primary candidates.
2. For cells scoring 1, apply the **A-B substitution test**: "Has someone published
   [this analysis] with [a different exposure/outcome] in a similar cohort?" If yes,
   substituting the PI's specialty variable creates a viable candidate.
3. Generate 20-40 candidate topic statements in PICO format, with exposure (E) in place of intervention (I):
   - **P**: Population from the cohort
   - **E**: Exposure/predictor variable(s)
   - **C**: Comparison group
   - **O**: Outcome (preferably hard endpoint)
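
The triage of cells and the P/E/C/O statement format could be sketched as follows. The names `Candidate` and `triage_cells`, along with the example matrix and topic, are hypothetical; the A-B substitution test itself remains a literature judgment that code does not replace.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    population: str
    exposure: str
    comparison: str
    outcome: str

    def statement(self):
        """Render the candidate as a one-line P/E/C/O statement."""
        return (
            f"P: {self.population}; E: {self.exposure}; "
            f"C: {self.comparison}; O: {self.outcome}"
        )


def triage_cells(matrix):
    """Split matrix cells into primary (score 2-3) and A-B test (score 1) lists."""
    primary = [cell for cell, score in matrix.items() if score >= 2]
    needs_ab_test = [cell for cell, score in matrix.items() if score == 1]
    # Highest-scoring cells first; ties keep their original order.
    primary.sort(key=lambda cell: -matrix[cell])
    return primary, needs_ab_test


# Made-up matrix and topic for illustration only.
example_matrix = {
    ("labs", "lipids"): 2,
    ("imaging", "cardiac CT"): 3,
    ("labs", "cardiac CT"): 1,
    ("imaging", "lipids"): 0,
}
primary, needs_ab_test = triage_cells(example_matrix)

candidate = Candidate(
    population="adults in the screening cohort",
    exposure="coronary artery calcium score",
    comparison="score of zero",
    outcome="all-cause mortality",
)
print(candidate.statement())
```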

### Discipline Alignment Filter

Before advancing candidates to saturation scanning, apply a discipline filter:

- **Who is the intended first author?** Identify their department/specialty.
- **Does the primary exposure variable belong to that discipline?** The first
  author's specialty must align with the study's core variable. For example:
  - Radiology first author → imaging variable must be the primary exposure
  - Cardiology first author → cardiac biomarker or ECG finding as exposure
  - Neurology first author → neurological variable or brain imaging as exposure
- **Kill candidates where the primary exposure is outside the first author's
  discipline.** A strong PI match alone is insufficient if the first author
  cannot claim ownership of the core variable.

This filter prevents generating topics where the first author's contribution
is not defensible at the variable level.
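
A minimal sketch of the filter, assuming each candidate records the cluster of its primary exposure. The mapping `CLUSTER_DISCIPLINES`, the function `discipline_filter`, and the candidate fields are hypothetical; which disciplines can claim a given variable is a judgment the user should confirm.

```python
# Hypothetical mapping from exposure cluster to the disciplines that can own it.
CLUSTER_DISCIPLINES = {
    "imaging": {"radiology"},
    "cardiac biomarker": {"cardiology"},
    "ecg": {"cardiology"},
    "brain imaging": {"neurology", "radiology"},
}


def discipline_filter(candidates, first_author_discipline, cluster_disciplines):
    """Keep candidates whose primary exposure belongs to the first author's discipline."""
    kept, killed = [], []
    for cand in candidates:
        owners = cluster_disciplines.get(cand["exposure_cluster"], set())
        if first_author_discipline in owners:
            kept.append(cand)
        else:
            killed.append(cand)
    return kept, killed


# Made-up candidates for illustration only.
candidates = [
    {"topic": "CAC score and mortality", "exposure_cluster": "imaging"},
    {"topic": "Troponin and heart failure", "exposure_cluster": "cardiac biomarker"},
    {"topic": "White matter lesions and stroke", "exposure_cluster": "brain imaging"},
]

kept, killed = discipline_filter(candidates, "radiology", CLUSTER_DISCIPLINES)
print("kept:", [c["topic"] for c in kept])
print("killed:", [c["topic"] for c in killed])
```

Returning the killed list alongside the kept list makes it easy to tell the user which topics were dropped and why.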

**Gate:** Present the intersection matrix and top 20 candidates (post-discipline
filter). User selects 8-12 for saturation scanning.

---

## Phase 3: Literature Saturation Scan

For each selected candidate, determine how saturated the literature is.

### Search Strategy

For each candidate:
1. Build a PubMed query: `(exposure terms) AND (outcome terms) AND (cohort OR longitudinal OR prospective)`
2. Execute search via `/search-lit` E-utilities.
3. Count total results and classify:

| Grade | Count | Longitudinal? | Interpretation |
|-------|-------|---------------|----------------|
| **Blue Ocean** | 0-2 papers | N/A | First report possible. Verify the topic has audience interest. |
| **Green Field** | 3-10 papers | None (all cross-sectional) | **Optimal zone**: established interest, longitudinal gap wide open. |
| **Yellow** | 11-29 papers | Some | Viable only with a very specific angle (unique population, novel endpoint). |
| **Red** | 30+ papers, or a meta-analysis (MA) exists | Yes | Avoid unless doing a network meta-analysis (NMA) or using truly unique data. |
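
The query template and the grading table can be sketched in code as below. Several choices here are assumptions rather than part of the skill: the helper names `build_query` and `saturation_grade`, quoting multi-word terms as phrases, treating a count of exactly 10 as Green Field and 30 as Red, and grading 3-10 papers that include longitudinal work as Yellow (a case the table does not cover). Running the search itself is still left to `/search-lit`.

```python
def build_query(exposure_terms, outcome_terms):
    """Assemble a PubMed query string following the template above."""

    def group(terms):
        # Quote multi-word terms so they are searched as phrases.
        return " OR ".join(f'"{t}"' if " " in t else t for t in terms)

    return (
        f"({group(exposure_terms)}) AND ({group(outcome_terms)}) "
        "AND (cohort OR longitudinal OR prospective)"
    )


def saturation_grade(count, has_longitudinal, meta_analysis_exists=False):
    """Map a result count to a saturation grade (boundary handling is assumed)."""
    if count < 0:
        raise ValueError("count must be non-negative")
    if meta_analysis_exists or count >= 30:
        return "Red"
    if count <= 2:
        return "Blue Ocean"
    if count <= 10 and not has_longitudinal:
        return "Green Field"
    return "Yellow"


query = build_query(["coronary artery calcium"], ["mortality", "death"])
print(query)
print(saturation_grade(7, has_longitudinal=False))
```

The raw count is only a starting point: classifying papers as cross-sectional or longitudinal, and checking for an existing meta-analysis, requires reading the abstracts.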

### Critical Filter

For each candidate in Green/Yellow, ask: **"Has anyone published this with serial/repeated
measurements?"** If no, upgrade automatically by one grade toward the less saturated end (Yellow to Green Field, Green Field to Blue Ocean).
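
The upgrade rule might look like this in code, assuming that "upgrade" means moving one step toward the less saturated end of the scale. That reading, the ordering in `GRADES`, and the function name `apply_serial_filter` are assumptions for illustration.

```python
# Grades ordered from most to least saturated (assumed ordering).
GRADES = ["Red", "Yellow", "Green Field", "Blue Ocean"]


def apply_serial_filter(grade, serial_measurements_published):
    """Upgrade Green Field or Yellow by one grade if no serial-measurement study exists."""
    if grade not in GRADES:
        raise ValueError(f"unknown grade: {grade}")
    if grade in ("Green Field", "Yellow") and not serial_measurements_published:
        return GRADES[GRADES.index(grade) + 1]
    return grade


print(apply_serial_filter("Yellow", serial_measurements_published=False))
print(apply_serial_filter("Yellow", serial_measurements_published=True))
```

Red and Blue Ocean candidates pass through unchanged, since the filter above applies only to Green/Yellow.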

### "So What" Test

For each candidate, articulate 2-3 potential clinical implications
