cross-national
End-to-end cross-national comparison study using KNHANES + NHANES + CHNS (or other parallel surveys). Variable harmonization, parallel weighted analysis, and comparison tables. Supports 2-country (KR+US) and 3-country (KR+US+CN) designs.
git clone --depth 1 https://github.com/Aperivue/medsci-skills /tmp/cross-national && cp -r /tmp/cross-national/skills/cross-national ~/.claude/skills/cross-national
SKILL.md
# Cross-National Comparison Study Skill
You are assisting a medical researcher in conducting a cross-national comparison study
using parallel nationally representative surveys (e.g., KNHANES for Korea, NHANES for the US, CHNS for China).
## When to Use
- Researcher has a clinical question to compare across two or three countries
- KNHANES + NHANES data available (or other parallel survey pairs)
- Goal: produce a complete analysis with country-stratified results + comparison table
## Inputs
1. **Research question**: exposure → outcome association to compare across countries
2. **Korean data path**: KNHANES CSV file
3. **US data path**: NHANES CSV directory (multiple tables to merge)
4. **Harmonization table** (optional): CSV mapping variables across surveys
   - Default: the replicate-study skill's `harmonization_knhanes_nhanes.csv`
## Reference Files
- Harmonization table: `medsci-skills/skills/replicate-study/references/harmonization_knhanes_nhanes.csv`
- Upstream:
  - `medsci-skills/skills/write-paper/references/paper_types/cross_national.md` (writing template)
  - `medsci-skills/skills/analyze-stats/references/analysis_guides/survey_weighted.md`
## Workflow
### Phase 1: Study Definition
1. Confirm research question: Exposure → Outcome
2. Define variable coding for both countries:
   - Exposure: PHQ-9, BMI category, smoking, etc.
   - Outcome: diabetes, hypertension, mortality, etc.
   - Covariates: age, sex, education, income, smoking, alcohol, obesity, CVD
3. Check harmonization table for variable availability
4. Output: study protocol summary for user approval
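
The availability check in step 3 could be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the column names `concept`, `knhanes_var`, and `nhanes_var` are assumptions, since the layout of `harmonization_knhanes_nhanes.csv` is not shown here, and `check_availability` is a hypothetical helper.

```r
# Hypothetical harmonization table; the real CSV's column names may differ.
harm <- data.frame(
  concept     = c("sex", "hba1c", "phq9_item1"),
  knhanes_var = c("sex", "HE_HbA1c", "BP_PHQ_1"),
  nhanes_var  = c("RIAGENDR", "LBXGH", "DPQ010"),
  stringsAsFactors = FALSE
)

# Flag, per concept, whether the mapped variable exists in each dataset.
check_availability <- function(harm, kr_names, us_names) {
  harm$in_knhanes <- harm$knhanes_var %in% kr_names
  harm$in_nhanes  <- harm$nhanes_var %in% us_names
  harm$status <- ifelse(harm$in_knhanes & harm$in_nhanes, "both",
                 ifelse(harm$in_knhanes, "korea_only",
                 ifelse(harm$in_nhanes, "us_only", "missing")))
  harm
}

# Toy column names standing in for names(kr_data) and names(us_data).
kr_names <- c("sex", "HE_HbA1c", "BP_PHQ_1")
us_names <- c("RIAGENDR", "LBXGH")
availability <- check_availability(harm, kr_names, us_names)
print(availability[, c("concept", "status")])
```

Any concept whose status is not `both` would need to be raised in the protocol summary before the user approves it.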
### Phase 2: Data Preparation
**KNHANES (single CSV)**:
1. Load CSV, filter age ≥20 (or per protocol)
2. Derive variables using KNHANES coding:
   - Smoking: BS3_1 (1, 2 = current; 3 = former; 8 = never)
   - Alcohol: BD1_11 (2-6 = frequent; 1 = occasional; 8 = never)
   - Obesity: HE_obe (≥4 = obesity, i.e. BMI ≥25, the Asian cutoff)
   - PHQ-9: BP_PHQ_1 through BP_PHQ_9, summed; score ≥10 = depression
   - Diabetes: HE_glu ≥126 | HE_HbA1c ≥6.5 | DE1_dg = 1
   - CVD: DI4_dg = 1 | DI5_dg = 1 | DI6_dg = 1
3. Set survey design: `svydesign(id=~psu, strata=~kstrata, weights=~wt_itvex, nest=TRUE)`
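
The KNHANES steps above might look like this in base R. It is a sketch on made-up toy rows: `derive_knhanes` is a hypothetical helper, only a subset of the derived variables is shown, and treating a missing lab value as "criterion not met" is an assumption to confirm against the study protocol.

```r
# Toy rows with the KNHANES variables named above (all values are made up).
kr <- data.frame(
  age      = c(45, 19, 63, 30, 52),
  BS3_1    = c(1, 8, 3, 8, 2),
  HE_obe   = c(4, 2, 3, 2, 5),
  HE_glu   = c(130, 90, NA, 95, 100),
  HE_HbA1c = c(6.0, 5.2, 5.9, 6.6, 5.5),
  DE1_dg   = c(0, 0, 0, 0, 1),
  psu      = c(1, 2, 2, 1, 2),
  kstrata  = c(1, 1, 1, 2, 2),
  wt_itvex = c(1200, 800, 950, 1100, 1000)
)

derive_knhanes <- function(df, min_age = 20) {
  df <- df[df$age >= min_age, ]
  df$smoking <- ifelse(df$BS3_1 %in% c(1, 2), "current",
                ifelse(df$BS3_1 == 3, "former",
                ifelse(df$BS3_1 == 8, "never", NA)))
  df$obesity <- as.integer(df$HE_obe >= 4)
  # `%in% TRUE` turns a missing comparison into FALSE (assumption: a missing
  # component does not count toward the diabetes definition).
  df$diabetes <- as.integer((df$HE_glu >= 126) %in% TRUE |
                            (df$HE_HbA1c >= 6.5) %in% TRUE |
                            (df$DE1_dg == 1) %in% TRUE)
  df
}

kr_d <- derive_knhanes(kr)

# Build the design only if the survey package is installed.
if (requireNamespace("survey", quietly = TRUE)) {
  kr_design <- survey::svydesign(id = ~psu, strata = ~kstrata,
                                 weights = ~wt_itvex, nest = TRUE, data = kr_d)
}
```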
**NHANES (multiple CSVs)**:
1. Load and merge tables by SEQN (DEMO_J, DPQ_J, GHB_J, BIOPRO_J, BMX_J, SMQ_J, ALQ_J, DIQ_J, MCQ_J, BPQ_J)
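2. Derive variables using NHANES coding:
   - Smoking: SMQ020 + SMQ040 (smoked 100 cigarettes + now smoke)
   - Alcohol: ALQ121 (past 12 months frequency → categories)
   - Obesity: BMXBMI ≥30 (WHO cutoff, NOT Asian)
   - PHQ-9: DPQ010 through DPQ090, summed; score ≥10 = depression
   - Diabetes: LBXSGL ≥126 | LBXGH ≥6.5 | DIQ010=="Yes" (CRITICAL: LBXSGL not LBXSGLU)
   - CVD: MCQ160B=="Yes" (CHF) | MCQ160C=="Yes" (CHD) | MCQ160D=="Yes" (angina) | MCQ160E=="Yes" (MI)
   - HTN: BPXOSY3 ≥140 | BPXODI3 ≥90 | BPQ020=="Yes" (BPXOSY3 and BPXODI3 come from BPXO_J, which must be merged in addition to the tables listed in step 1)
3. Set survey design: `svydesign(id=~SDMVPSU, strata=~SDMVSTRA, weights=~WTMECPRP, nest=TRUE)`. WTMECPRP is the MEC weight in the 2017-March 2020 pre-pandemic files (`P_` prefix); the 2017-2018 `_J` files carry WTMEC2YR instead, so use the weight that matches the downloaded cycle.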
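
A minimal sketch of the NHANES merge and derivation, using toy stand-ins for the downloaded tables. `merge_by_seqn` is a hypothetical helper, the left join onto the demographics table is an assumption (it keeps participants who lack a lab record), and `DIQ010` is compared against the text label as described in the coding reference.

```r
# Toy stand-ins for the per-table downloads; all values are made up.
demo <- data.frame(SEQN = 1:3, RIDAGEYR = c(34, 58, 71))
ghb  <- data.frame(SEQN = 1:3, LBXGH = c(5.4, 6.8, NA))
bio  <- data.frame(SEQN = c(1L, 3L), LBXSGL = c(92, 140))
diq  <- data.frame(SEQN = 1:3, DIQ010 = c("No", "No", "No"),
                   stringsAsFactors = FALSE)
bmx  <- data.frame(SEQN = 1:3, BMXBMI = c(31.2, 24.0, 27.5))

# Left-join every table onto the first one by SEQN so no participant is dropped.
merge_by_seqn <- function(tables) {
  Reduce(function(x, y) merge(x, y, by = "SEQN", all.x = TRUE), tables)
}

us <- merge_by_seqn(list(demo, ghb, bio, diq, bmx))

# WHO cutoff for the US sample.
us$obesity <- as.integer(us$BMXBMI >= 30)

# `%in% TRUE` treats a missing component as "criterion not met" (assumption).
us$diabetes <- as.integer((us$LBXSGL >= 126) %in% TRUE |
                          (us$LBXGH >= 6.5) %in% TRUE |
                          (us$DIQ010 == "Yes") %in% TRUE)
print(us[, c("SEQN", "obesity", "diabetes")])
```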
### Phase 3: Parallel Analysis
For EACH country independently:
1. **Table 1**: Baseline characteristics by exposure (weighted counts + percentages)
2. **Main analysis**: Sequential logistic regression models
   - Model 1 (unadjusted)
   - Model 2 (age + sex)
   - Model 3 (fully adjusted: + education, income, smoking, alcohol, obesity, CVD)
3. **Subgroup analyses**: By sex, age group, education, income, alcohol, smoking, CVD, obesity
4. **Dose-response** (if applicable): restricted cubic splines (RCS) with 3 knots
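
The sequential models could be fitted with `survey::svyglm` along these lines. This is a sketch on simulated data standing in for one country's prepared design: the variable names `exposure` and `outcome` and the helper `fit_wor` are hypothetical, and Model 3 is omitted because the toy data carry no further covariates.

```r
library(survey)

# Simulated data standing in for one country's analysis dataset.
set.seed(1)
n <- 400
dat <- data.frame(
  psu      = rep(1:2, length.out = n),
  strata   = rep(1:10, each = n / 10),
  wt       = runif(n, 500, 1500),
  age      = round(runif(n, 20, 80)),
  sex      = factor(sample(c("Male", "Female"), n, replace = TRUE)),
  exposure = rbinom(n, 1, 0.3)
)
dat$outcome <- rbinom(n, 1, plogis(-2 + 0.6 * dat$exposure + 0.02 * (dat$age - 50)))
des <- svydesign(id = ~psu, strata = ~strata, weights = ~wt, nest = TRUE, data = dat)

# Fit one model and return the weighted OR with its 95% CI for the exposure term.
fit_wor <- function(formula, design, term = "exposure") {
  fit <- svyglm(formula, design = design, family = quasibinomial())
  est <- coef(fit)[term]
  ci  <- confint(fit)[term, ]
  data.frame(wOR = exp(est), lower = exp(ci[1]), upper = exp(ci[2]),
             row.names = NULL)
}

models <- list(
  model1 = outcome ~ exposure,
  model2 = outcome ~ exposure + age + sex
)
results <- do.call(rbind, lapply(models, fit_wor, design = des))
print(round(results, 2))
```

Running the same `models` list against the Korean and US design objects keeps the specifications identical, which is what the comparison relies on.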
### Phase 4: Cross-National Comparison Table
Generate a side-by-side comparison of weighted odds ratios (wOR):
| Analysis | Korea wOR (95% CI) | US wOR (95% CI) | Direction Agreement |
|----------|-------------------|-----------------|---------------------|
| Overall (fully adjusted) | ... | ... | ✓/✗ |
| Male | ... | ... | |
| Female | ... | ... | |
| ... | ... | ... | |
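
One way to assemble this table from the two countries' results is sketched below. The estimates are hypothetical placeholders, `compare_countries` and `fmt_or` are illustrative helpers, and "direction agreement" is assumed here to mean that both odds ratios fall on the same side of 1.

```r
# Hypothetical fully adjusted estimates; replace with each country's model output.
kr_res <- data.frame(analysis = c("Overall", "Male", "Female"),
                     wOR   = c(1.80, 1.45, 2.10),
                     lower = c(1.20, 0.85, 1.30),
                     upper = c(2.70, 2.47, 3.39),
                     stringsAsFactors = FALSE)
us_res <- data.frame(analysis = c("Overall", "Male", "Female"),
                     wOR   = c(1.50, 0.90, 1.95),
                     lower = c(1.10, 0.55, 1.25),
                     upper = c(2.05, 1.47, 3.04),
                     stringsAsFactors = FALSE)

fmt_or <- function(or, lo, hi) sprintf("%.2f (%.2f-%.2f)", or, lo, hi)

compare_countries <- function(kr, us) {
  m <- merge(kr, us, by = "analysis", suffixes = c("_kr", "_us"), sort = FALSE)
  data.frame(
    Analysis  = m$analysis,
    Korea     = fmt_or(m$wOR_kr, m$lower_kr, m$upper_kr),
    US        = fmt_or(m$wOR_us, m$lower_us, m$upper_us),
    # Same sign of log(OR) means both associations point the same way.
    Agreement = ifelse(sign(log(m$wOR_kr)) == sign(log(m$wOR_us)),
                       "agree", "differ"),
    stringsAsFactors = FALSE
  )
}

cmp <- compare_countries(kr_res, us_res)
print(cmp)
```

The estimates are only placed side by side; each comes from its own country-specific survey design and nothing is pooled.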
### Phase 5: Output Files
```
{working_dir}/
├── cross_national_report.md — Study summary + comparison tables
├── variable_mapping.csv — Variable mapping with match status
├── analysis_korea.R — KNHANES analysis (self-contained)
├── analysis_us.R — NHANES analysis (self-contained)
├── results/
│ ├── table1_korea.csv
│ ├── table1_us.csv
│ ├── main_results_comparison.csv
│ └── subgroup_comparison.csv
└── manuscript_draft/ — Optional: Methods + Results draft
├── methods_draft.md
└── results_draft.md
```
## Critical Rules
1. **NEVER pool data across countries**. Each country analyzed with its own survey design.
2. **Country-specific BMI cutoffs**: Korea ≥25 (Asian), US ≥30 (WHO).
3. **Country-specific income**: KNHANES quartile, NHANES PIR → harmonize to binary.
4. **Weighted analysis mandatory**: Both KNHANES and NHANES are complex surveys.
5. **Document all harmonization decisions**: What matches, what needed recoding, what differs.
6. **Same analytic approach**: Identical model specifications for both countries for fair comparison.
## KNHANES Variable Coding Reference (validated via Joo 2026 replication)
| Variable | Raw Var | Coding |
|----------|---------|--------|
| Smoking | BS3_1 | 1,2=Current; 3=Former; 8=Never |
| Alcohol | BD1_11 | 2-6=Frequent (current drinker); 1=Occasional (past-year abstainer); 8=Never |
| Obesity | HE_obe | 1-3=Non-obese (BMI<25); 4-6=Obesity (BMI≥25) |
| Depression | BP_PHQ_1~9 | Sum ≥10 = depression |
| Diabetes | HE_glu, HE_HbA1c, DE1_dg | FPG≥126 or HbA1c≥6.5 or DE1_dg=1 |
| CVD | DI4_dg, DI5_dg, DI6_dg | Any = 1 → CVD yes |
| Education | edu | 1-3=Non-college; 4=College |
| Income | incm | 1-3=Bottom three quartiles (bottom 75%); 4=Top quartile (top 25%) |
| Survey design | kstrata, psu, wt_itvex | strata, cluster, weight |
## NHANES Variable Coding Reference (validated via Joo 2026 cross-national)
**CRITICAL**: NHANES data downloaded via R `nhanesA` package uses TEXT LABELS, not numeric codes.
| Variable | Raw Var | Text Labels → Numeric |
|----------|---------|----------------------|
| PHQ-9 items | DPQ010~DPQ090 | "Not at all"→0, "Several days"→1, "More than half the days"→2, "Nearly every day"→3 |
| Sex | RIAGENDR | "Male" / "Female" (NOT 1/2) |
| Smoking (100 cigs) | SMQ020 | "Yes" / "No" |
| Smoking (no
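
Converting the PHQ-9 text labels to scores might be done as in the sketch below. `score_phq9` is a hypothetical helper, and the handling of other responses is an assumption: any label outside the four listed in the table (for example a refusal) becomes `NA`, and a respondent with any missing item gets no total.

```r
# Label-to-score map taken from the table above.
phq_levels <- c("Not at all" = 0, "Several days" = 1,
                "More than half the days" = 2, "Nearly every day" = 3)

score_phq9 <- function(df, items = sprintf("DPQ0%d0", 1:9)) {
  # Unlisted labels and NA map to NA via the named-vector lookup.
  scored <- as.data.frame(lapply(df[items], function(x) {
    unname(phq_levels[as.character(x)])
  }))
  df$phq9_total <- unname(rowSums(scored))   # NA if any item is missing
  df$depression <- as.integer(df$phq9_total >= 10)
  df
}

# Toy data: three respondents, nine items, text labels as returned by nhanesA.
us <- as.data.frame(matrix("Not at all", nrow = 3, ncol = 9,
                           dimnames = list(NULL, sprintf("DPQ0%d0", 1:9))),
                    stringsAsFactors = FALSE)
us[2, 1:5] <- "Nearly every day"   # 5 items x 3 points = 15
us[3, 1]   <- "Refused"            # unlisted label, so the total becomes NA

us <- score_phq9(us)
print(us[, c("phq9_total", "depression")])
```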