scientific-visualization
Guide for choosing and creating scientific visualizations for publications and talks. Covers chart-type selection by data structure, color theory for accessibility/print, figure composition, journal formatting (Nature, Cell, ACS), and common pitfalls. Consult when visualizing data or preparing submission figures.
git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/scientific-visualization && cp -r /tmp/scientific-visualization/skills/data-visualization/scientific-visualization ~/.claude/skills/scientific-visualization
# scientific-visualization

## Overview

Effective scientific visualization communicates data clearly, honestly, and accessibly. Poor chart choices, misleading axes, or inaccessible color palettes can obscure findings or introduce bias. This guide covers the full workflow of scientific figure preparation: from selecting the right chart type for your data structure through color theory, accessibility, and journal submission formatting requirements.

## Key Concepts

### Chart Type and Data Type Alignment

Every chart type is optimized for a specific data structure. Mismatches (e.g., pie charts for continuous distributions, bar charts for time series) hide structure and distort perception.

| Data Type | Recommended Chart | Avoid |
|-----------|------------------|-------|
| Continuous distribution (1 group) | Histogram, violin plot, ridge plot | Bar chart with mean only |
| Continuous distribution (2–5 groups) | Violin + boxplot overlay, beeswarm | Grouped bar chart |
| Two continuous variables, correlation | Scatter plot, hexbin (large N) | Line chart without temporal order |
| Categorical counts / proportions | Bar chart (horizontal for long labels) | Pie chart (>4 categories) |
| Change over time (continuous) | Line chart | Bar chart |
| Change over time (sparse events) | Step chart, event raster | Connected scatter |
| Part-to-whole (≤5 parts) | Stacked bar, waffle chart | 3D pie chart |
| High-dimensional (>5 variables) | Heatmap (clustered), parallel coordinates | 3D scatter |
| Spatial data | Map, spatial heatmap | Bubble chart |
| Survival / time-to-event | Kaplan-Meier curve | Bar chart of median survival |

### Color Theory for Science

Color encodes information. Misused color introduces artifacts and fails readers with color vision deficiency (CVD; ~8% of males).

**Sequential palettes** encode ordered numeric data from low to high (e.g., expression level, concentration). Use perceptually uniform palettes: `viridis`, `magma`, `cividis`.
These also remain readable when printed in grayscale.

**Diverging palettes** encode data with a meaningful midpoint (e.g., fold-change centered at 0, correlation from -1 to +1). Use `RdBu`, `coolwarm`, or `vlag`. Always ensure the midpoint maps to white/neutral.

**Qualitative palettes** encode unordered categories. Use Okabe-Ito (CVD-safe), `tab10` (matplotlib default), or ColorBrewer qualitative palettes. Limit to ≤8 distinguishable colors; use shape or pattern as redundant encoding beyond that.

**Color don'ts**:

- Rainbow/jet colormap: not perceptually uniform; creates false contours
- Red vs. green encoding: fails for readers with deutan-type (red-green) CVD (~6% of males)
- Saturated color for background or large areas

### Figure Composition and Layout

Scientific figures are typically multi-panel. Panel layout and labeling affect how readers parse information.

- **Panel labels**: Bold uppercase letters (A, B, C) in the top-left corner; use 8–12 pt in the figure, larger in the caption reference.
- **Alignment**: Align panel edges on a grid. Unaligned panels signal lack of attention to detail.
- **White space**: Leave adequate margins; crowded panels reduce readability.
- **Figure size**: Design for the target column width, typically single column (~85 mm / 3.35 in), 1.5 column (~114 mm / 4.5 in), or double column (~170 mm / 6.7 in). Exact limits vary by journal (Nature-family journals, for example, use 89 mm and 183 mm).
- **Font**: Sans-serif (Arial, Helvetica) at 6–8 pt minimum in the final figure at publication resolution.

### Journal Formatting Requirements

Major journals specify exact figure requirements for submission. Violating them can lead to desk rejection or delays.

| Journal/Style | Max Width | Resolution | Color Mode | Font | File Format |
|---------------|-----------|------------|------------|------|-------------|
| Nature family | 89 mm (1-col), 183 mm (2-col) | 300 dpi (photos), 600 dpi (line art) | RGB or CMYK | Arial 5–7 pt | PDF, TIFF, EPS |
| Cell/iScience | 85 mm (1-col), 170 mm (2-col) | 300 dpi raster, 600 dpi halftone | RGB | Helvetica 6–8 pt | PDF, EPS, TIFF |
| ACS journals | 3.25 in (1-col), 7 in (2-col) | 600 dpi (color), 1200 dpi (b&w line art) | RGB (screen), CMYK (print) | Arial/Helvetica 4.5–7 pt | TIFF, EPS, PDF |
| PLOS ONE | No strict width | 300 dpi (raster), 600–1200 dpi (line art) | RGB | Any | TIFF, EPS, PDF |

## Decision Framework

Use this tree to select the right visualization for your analysis goal:

```
What is the primary message of this figure?
|
+-- Show a distribution or spread of values
|   +-- One group --> Histogram or violin plot
|   +-- 2-5 groups --> Violin + jitter (show all points if N < 100)
|   +-- Many groups --> Ridge plot (joy plot)
|
+-- Compare quantities between categories
|   +-- Few categories (2-5) --> Bar chart with error bars + individual points
|   +-- Many categories (>8) --> Lollipop chart or dot plot (horizontal)
|   +-- Paired measurements --> Slopegraph or paired dot plot
|
+-- Show a relationship between two continuous variables
|   +-- N < 1000 --> Scatter plot
|   +-- N > 1000 --> Hexbin or 2D density plot
|   +-- Time ordered --> Line chart
|
+-- Show composition or part-to-whole
|   +-- 2-4 parts --> Stacked bar or waffle chart
|   +-- Over time --> Stacked area chart
|   +-- Avoid pie chart unless <= 3 parts and proportions are obvious
|
+-- Show high-dimensional data
|   +-- Genes x samples --> Clustered heatmap (seaborn.clustermap)
|   +-- Embeddings (UMAP, PCA) --> Scatter colored by metadata
|   +-- Feature importance --> Horizontal bar chart (sorted)
|
+-- Show spatial or geographic data
    +-- Microscopy --> Image overlay with colorbar
    +-- Geographic --> Choropleth map
```

| Analysis Goal | Chart Type | Library | Key Consideration |
|---------------|-----------|---------|-------------------|
| Gene expression across groups | Violin + jitter | `seaborn`, `plotnine` | Show all points if N < 50; never bar+SEM only |
| Differential expression |
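
Plotting libraries such as matplotlib take figure sizes in inches, while the journal formatting table earlier in this guide lists most width limits in millimetres. The following is a minimal sketch of that conversion, not a definitive implementation. The function names `figure_size_inches` and `pixel_dimensions`, the dictionary layout, and the default 0.75 aspect ratio are illustrative assumptions; only the width values are copied from the table.

```python
# Width limits copied from the journal formatting table in this guide.
# The ACS limits are listed there in inches (3.25 in, 7 in), so they are
# converted to millimetres to keep one unit throughout.
MM_PER_INCH = 25.4

COLUMN_WIDTH_MM = {
    ("nature", "single"): 89.0,
    ("nature", "double"): 183.0,
    ("cell", "single"): 85.0,
    ("cell", "double"): 170.0,
    ("acs", "single"): 3.25 * MM_PER_INCH,
    ("acs", "double"): 7.0 * MM_PER_INCH,
}


def figure_size_inches(journal, column, aspect=0.75):
    """Return (width, height) in inches for a journal column.

    `aspect` is height divided by width; 0.75 is an arbitrary default
    chosen for illustration, not a journal requirement.
    """
    width_in = COLUMN_WIDTH_MM[(journal, column)] / MM_PER_INCH
    return (width_in, width_in * aspect)


def pixel_dimensions(size_inches, dpi):
    """Return (width_px, height_px) of a raster export at the given dpi."""
    width_in, height_in = size_inches
    return (round(width_in * dpi), round(height_in * dpi))


size = figure_size_inches("nature", "single")
print(f"{size[0]:.2f} x {size[1]:.2f} in")  # 3.50 x 2.63 in
print(pixel_dimensions(size, dpi=600))      # (2102, 1577)
```

The returned tuple could be passed as `figsize` to `matplotlib.pyplot.subplots`, and the chosen resolution as `dpi` to `Figure.savefig`. Designing at the final physical size means that font sizes set in points match the journal's font limits without rescaling.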