seaborn-statistical-plots
Statistical visualization on matplotlib with native pandas support. Auto aggregation, CIs, grouping for distributions (histplot, kdeplot), categorical (boxplot, violinplot), relational (scatterplot, lineplot), regression (regplot, lmplot), matrix (heatmap, clustermap), grids (pairplot, FacetGrid). Use for quick statistical summaries; matplotlib for fine control; plotly for interactive HTML.
git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/seaborn-statistical-plots && cp -r /tmp/seaborn-statistical-plots/skills/data-visualization/seaborn-statistical-plots ~/.claude/skills/seaborn-statistical-plots
# Seaborn — Statistical Plots
## Overview
Seaborn is a Python library for statistical data visualization built on top of matplotlib. It works directly with pandas DataFrames, automatically handles grouping by categorical variables, computes confidence intervals and kernel density estimates, and produces attractive publication-ready figures with minimal configuration. Seaborn separates axes-level functions (embeddable in custom layouts) from figure-level functions (with built-in faceting), enabling both quick exploratory analysis and structured multi-panel figures.
## When to Use
- Comparing gene expression, protein abundance, or measurement distributions across experimental conditions (treatment vs. control, cell lines, time points)
- Generating grouped box plots, violin plots, or strip plots to show both summary statistics and individual data points simultaneously
- Visualizing pairwise correlations in multi-gene or multi-feature datasets as annotated heatmaps
- Plotting regression fits with confidence bands between continuous variables (e.g., cell viability vs. drug concentration)
- Faceting a single plot type across multiple sample subsets, tissue types, or experimental batches in one call
- Rapid exploratory analysis of a new dataset using `pairplot` to survey all pairwise relationships at once
- Use `matplotlib` directly when you need pixel-level control over figure elements, complex mixed-type layouts, or non-statistical custom plots
- Use `plotly` when the output must be interactive (hover tooltips, zoom, pan) or embedded in a web application
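As one illustration of the use cases listed above, an annotated correlation heatmap might look like the sketch below. The gene names and simulated values are placeholders, not real measurements; `GENE_A` and `GENE_B` share a common signal so that the matrix shows a visible correlation.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
base = rng.normal(size=100)
# Hypothetical expression table: rows are samples, columns are features
expr = pd.DataFrame({
    "GENE_A": base + rng.normal(scale=0.3, size=100),
    "GENE_B": base + rng.normal(scale=0.6, size=100),
    "GENE_C": rng.normal(size=100),
})

# Pairwise Pearson correlations between columns
corr = expr.corr()

fig, ax = plt.subplots(figsize=(4, 3.5))
# annot=True writes each coefficient into its cell; a diverging colormap
# with symmetric limits keeps zero correlation at the neutral color
sns.heatmap(corr, annot=True, fmt=".2f", cmap="vlag",
            vmin=-1, vmax=1, square=True, ax=ax)
ax.set_title("Pairwise correlation")
fig.tight_layout()
fig.savefig("correlation_heatmap.png", dpi=150)
```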
## Prerequisites
- **Python packages**: `seaborn>=0.13`, `matplotlib`, `pandas`, `numpy`; `scipy` is optional but needed for clustering in `clustermap`
- **Data requirements**: Pandas DataFrame in long-form (tidy) format; each observation is a row, each variable is a column
- **Environment**: Standard Python environment; no GPU or special hardware required
```bash
pip install "seaborn>=0.13" matplotlib pandas numpy scipy
```
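Data often arrives in wide form (one column per gene or feature) rather than the long form listed under data requirements. A minimal sketch of the conversion with `DataFrame.melt`, assuming a hypothetical table with a `sample_id` column and one column per gene:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical wide-form table: one row per sample, one column per gene
wide = pd.DataFrame({
    "sample_id": ["s1", "s2", "s3"],
    "BRCA1": [5.1, 5.4, 4.9],
    "TP53": [6.2, 6.0, 6.5],
})

# Long form: one row per (sample, gene) observation
long_df = wide.melt(id_vars="sample_id", var_name="gene", value_name="log2_expr")

# Long-form columns can be mapped directly to plot roles
fig, ax = plt.subplots(figsize=(4, 3))
sns.stripplot(data=long_df, x="gene", y="log2_expr", ax=ax)
fig.tight_layout()
```

After melting, each variable (`gene`, `log2_expr`) is a single column, which is what the `x=`, `y=`, and `hue=` parameters expect.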
## Quick Start
```python
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Simulate gene expression across conditions
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "gene": ["BRCA1"] * 60 + ["TP53"] * 60,
    "condition": ["control", "treated"] * 60,
    "log2_expr": np.concatenate([
        rng.normal(5.2, 0.8, 60),
        rng.normal(6.1, 0.9, 60),
    ]),
})
sns.set_theme(style="ticks", context="notebook")
sns.boxplot(data=df, x="gene", y="log2_expr", hue="condition", palette="Set2")
plt.ylabel("log2 Expression")
plt.title("Gene Expression by Condition")
plt.tight_layout()
plt.savefig("quickstart_boxplot.png", dpi=150)
print("Saved quickstart_boxplot.png")
```
## Core API
### 1. Distribution Plots
Visualize univariate distributions and compare them across groups. `histplot` bins data; `kdeplot` fits a smooth density estimate; `displot` is the figure-level wrapper that adds faceting.
```python
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "log2_tpm": np.concatenate([rng.normal(4.5, 1.1, n), rng.normal(6.0, 1.3, n)]),
    "sample": ["tumor"] * n + ["normal"] * n,
})
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
# Histogram with density normalization and stacked hue groups
sns.histplot(data=df, x="log2_tpm", hue="sample", stat="density",
             multiple="stack", bins=30, ax=axes[0])
axes[0].set_title("Histogram (stacked)")
# KDE with fill; bandwidth is controlled by bw_adjust
sns.kdeplot(data=df, x="log2_tpm", hue="sample", fill=True,
            bw_adjust=0.8, alpha=0.4, ax=axes[1])
axes[1].set_title("KDE (filled)")
# ECDF — useful for comparing cumulative distributions
sns.ecdfplot(data=df, x="log2_tpm", hue="sample", ax=axes[2])
axes[2].set_title("ECDF")
plt.tight_layout()
plt.savefig("distributions.png", dpi=150)
print("Saved distributions.png")
```
```python
# Bivariate KDE: joint distribution of two continuous variables
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
log2_rna = rng.normal(5.5, 1.2, 300)
df2 = pd.DataFrame({
    "log2_rna": log2_rna,
    # Reuse log2_rna so that protein levels actually correlate with RNA
    "log2_prot": rng.normal(4.8, 1.0, 300) + 0.6 * log2_rna,
})
sns.kdeplot(data=df2, x="log2_rna", y="log2_prot",
            fill=True, levels=8, thresh=0.05, cmap="Blues")
plt.xlabel("log2 RNA (TPM)")
plt.ylabel("log2 Protein (iBAQ)")
plt.title("RNA–Protein Correlation Density")
plt.tight_layout()
plt.savefig("bivariate_kde.png", dpi=150)
print("Saved bivariate_kde.png")
```
### 2. Categorical Plots
Compare distributions or aggregated statistics across categorical groups. Axes-level functions (`boxplot`, `violinplot`, `stripplot`, `swarmplot`, `barplot`) accept an `ax=` parameter for embedding in custom layouts.
```python
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
rng = np.random.default_rng(2)
conditions = ["DMSO", "Drug A 1uM", "Drug A 10uM", "Drug B 1uM", "Drug B 10uM"]
df = pd.DataFrame({
    "condition": np.repeat(conditions, 30),
    "viability": np.concatenate([
        rng.normal(100, 5, 30),
        rng.normal(92, 7, 30),
        rng.normal(65, 10, 30),
        rng.normal(88, 8, 30),
        rng.normal(45, 12, 30),
    ]),
})
fig, axes = plt.subplots(1, 3, figsize=(18, 5))
# Box plot: shows quartiles and outliers.
# seaborn>=0.13 deprecates palette without hue, so map hue to x and hide the legend.
sns.boxplot(data=df, x="condition", y="viability", hue="condition",
            palette="husl", width=0.5, legend=False, ax=axes[0])
plt.setp(axes[0].get_xticklabels(), rotation=30, ha="right")
axes[0].set_title("Box Plot")
# Violin: KDE shape plus inner quartile lines
sns.violinplot(data=df, x="condition", y="viability", hue="condition",
               inner="quart", palette="muted", legend=False, ax=axes[1])
plt.setp(axes[1].get_xticklabels(), rotation=30, ha="right")
axes[1].set_title("Violin Plot")
# Strip plot overlaid on box: shows all individual points
sns.boxplot(data=df, x="condition", y="viability", hue="condition",
            palette="pastel", width=0.5, legend=False, ax=axes[2])
sns.stripplot(data=df, x="condition", y="viability",
              color="black", size=3, alpha=0.5, ax=axes[2])  # marker styling is an assumption
```