seaborn-statistical-plots

Seaborn provides statistical visualization functions built on matplotlib with native pandas DataFrame support, automatically handling grouping, confidence intervals, and aggregation across distribution, categorical, relational, regression, and matrix plot types. Use it for rapid exploratory analysis and publication-ready statistical summaries when comparing groups, visualizing correlations, or faceting plots across sample subsets; choose matplotlib for fine pixel-level control or plotly for interactive HTML output.

Repository: SciAgent-Skills

Install in Claude Code

git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/seaborn-statistical-plots && cp -r /tmp/seaborn-statistical-plots/skills/data-visualization/seaborn-statistical-plots ~/.claude/skills/seaborn-statistical-plots

Then open a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Seaborn — Statistical Plots

## Overview

Seaborn is a Python library for statistical data visualization built on top of matplotlib. It works directly with pandas DataFrames, automatically handles grouping by categorical variables, computes confidence intervals and kernel density estimates, and produces attractive publication-ready figures with minimal configuration. Seaborn separates axes-level functions (embeddable in custom layouts) from figure-level functions (with built-in faceting), enabling both quick exploratory analysis and structured multi-panel figures.
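
The axes-level versus figure-level split can be sketched as follows. This is a minimal illustration on made-up data (the `dose`, `response`, and `batch` columns are hypothetical): `scatterplot` draws onto an Axes you supply, while `relplot` creates its own figure and returns a `FacetGrid`.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical toy data: a dose-response measurement in two batches
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "dose": rng.uniform(0, 10, 120),
    "batch": np.repeat(["batch1", "batch2"], 60),
})
df["response"] = 100 - 5 * df["dose"] + rng.normal(0, 5, 120)

# Axes-level: draws onto the Axes passed via ax= and returns that same Axes
fig, ax = plt.subplots(figsize=(4, 3))
returned_ax = sns.scatterplot(data=df, x="dose", y="response", ax=ax)

# Figure-level: builds a new figure with one panel per batch
# and returns a FacetGrid that owns it
grid = sns.relplot(data=df, x="dose", y="response", col="batch",
                   kind="scatter", height=3)

print(returned_ax is ax)       # the axes-level call reused our Axes
print(grid.axes.shape)         # one row, one column per batch
```

Figure-level functions do not accept `ax=`; their size is set with `height` and `aspect` rather than `figsize`.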

## When to Use

- Comparing gene expression, protein abundance, or measurement distributions across experimental conditions (treatment vs. control, cell lines, time points)
- Generating grouped box plots, violin plots, or strip plots to show both summary statistics and individual data points simultaneously
- Visualizing pairwise correlations in multi-gene or multi-feature datasets as annotated heatmaps
- Plotting regression fits with confidence bands between continuous variables (e.g., cell viability vs. drug concentration)
- Faceting a single plot type across multiple sample subsets, tissue types, or experimental batches in one call
- Rapid exploratory analysis of a new dataset using `pairplot` to survey all pairwise relationships at once
- Use `matplotlib` directly when you need pixel-level control over figure elements, complex mixed-type layouts, or non-statistical custom plots
- Use `plotly` when the output must be interactive (hover tooltips, zoom, pan) or embedded in a web application
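
As one illustration of the use cases above, an annotated correlation heatmap might look like the sketch below. The gene names and simulated values are placeholders, not real data, and the colormap and figure size are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical expression matrix: rows are samples, columns are genes
rng = np.random.default_rng(3)
base = rng.normal(0, 1, 100)
expr = pd.DataFrame({
    "GENE_A": base + rng.normal(0, 0.3, 100),   # tracks the shared signal
    "GENE_B": base + rng.normal(0, 0.6, 100),   # tracks it more noisily
    "GENE_C": rng.normal(0, 1, 100),            # independent of the others
    "GENE_D": -base + rng.normal(0, 0.5, 100),  # anti-correlated
})

# Pairwise Pearson correlations between columns
corr = expr.corr()

fig, ax = plt.subplots(figsize=(5, 4))
# annot=True writes each coefficient into its cell; the diverging colormap
# is centered at 0 so positive and negative correlations are distinguishable
sns.heatmap(corr, annot=True, fmt=".2f", cmap="vlag",
            vmin=-1, vmax=1, center=0, square=True, ax=ax)
ax.set_title("Pairwise correlation")
plt.tight_layout()
```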

## Prerequisites

- **Python packages**: `seaborn>=0.13`, `matplotlib`, `pandas`, `numpy`, `scipy`
- **Data requirements**: Pandas DataFrame in long-form (tidy) format; each observation is a row, each variable is a column
- **Environment**: Standard Python environment; no GPU or special hardware required

```bash
pip install "seaborn>=0.13" matplotlib pandas numpy scipy
```
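
If your table is in wide form (one column per gene, say), it can be reshaped to the long form described above with `DataFrame.melt`. A minimal sketch, assuming a hypothetical table with `sample_id` and `condition` identifier columns:

```python
import pandas as pd

# Hypothetical wide-form table: one row per sample, one column per gene
wide = pd.DataFrame({
    "sample_id": ["s1", "s2", "s3"],
    "condition": ["control", "treated", "treated"],
    "BRCA1": [5.1, 6.0, 6.3],
    "TP53": [4.8, 5.5, 5.9],
})

# Long (tidy) form: one row per (sample, gene) observation
tidy = wide.melt(id_vars=["sample_id", "condition"],
                 var_name="gene", value_name="log2_expr")
print(tidy)
```

The resulting columns can be passed directly as `x=`, `y=`, and `hue=` to seaborn functions.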

## Quick Start

```python
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Simulate gene expression across conditions
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "gene":      ["BRCA1"] * 60 + ["TP53"] * 60,
    "condition": ["control", "treated"] * 60,
    "log2_expr": np.concatenate([
        rng.normal(5.2, 0.8, 60),
        rng.normal(6.1, 0.9, 60),
    ])
})

sns.set_theme(style="ticks", context="notebook")
sns.boxplot(data=df, x="gene", y="log2_expr", hue="condition", palette="Set2")
plt.ylabel("log2 Expression")
plt.title("Gene Expression by Condition")
plt.tight_layout()
plt.savefig("quickstart_boxplot.png", dpi=150)
print("Saved quickstart_boxplot.png")
```

## Core API

### 1. Distribution Plots

Visualize univariate distributions and compare them across groups. `histplot` bins data; `kdeplot` fits a smooth density estimate; `displot` is the figure-level wrapper that adds faceting.

```python
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "log2_tpm":  np.concatenate([rng.normal(4.5, 1.1, n), rng.normal(6.0, 1.3, n)]),
    "sample":    ["tumor"] * n + ["normal"] * n,
})

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Histogram with density normalization and stacked hue groups
sns.histplot(data=df, x="log2_tpm", hue="sample", stat="density",
             multiple="stack", bins=30, ax=axes[0])
axes[0].set_title("Histogram (stacked)")

# KDE with fill — bandwidth controlled by bw_adjust
sns.kdeplot(data=df, x="log2_tpm", hue="sample", fill=True,
            bw_adjust=0.8, alpha=0.4, ax=axes[1])
axes[1].set_title("KDE (filled)")

# ECDF — useful for comparing cumulative distributions
sns.ecdfplot(data=df, x="log2_tpm", hue="sample", ax=axes[2])
axes[2].set_title("ECDF")

plt.tight_layout()
plt.savefig("distributions.png", dpi=150)
print("Saved distributions.png")
```

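```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Bivariate KDE: joint distribution of two continuous variables
rng = np.random.default_rng(1)
log2_rna = rng.normal(5.5, 1.2, 300)
df2 = pd.DataFrame({
    "log2_rna": log2_rna,
    # Protein depends on the same RNA values plus independent noise,
    # so the two columns are actually correlated
    "log2_prot": rng.normal(4.8, 1.0, 300) + 0.6 * log2_rna,
})

# Use a fresh figure so the density is not drawn onto a previous plot
fig, ax = plt.subplots(figsize=(5, 4))
sns.kdeplot(data=df2, x="log2_rna", y="log2_prot",
            fill=True, levels=8, thresh=0.05, cmap="Blues", ax=ax)
ax.set_xlabel("log2 RNA (TPM)")
ax.set_ylabel("log2 Protein (iBAQ)")
ax.set_title("RNA–Protein Correlation Density")
plt.tight_layout()
plt.savefig("bivariate_kde.png", dpi=150)
print("Saved bivariate_kde.png")
```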

### 2. Categorical Plots

Compare distributions or aggregated statistics across categorical groups. Axes-level functions (`boxplot`, `violinplot`, `stripplot`, `swarmplot`, `barplot`) accept an `ax=` parameter for embedding in custom layouts.

```python
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
conditions = ["DMSO", "Drug A 1uM", "Drug A 10uM", "Drug B 1uM", "Drug B 10uM"]
df = pd.DataFrame({
    "condition": np.repeat(conditions, 30),
    "viability": np.concatenate([
        rng.normal(100, 5, 30),
        rng.normal(92, 7, 30),
        rng.normal(65, 10, 30),
        rng.normal(88, 8, 30),
        rng.normal(45, 12, 30),
    ])
})

fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# Box plot: shows quartiles and outliers
# (seaborn>=0.13 deprecates palette without hue, so hue is assigned explicitly
# and the redundant legend is hidden)
sns.boxplot(data=df, x="condition", y="viability", hue="condition",
            palette="husl", legend=False, width=0.5, ax=axes[0])
plt.setp(axes[0].get_xticklabels(), rotation=30, ha="right")
axes[0].set_title("Box Plot")

# Violin: KDE shape + inner quartile lines
sns.violinplot(data=df, x="condition", y="viability", hue="condition",
               inner="quart", palette="muted", legend=False, ax=axes[1])
plt.setp(axes[1].get_xticklabels(), rotation=30, ha="right")
axes[1].set_title("Violin Plot")

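# Strip plot overlaid on box: shows all individual points
sns.boxplot(data=df, x="condition", y="viability", hue="condition",
            palette="pastel", legend=False, width=0.5, ax=axes[2])
# Assumed completion: draw the points on the same Axes as the box plot
sns.stripplot(data=df, x="condition", y="viability", ax=axes[2])
```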
