Skill · 284 repo stars · updated 4 days ago

seaborn-statistical-visualization

Seaborn is a Python library for creating publication-quality statistical graphics directly from pandas DataFrames with minimal code. It provides functions for distribution plots (histograms, KDE, violin, box), relational plots (scatter, line), categorical comparisons, regression with confidence intervals, and correlation heatmaps. Use it for exploratory data analysis when automatic statistical estimation and default styling are sufficient; switch to plotly for interactive features or matplotlib for low-level customization.

Source repository: SciAgent-Skills

Install in Claude Code

git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/seaborn-statistical-visualization && cp -r /tmp/seaborn-statistical-visualization/legacy/seaborn-statistical-visualization ~/.claude/skills/seaborn-statistical-visualization

Then open a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Seaborn — Statistical Visualization

## Overview

Seaborn is a Python visualization library for creating publication-quality statistical graphics with minimal code. It works directly with pandas DataFrames, provides automatic statistical estimation (means, confidence intervals, kernel density estimates), and ships with attractive default themes. Because it is built on matplotlib, every plot remains customizable through the matplotlib API.
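
As a minimal sketch of the matplotlib relationship described above: axes-level seaborn functions return a regular matplotlib `Axes`, so any matplotlib method can be applied afterwards. The DataFrame and its `dose` and `response` columns are hypothetical stand-ins, not data from this skill, and the `Agg` backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical synthetic data (assumption, not part of the skill)
rng = np.random.default_rng(0)
demo = pd.DataFrame({"dose": rng.uniform(0, 10, 50)})
demo["response"] = 2.0 * demo["dose"] + rng.normal(0, 1, 50)

sns.set_theme(style="whitegrid")  # seaborn theme, applied to all later plots

# Axes-level functions return the matplotlib Axes they drew on
ax = sns.scatterplot(data=demo, x="dose", y="response")

# From here on, everything is plain matplotlib
ax.set_xlabel("Dose (a.u.)")
ax.set_title("Customized through matplotlib")
ax.axhline(10, color="gray", linestyle="--")
ax.figure.savefig("customized.png", dpi=150)
```

The same pattern applies to the axes-level functions used in the sections below, which also accept an `ax=` argument for drawing into an existing subplot.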

## When to Use

- Creating distribution plots (histograms, KDE, violin plots, box plots) for data exploration
- Visualizing relationships between variables with automatic trend fitting and confidence intervals
- Comparing distributions across categorical groups (treatment vs control, tissue types)
- Generating correlation heatmaps and clustered heatmaps
- Quick exploratory data analysis with `pairplot` for all pairwise relationships
- Multi-panel figures with automatic faceting by categorical variables
- For **interactive plots** with hover/zoom, use plotly instead
- For **low-level figure control** or custom layouts, use matplotlib directly

## Prerequisites

```bash
pip install seaborn matplotlib pandas scipy  # scipy is needed for clustermap
```

## Quick Start

```python
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

df = sns.load_dataset("tips")
sns.scatterplot(data=df, x="total_bill", y="tip", hue="day", style="time")
plt.title("Tips by Day and Time")
plt.tight_layout()
plt.savefig("scatter.png", dpi=150)
print("Saved scatter.png")
```

## Core API

### 1. Distribution Plots

Visualize univariate and bivariate distributions.

```python
import seaborn as sns
import matplotlib.pyplot as plt

df = sns.load_dataset("tips")

# Histogram with density normalization
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

sns.histplot(data=df, x="total_bill", hue="time", stat="density",
             multiple="stack", ax=axes[0])
axes[0].set_title("Histogram")

# KDE (smooth density estimate)
sns.kdeplot(data=df, x="total_bill", hue="time", fill=True,
            bw_adjust=0.8, ax=axes[1])
axes[1].set_title("KDE")

# ECDF (empirical cumulative distribution)
sns.ecdfplot(data=df, x="total_bill", hue="time", ax=axes[2])
axes[2].set_title("ECDF")

plt.tight_layout()
plt.savefig("distributions.png", dpi=150)
print("Saved distributions.png")
```

```python
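# Bivariate KDE with filled contours (reuses sns, plt and df from the block above)
fig, ax = plt.subplots(figsize=(6, 5))  # new figure so the plot does not land on the previous axes
sns.kdeplot(data=df, x="total_bill", y="tip", fill=True,
            levels=5, thresh=0.1, cmap="mako", ax=ax)
ax.set_title("Bivariate KDE")
fig.tight_layout()
fig.savefig("bivariate_kde.png", dpi=150)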
```

### 2. Categorical Plots

Compare distributions or estimates across discrete categories.

```python
import seaborn as sns
import matplotlib.pyplot as plt

df = sns.load_dataset("tips")
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Box plot — quartiles and outliers
sns.boxplot(data=df, x="day", y="total_bill", hue="sex",
            dodge=True, ax=axes[0])
axes[0].set_title("Box Plot")

# Violin plot — KDE + quartiles
sns.violinplot(data=df, x="day", y="total_bill", hue="sex",
               split=True, inner="quart", ax=axes[1])
axes[1].set_title("Violin Plot")

# Bar plot — mean with CI
sns.barplot(data=df, x="day", y="total_bill", hue="sex",
            estimator="mean", errorbar="ci", ax=axes[2])
axes[2].set_title("Bar Plot (mean ± 95% CI)")

plt.tight_layout()
plt.savefig("categorical.png", dpi=150)
print("Saved categorical.png")
```

```python
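# Swarm plot: every observation drawn as a point, nudged so points do not overlap
fig, ax = plt.subplots()  # new figure; otherwise the points land on the last axes above
sns.swarmplot(data=df, x="day", y="total_bill", hue="sex", dodge=True, ax=ax)
ax.set_title("Swarm Plot")
fig.savefig("swarm.png", dpi=150)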
```

### 3. Relational Plots

Explore relationships between continuous variables.

```python
import seaborn as sns
import matplotlib.pyplot as plt

df = sns.load_dataset("tips")

# Scatter with multiple semantic mappings
sns.scatterplot(data=df, x="total_bill", y="tip",
                hue="day", size="size", style="time")
plt.title("Scatter with Multi-Encoding")
plt.savefig("relational.png", dpi=150)
```

```python
# Line plot with automatic aggregation: mean line with a ±1 SD band
fmri = sns.load_dataset("fmri")
fig, ax = plt.subplots()  # new figure so the lines are not drawn over the scatter plot
sns.lineplot(data=fmri, x="timepoint", y="signal",
             hue="region", style="event", errorbar="sd", ax=ax)
ax.set_title("Line Plot (mean ± SD)")
fig.savefig("lineplot.png", dpi=150)
```

### 4. Regression Plots

Fit and visualize linear models.

```python
import seaborn as sns
import matplotlib.pyplot as plt

df = sns.load_dataset("tips")

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Linear regression with CI band
sns.regplot(data=df, x="total_bill", y="tip", ci=95, ax=axes[0])
axes[0].set_title("Linear Regression")

# Residual plot (check model assumptions)
sns.residplot(data=df, x="total_bill", y="tip", ax=axes[1])
axes[1].set_title("Residuals")

plt.tight_layout()
plt.savefig("regression.png", dpi=150)
print("Saved regression.png")
```
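
`regplot` draws the fitted line but does not return the fitted coefficients. If the slope and intercept are needed, one option is to fit the same straight line separately, for example with `numpy.polyfit` as sketched below. The synthetic data, the column names and the true slope of 0.15 are assumptions made for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical synthetic data with a known linear relationship
rng = np.random.default_rng(2)
bill = rng.uniform(0, 50, 100)
tip = 1.0 + 0.15 * bill + rng.normal(0, 0.5, 100)
demo = pd.DataFrame({"bill": bill, "tip": tip})

# Ordinary least-squares line; polyfit returns the highest-degree coefficient first
slope, intercept = np.polyfit(demo["bill"], demo["tip"], deg=1)

fig, ax = plt.subplots()
sns.regplot(data=demo, x="bill", y="tip", ci=95, ax=ax)
ax.set_title(f"tip = {slope:.2f} * bill + {intercept:.2f}")
fig.savefig("regression_with_fit.png", dpi=150)
```

For full inference (standard errors, p-values), a dedicated modeling library such as statsmodels is the more appropriate tool.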

### 5. Matrix Plots

Visualize rectangular data (correlations, heatmaps).

```python
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# Correlation heatmap
df = sns.load_dataset("tips")
corr = df.select_dtypes(include=[np.number]).corr()

sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm",
            center=0, square=True, linewidths=0.5)
plt.title("Correlation Heatmap")
plt.tight_layout()
plt.savefig("heatmap.png", dpi=150)
print("Saved heatmap.png")
```
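
A correlation matrix is symmetric, so half of the heatmap is redundant. A common refinement, sketched here under the assumption of a hypothetical four-column DataFrame, is to hide the upper triangle through the `mask` parameter of `heatmap`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical synthetic data: four numeric columns
rng = np.random.default_rng(3)
demo = pd.DataFrame(rng.normal(size=(80, 4)), columns=["a", "b", "c", "d"])
corr = demo.corr()

# True marks cells to hide: everything strictly above the diagonal
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)

fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(corr, mask=mask, annot=True, fmt=".2f", cmap="coolwarm",
            center=0, vmin=-1, vmax=1, square=True, ax=ax)
ax.set_title("Correlation Heatmap (lower triangle)")
fig.tight_layout()
fig.savefig("heatmap_masked.png", dpi=150)
```

Fixing `vmin=-1` and `vmax=1` keeps the color scale comparable across different correlation matrices.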

```python
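# Clustered heatmap with hierarchical clustering (requires scipy)
flights = sns.load_dataset("flights").pivot(index="month", columns="year", values="passengers")
# standard_scale=1 rescales each column (year) to the 0-1 range before clustering
g = sns.clustermap(flights, cmap="viridis", standard_scale=1,
                   figsize=(10, 8), linewidths=0.5)
g.savefig("clustermap.png", dpi=150)  # clustermap is figure-level: save through the returned grid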
```

### 6. Figure-Level Functions and Faceting

Create multi-panel figures with automatic faceting.

```python
import seaborn as sns

df = sns.load_dataset("tips")

# relplot — faceted scatter/line plots
g = sns.relplot(data=df, x="total_bill", y="tip",
                col="time", row="sex", hue="smoker",
                kind="scatter", height=3, aspect=1.2)
g.set_axis_labels("Total Bill ($)", "Tip ($)")
g.savefig("faceted_scatter.png", dpi=150)
print("Saved faceted_scatter.png")
```
