csv-data-visualizer

CSV Data Visualizer creates interactive Plotly charts and statistical analyses from CSV files, including histograms, scatter plots, box plots, correlation heatmaps, and time series visualizations. Use this skill when users need exploratory data analysis, distribution analysis, relationship comparisons, trend identification, or automated data profiling with presentation-ready interactive visualizations.

Repository: ai-labs-claude-skills

Install in Claude Code

git clone --depth 1 https://github.com/ailabs-393/ai-labs-claude-skills /tmp/csv-data-visualizer && cp -r /tmp/csv-data-visualizer/packages/skills/csv-data-visualizer ~/.claude/skills/csv-data-visualizer

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# CSV Data Visualizer

## Overview

This skill enables comprehensive data visualization and analysis for CSV files. It provides three main capabilities: (1) creating individual interactive visualizations using Plotly, (2) automatic data profiling with statistical summaries, and (3) generating multi-plot dashboards. The skill is optimized for exploratory data analysis, statistical reporting, and creating presentation-ready visualizations.

## When to Use This Skill

Invoke this skill when users request:
- "Visualize this CSV data"
- "Create a histogram/scatter plot/box plot from this data"
- "Show me the distribution of [column]"
- "Generate a dashboard for this dataset"
- "Profile this CSV file" or "Analyze this data"
- "Create a correlation heatmap"
- "Show trends over time"
- "Compare [variable] across [categories]"

## Core Capabilities

### 1. Individual Visualizations

Create specific chart types for detailed analysis using the `visualize_csv.py` script.

**Available Chart Types:**

**Statistical Plots:**
```bash
# Histogram - distribution of numeric data
python3 scripts/visualize_csv.py data.csv --histogram column_name --bins 30

# Box plot - show quartiles and outliers
python3 scripts/visualize_csv.py data.csv --boxplot column_name

# Box plot grouped by category
python3 scripts/visualize_csv.py data.csv --boxplot salary --group-by department

# Violin plot - distribution with probability density
python3 scripts/visualize_csv.py data.csv --violin column_name --group-by category
```
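
The source does not show how `visualize_csv.py` passes `--bins` to Plotly, so the following is only a minimal sketch of what a bin count means conceptually: values are counted into equal-width intervals between the minimum and maximum. The function name `histogram_counts` and the sample data are hypothetical.

```python
def histogram_counts(values, bins=30):
    """Count values into `bins` equal-width intervals over [min, max].

    Hypothetical helper for illustration; not taken from visualize_csv.py.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values identical: everything lands in the first bin.
        return [len(values)] + [0] * (bins - 1)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum sits on the right edge, so clamp it into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


ages = [21, 22, 22, 35, 36, 40, 58, 60]  # made-up sample data
print(histogram_counts(ages, bins=4))  # [3, 3, 0, 2]
```

A plotting library may choose its own bin edges, so the counts in a rendered chart can differ from this sketch.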

**Relationship Analysis:**
```bash
# Scatter plot with automatic trend line
python3 scripts/visualize_csv.py data.csv --scatter height weight

# Scatter plot with color and size encoding
python3 scripts/visualize_csv.py data.csv --scatter x y --color category --size value

# Correlation heatmap for all numeric columns
python3 scripts/visualize_csv.py data.csv --correlation
```
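
A correlation heatmap is a grid of pairwise correlation coefficients. As a rough sketch, assuming the Pearson coefficient is used (the source does not say which method `--correlation` applies), the matrix behind such a heatmap could be computed as below. The names `pearson` and `correlation_matrix` and the sample columns are illustrative only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither column is constant (a constant column would divide by zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Map each pair of column names to its Pearson coefficient."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up sample columns
data = {
    "height": [150, 160, 170, 180],
    "weight": [50, 60, 70, 80],
    "age": [40, 30, 20, 10],
}
matrix = correlation_matrix(data)
print(matrix["height"]["weight"], matrix["height"]["age"])
```

Values near 1 or -1 indicate a strong linear relationship; values near 0 indicate little linear relationship.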

**Time Series:**
```bash
# Line chart for single variable
python3 scripts/visualize_csv.py data.csv --line date sales

# Multiple variables on same chart
python3 scripts/visualize_csv.py data.csv --line date "sales,revenue,profit"
```

**Categorical Data:**
```bash
# Bar chart (counts categories automatically)
python3 scripts/visualize_csv.py data.csv --bar category

# Pie chart for composition
python3 scripts/visualize_csv.py data.csv --pie region
```
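
"Counts categories automatically" presumably means the bar heights are the number of rows per distinct value. A minimal sketch of that counting step, using only the standard library and a hypothetical helper `category_counts` (the script's real implementation is not shown in the source):

```python
import csv
import io
from collections import Counter


def category_counts(csv_text, column):
    """Count rows per distinct value of `column`, skipping empty cells."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader if row[column] != "")


# Made-up sample data
sample = "region,sales\nNorth,10\nSouth,20\nNorth,5\n"
counts = category_counts(sample, "region")
print(counts.most_common())
```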

**Output Formats:**
Specify the output file with the desired format extension:
```bash
# Interactive HTML (default)
python3 scripts/visualize_csv.py data.csv --histogram age -o output.html

# Static image formats
python3 scripts/visualize_csv.py data.csv --scatter x y -o plot.png
python3 scripts/visualize_csv.py data.csv --correlation -o heatmap.pdf
python3 scripts/visualize_csv.py data.csv --bar category -o chart.svg
```
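
One plausible way to pick the output format from the `-o` path is to look at the file extension. The sketch below assumes that approach; `infer_output_format` is a hypothetical name, and the script's actual logic may differ. With Plotly itself, HTML is usually written with `fig.write_html`, while static images go through `fig.write_image`, which needs an image-export backend such as `kaleido` to be installed.

```python
from pathlib import Path

# Static formats shown in the examples above
STATIC_FORMATS = {"png", "pdf", "svg"}


def infer_output_format(output_path):
    """Return "html" or a static image format based on the file extension.

    Hypothetical helper; treats a missing extension as the HTML default.
    """
    ext = Path(output_path).suffix.lower().lstrip(".")
    if ext in ("", "html"):
        return "html"
    if ext in STATIC_FORMATS:
        return ext
    raise ValueError(f"Unsupported output extension: .{ext}")


for name in ("output.html", "plot.png", "heatmap.pdf", "chart.svg"):
    print(name, "->", infer_output_format(name))
```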

### 2. Automatic Data Profiling

Generate comprehensive data quality and statistical reports using the `data_profile.py` script.

**Text Report (default):**
```bash
python3 scripts/data_profile.py data.csv
```

**HTML Report:**
```bash
python3 scripts/data_profile.py data.csv -f html -o report.html
```

**JSON Report:**
```bash
python3 scripts/data_profile.py data.csv -f json -o profile.json
```

**What the Profiler Provides:**
- File information (size, dimensions)
- Dataset overview (shape, memory usage, duplicates)
- Column-by-column analysis (types, missing data, unique values)
- Missing data patterns and completeness
- Statistical summary for numeric columns (mean, std, quartiles, skewness, kurtosis)
- Categorical column analysis (frequency counts, most/least common values)
- Data quality checks (high missing data, duplicate rows, constant columns, high cardinality)
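
To give a feel for the column-by-column analysis, here is a small standard-library sketch that reports missing values, unique values, and basic numeric statistics. It is not the code of `data_profile.py`; the function names, the rule "a column is numeric if every non-empty cell parses as a float", and the sample data are all assumptions.

```python
import csv
import io
import statistics


def profile_column(values):
    """Summarize one column given its raw string cells."""
    present = [v for v in values if v.strip() != ""]
    summary = {"missing": len(values) - len(present), "unique": len(set(present))}
    try:
        numbers = [float(v) for v in present]
    except ValueError:
        # At least one cell is not a number: treat the column as categorical.
        summary["type"] = "categorical"
        return summary
    summary["type"] = "numeric"
    if numbers:
        summary["mean"] = statistics.mean(numbers)
    if len(numbers) >= 2:
        summary["std"] = statistics.stdev(numbers)
        summary["quartiles"] = statistics.quantiles(numbers, n=4)
    return summary


def profile_csv(csv_text):
    """Profile every column of a CSV given as text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    return {name: profile_column([row[name] or "" for row in rows]) for name in columns}


# Made-up sample data with one missing cell per column
sample = "age,city\n30,Paris\n40,\n50,Paris\n,Rome\n"
report = profile_csv(sample)
for name, summary in report.items():
    print(name, summary)
```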

**When to Use Profiling:**
Always recommend running data profiling BEFORE creating visualizations when:
- The user is unfamiliar with the dataset
- Data quality is unknown
- Appropriate visualization types still need to be identified
- A new dataset is being explored for the first time

### 3. Multi-Plot Dashboards

Create comprehensive dashboards with multiple visualizations using the `create_dashboard.py` script.

**Automatic Dashboard:**
The script analyzes column data types and automatically creates appropriate visualizations:
```bash
python3 scripts/create_dashboard.py data.csv
```

Custom output location:
```bash
python3 scripts/create_dashboard.py data.csv -o my_dashboard.html
```

Control number of plots:
```bash
python3 scripts/create_dashboard.py data.csv --max-plots 9
```
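
The source does not describe how automatic mode chooses its plots. As a purely illustrative sketch of the idea (pick plot types from column types, then cap the total), one could imagine something like the following. The selection rules, the `plan_dashboard` name, and the default of 6 plots are assumptions, not documented behavior of `create_dashboard.py`.

```python
def plan_dashboard(column_types, max_plots=6):
    """Pick plots from column types; `max_plots` default is an assumption.

    `column_types` maps column name -> "numeric" or "categorical".
    """
    numeric = [c for c, t in column_types.items() if t == "numeric"]
    categorical = [c for c, t in column_types.items() if t == "categorical"]
    plots = []
    if len(numeric) >= 2:
        # A correlation heatmap only makes sense with two or more numeric columns.
        plots.append({"type": "correlation"})
    plots += [{"type": "histogram", "column": c} for c in numeric]
    plots += [{"type": "bar", "column": c} for c in categorical]
    # Keep at most `max_plots` entries, mirroring the idea of --max-plots.
    return plots[:max_plots]


# Hypothetical column types for a sales dataset
plan = plan_dashboard(
    {"revenue": "numeric", "advertising": "numeric", "region": "categorical"},
    max_plots=3,
)
print(plan)
```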

**Custom Dashboard from Config:**
Create a JSON configuration file specifying the exact plots, then pass it with `--config`:
```bash
python3 scripts/create_dashboard.py data.csv --config config.json
```

**Dashboard Config Format:**
```json
{
  "title": "Sales Analysis Dashboard",
  "plots": [
    {"type": "histogram", "column": "revenue"},
    {"type": "box", "column": "revenue", "group_by": "region"},
    {"type": "scatter", "column": "advertising", "group_by": "revenue"},
    {"type": "bar", "column": "product_category"},
    {"type": "correlation"}
  ]
}
```

**Dashboard Plot Types:**
- `histogram`: Distribution of numeric column
- `box`: Box plot, optionally grouped by category
- `scatter`: Relationship between two numeric columns (the example config above supplies them as `column` and `group_by`)
- `bar`: Count of categorical values
- `correlation`: Heatmap of numeric correlations
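
Before handing a config to `create_dashboard.py`, it can help to check it against the plot types listed above. The sketch below builds the example config in Python and runs a simple check. The rule that every type except `correlation` needs a `column` is inferred from the example rather than documented, and `validate_config` is a hypothetical helper.

```python
import json

# Plot types listed in this section
PLOT_TYPES = {"histogram", "box", "scatter", "bar", "correlation"}


def validate_config(config):
    """Return a list of problems found in a dashboard config (empty if none)."""
    errors = []
    for i, plot in enumerate(config.get("plots", [])):
        kind = plot.get("type")
        if kind not in PLOT_TYPES:
            errors.append(f"plot {i}: unknown type {kind!r}")
        elif kind != "correlation" and "column" not in plot:
            # Assumption: only the correlation heatmap works without a column.
            errors.append(f"plot {i}: {kind} needs a 'column'")
    return errors


config = {
    "title": "Sales Analysis Dashboard",
    "plots": [
        {"type": "histogram", "column": "revenue"},
        {"type": "box", "column": "revenue", "group_by": "region"},
        {"type": "correlation"},
    ],
}
print(validate_config(config))
config_text = json.dumps(config, indent=2)  # text to save as config.json
```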

## Workflow Decision Tree

Use this decision tree to determine the appropriate approach:

```
User provides CSV file
│
├─ "Profile this data" / "Analyze this data" / Unfamiliar dataset
│  └─> Run data_profile.py first
│     Then offer visualization options based on findings
│
├─ "Create dashboard" / "Overview of the data" / Multiple visualizations needed
│  ├─ User knows exact plots wanted
│  │  └─> Create JSON config → run create_dashboard.py with config
│  └─ User wants automatic dashboard
│     └─> Run create_dashboard.py (auto mode)
│
└─ Specific visualization requested ("histogram", "scatter plot", etc.)
   └─> Use visualize_csv.py with appropriate flag
```
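
The decision tree above can be sketched as a small routing function. The keyword matching here is a deliberately naive assumption for illustration; `choose_command` and `CHART_FLAGS` are hypothetical names, while the script paths and flags are the ones shown earlier in this document.

```python
# Keywords in a request mapped to visualize_csv.py flags shown earlier
CHART_FLAGS = {
    "histogram": "--histogram",
    "box plot": "--boxplot",
    "violin": "--violin",
    "scatter": "--scatter",
    "heatmap": "--correlation",
    "line chart": "--line",
    "bar chart": "--bar",
    "pie chart": "--pie",
}


def choose_command(request, csv_path, columns=(), config_path=None):
    """Return the command (as an argument list) suggested by the decision tree."""
    text = request.lower()
    if "profile" in text or "analyze" in text:
        return ["python3", "scripts/data_profile.py", csv_path]
    if "dashboard" in text or "overview" in text:
        command = ["python3", "scripts/create_dashboard.py", csv_path]
        if config_path:
            # The user knows the exact plots wanted: use the JSON config.
            command += ["--config", config_path]
        return command
    for keyword, flag in CHART_FLAGS.items():
        if keyword in text:
            return ["python3", "scripts/visualize_csv.py", csv_path, flag, *columns]
    # Nothing matched: treat the dataset as unfamiliar and profile first.
    return ["python3", "scripts/data_profile.py", csv_path]


print(choose_command("Create a histogram of age", "data.csv", columns=["age"]))
```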

## Best Practices

### Starting Analysis
1. **Always profile first** for unfamiliar datasets: `python3 scripts/data_p