csv-data-summarizer
Analyzes CSV files and automatically generates comprehensive summaries with statistical insights, data quality checks, and visualizations using Python and pandas. No questions asked — just upload a CSV and get a full analysis immediately.
git clone --depth 1 https://github.com/besoeasy/open-skills /tmp/csv-data-summarizer && cp -r /tmp/csv-data-summarizer/skills/csv-data-summarizer ~/.claude/skills/csv-data-summarizer
SKILL.md
# CSV Data Summarizer
This skill analyzes any CSV file and delivers a complete statistical summary with visualizations in one shot. It adapts intelligently to the type of data it finds — sales, customer, financial, operational, survey, or generic tabular data.
## When to Use This Skill
- The user uploads or references a CSV file
- The user asks to summarize, analyze, or visualize tabular data
- The user requests insights from a dataset
- The user wants to understand a dataset's structure and quality
## Behavior Rule
**Do not ask the user what they want. Immediately run the full analysis.**
When a CSV is provided, skip questions like "What would you like me to do?" and go straight to the analysis.
## Required Tools / Libraries
```bash
pip install pandas matplotlib seaborn
```
## How It Works
The skill inspects the data first, then automatically determines which analyses are relevant:
| Data type | Focus areas |
|-----------|-------------|
| Sales / e-commerce | Time-series trends, revenue, product performance |
| Customer data | Distributions, segmentation, geographic patterns |
| Financial | Trend analysis, statistics, correlations |
| Operational | Time-series, performance metrics, distributions |
| Survey | Frequency analysis, cross-tabulations |
| Generic | Adapts based on column types found |
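
The inspection step can be pictured as grouping columns by the kind of analysis they support. The sketch below is illustrative only: `profile_columns` is a hypothetical helper that is not part of the skill, and it assumes the same name-based date detection ("date" or "time" in the column name) that the core function uses.

```python
import pandas as pd


def profile_columns(df: pd.DataFrame) -> dict[str, list[str]]:
    """Group column names by the kind of analysis they can support (hypothetical helper)."""
    numeric = df.select_dtypes(include="number").columns.tolist()
    # Name-based date detection: an assumption borrowed from the core function.
    datetime_like = [c for c in df.columns if "date" in c.lower() or "time" in c.lower()]
    # Everything non-numeric that does not look like a date is treated as categorical.
    categorical = [
        c for c in df.select_dtypes(exclude="number").columns
        if c not in datetime_like
    ]
    return {"numeric": numeric, "datetime": datetime_like, "categorical": categorical}


# Made-up sample data, for illustration only.
sample = pd.DataFrame({
    "order_date": ["2024-01-01", "2024-01-02"],
    "region": ["north", "south"],
    "revenue": [120.5, 98.0],
})
print(profile_columns(sample))
```

A dataset with a date-like column plus numeric columns would lean toward the time-series focus areas in the table, while one with only text columns would lean toward frequency analysis.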
Visualizations are only created when they make sense:
- Time-series plots → only if date/timestamp columns exist
- Correlation heatmaps → only if multiple numeric columns exist
- Category distributions → only if categorical columns exist
- Histograms → for numeric distributions when relevant
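
The rules above can be read as a small decision function. This is a minimal sketch under stated assumptions, not the skill's actual implementation: `plan_charts` and `_looks_like_date` are hypothetical names, date columns are detected by name only, and the time-series plot additionally requires at least one numeric column to have something to plot.

```python
import pandas as pd


def _looks_like_date(name: str) -> bool:
    # Assumption: date columns are recognised by name, not by parsing values.
    return "date" in name.lower() or "time" in name.lower()


def plan_charts(df: pd.DataFrame) -> list[str]:
    """Return the charts that make sense for this DataFrame (hypothetical helper)."""
    numeric = df.select_dtypes(include="number").columns
    non_numeric = df.select_dtypes(exclude="number").columns
    has_dates = any(_looks_like_date(c) for c in df.columns)
    has_categories = any(not _looks_like_date(c) for c in non_numeric)

    charts = []
    if has_dates and len(numeric) > 0:
        charts.append("time_series")
    if len(numeric) > 1:
        charts.append("correlation_heatmap")
    if has_categories:
        charts.append("category_distributions")
    if len(numeric) > 0:
        charts.append("histograms")
    return charts


# Made-up sample data, for illustration only.
survey = pd.DataFrame({"respondent": ["a", "b", "c"], "answer": ["yes", "no", "yes"]})
sales = pd.DataFrame({
    "order_date": ["2024-01-01", "2024-01-02"],
    "units": [1, 2],
    "revenue": [10.0, 20.0],
})
print(plan_charts(survey))
print(plan_charts(sales))
```

Deciding the chart list up front keeps the plotting code free of nested conditionals and makes it easy to report which charts were skipped and why.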
## Core Function
```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


def summarize_csv(file_path):
    df = pd.read_csv(file_path)
    summary = []
    charts_created = []

    # --- Overview ---
    summary.append("=" * 60)
    summary.append("DATA OVERVIEW")
    summary.append("=" * 60)
    summary.append(f"Rows: {df.shape[0]:,} | Columns: {df.shape[1]}")
    summary.append(f"\nColumns: {', '.join(df.columns.tolist())}")
    summary.append("\nDATA TYPES:")
    for col, dtype in df.dtypes.items():
        summary.append(f" • {col}: {dtype}")

    # --- Data quality ---
    missing = df.isnull().sum().sum()
    missing_pct = (missing / (df.shape[0] * df.shape[1])) * 100
    summary.append("\nDATA QUALITY:")
    if missing:
        summary.append(f"Missing values: {missing:,} ({missing_pct:.2f}% of total data)")
        for col in df.columns:
            col_missing = df[col].isnull().sum()
            if col_missing > 0:
                summary.append(f" • {col}: {col_missing:,} ({(col_missing / len(df)) * 100:.1f}%)")
    else:
        summary.append("No missing values — dataset is complete.")

    # --- Numeric analysis ---
    numeric_cols = df.select_dtypes(include='number').columns.tolist()
    if numeric_cols:
        summary.append("\nNUMERICAL ANALYSIS:")
        summary.append(str(df[numeric_cols].describe()))
        if len(numeric_cols) > 1:
            corr_matrix = df[numeric_cols].corr()
            summary.append("\nCORRELATIONS:")
            summary.append(str(corr_matrix))
            plt.figure(figsize=(10, 8))
            sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', center=0, square=True, linewidths=1)
            plt.title('Correlation Heatmap')
            plt.tight_layout()
            plt.savefig('correlation_heatmap.png', dpi=150)
            plt.close()
            charts_created.append('correlation_heatmap.png')

    # --- Categorical analysis ---
    categorical_cols = [c for c in df.select_dtypes(include='object').columns if 'id' not in c.lower()]
    if categorical_cols:
        summary.append("\nCATEGORICAL ANALYSIS:")
        for col in categorical_cols[:5]:
            value_counts = df[col].value_counts()
            summary.append(f"\n{col}:")
            for val, count in value_counts.head(10).items():
                summary.append(f" • {val}: {count:,} ({(count / len(df)) * 100:.1f}%)")

    # --- Time series analysis ---
    date_cols = [c for c in df.columns if 'date' in c.lower() or 'time' in c.lower()]
    if date_cols:
        date_col = date_cols[0]
        df[date_col] = pd.to_datetime(df[date_col], errors='coerce')
        date_range = df[date_col].max() - df[date_col].min()
        summary.append("\nTIME SERIES ANALYSIS:")
        summary.append(f"Date range: {df[date_col].min()} to {df[date_col].max()}")
        summary.append(f"Span: {date_range.days} days")
        if numeric_cols:
            fig, axes = plt.subplots(min(3, len(numeric_cols)), 1, figsize=(12, 4 * min(3, len(numeric_cols))))
            # plt.subplots returns a single Axes (not an array) when only one plot is requested
            if len(numeric_cols) == 1:
                axes = [axes]
            for idx, num_col in enumerate(numeric_cols[:3]):
                ax = axes[idx]
                df.groupby(date_col)[num_col].mean().plot(ax=ax, linewidth=2)
                ax.set_title(f'{num_col} Over Time')
                ax.set_xlabel('Date')
                ax.set_ylabel(num_col)
                ax.grid(True, alpha=0.3)
            plt.tight_layout()
            plt.savefig('time_series_analysis.png', dpi=150)
            plt.close()
            charts_created.append('time_series_analysis.png')

    # --- Distribution plots ---
    if numeric_cols:
        fig, axes = plt.subplots(2, 2, figsize=(12, 10))
        axes = axes.flatten()
        for idx, col in enumerate(numeric_cols[:4]):
            axes[idx].hist(df[col].dropna(), bins=30, edgecolor='black', alpha=0.7)
            axes[idx].set_title(f'Distribution of {col}')
            axes[idx].set_xlabel(col)
            axes[idx].set_ylabel('Frequency')
            axes[idx].grid(True, alpha=0.3)
        # Hide unused panels when there are fewer than four numeric columns
        for idx in range(len(numeric_cols[:4]), 4):
            axes[idx].set_visible(False)
        plt.tight_layout()
        plt.savefig('distributions.png', dpi=150)
        plt.close()
        charts_created.append('distributions.png')

    # --- Categorical distribution plots ---
    # if categorical_co... [truncated in the source]
```