Skill · 537 repo stars · updated 28d ago

dataset-profiler

dataset-profiler generates a structured markdown report documenting a dataset's schema, missing values, statistical distributions, outliers, and data quality issues before analysis begins. Use this skill when you encounter unfamiliar data, need to investigate suspected data quality problems, or are preparing a reproducible analysis workflow on CSV, Parquet, or JSONL files.

Source repository: LLM-Agents-Ecosystem-Handbook

Install in Claude Code


mkdir -p ~/.claude/skills && git clone --depth 1 https://github.com/oxbshw/LLM-Agents-Ecosystem-Handbook /tmp/dataset-profiler && cp -r /tmp/dataset-profiler/skills/catalog/dataset-profiler ~/.claude/skills/dataset-profiler

Then open a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Dataset Profiler

## When to use
- A new dataset arrives and you need to understand it before using it
- Before reproducing an analysis that referenced a dataset
- When data quality is suspect ("the chart looked wrong")

## When NOT to use
- Streaming / online data (this skill produces a point-in-time snapshot)
- Sensitive PII without an explicit allow-list

## Inputs
| Name | Type | Required | Notes |
|---|---|---|---|
| `path` | path | yes | CSV / Parquet / JSONL |
| `target` | string | no | column of interest (gets extra distribution detail) |
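
The `path` input accepts CSV, Parquet, or JSONL. Below is a minimal sketch of extension-based reader selection. It assumes Python, with the standard library for CSV/JSONL and the optional `pyarrow` package for Parquet. `load_rows` is a hypothetical helper, not part of the skill, and it reads the whole file into memory.

```python
import csv
import json
from pathlib import Path


def load_rows(path: str) -> list[dict]:
    """Load a dataset as a list of row dicts, picking the reader from the file extension.

    CSV and JSONL use only the standard library and assume UTF-8. The Parquet
    branch assumes the optional pyarrow package is installed.
    """
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix == ".csv":
        with p.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    if suffix == ".jsonl":
        with p.open(encoding="utf-8") as f:
            # One JSON object per line; blank lines are skipped.
            return [json.loads(line) for line in f if line.strip()]
    if suffix == ".parquet":
        import pyarrow.parquet as pq  # imported lazily so CSV/JSONL work without pyarrow
        return pq.read_table(str(p)).to_pylist()
    raise ValueError(f"unsupported extension {suffix!r}; expected .csv, .parquet or .jsonl")
```

Returning plain dicts keeps the later profiling steps independent of any dataframe library. A pandas or Polars reader would work equally well here.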

## Outputs
`profile.md` with: Source, Schema, Missingness, Distributions, Outliers, Joins / keys, Gotchas, Open questions.
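
The eight output sections appear in a fixed order. Below is a sketch of assembling `profile.md` from per-section bodies. It assumes level-2 `##` headings and a hypothetical `render_profile` helper. The real layout is defined by `references/profile-template.md`, which is not reproduced here.

```python
SECTIONS = [
    "Source", "Schema", "Missingness", "Distributions",
    "Outliers", "Joins / keys", "Gotchas", "Open questions",
]


def render_profile(bodies: dict[str, str]) -> str:
    """Assemble profile.md from per-section markdown bodies in a fixed section order.

    Sections with no body get a visible placeholder instead of being dropped.
    """
    unknown = set(bodies) - set(SECTIONS)
    if unknown:
        raise KeyError(f"unknown sections: {sorted(unknown)}")
    parts = ["# Dataset profile", ""]
    for name in SECTIONS:
        parts.append(f"## {name}")
        parts.append(bodies.get(name, "").strip() or "_TODO: not yet filled in_")
        parts.append("")
    return "\n".join(parts)
```

Keeping empty sections as placeholders means gaps stay visible to a reviewer rather than silently disappearing.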

## Workflow
1. Load with the reader matching the file extension; record row count and file size
2. Schema: column → dtype → nullable → example value
3. Missingness: % per column, top columns by missingness
4. Distributions: numeric (min, p50, p95, max, std), categorical (top-k, cardinality)
5. Outliers: flag rows beyond p99 + 3·IQR for numerics
6. Identify potential keys (unique columns) and join candidates
7. **Gotchas**: timezone columns, mixed encodings, suspicious all-zero rows, magic values (`-1`, `9999-12-31`)
8. **Open questions**: ambiguous columns / values that need owner input
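
Steps 2 to 7 can be sketched with the standard library alone. This sketch assumes rows have already been loaded as a list of dicts. The function `profile_rows` and its report layout are hypothetical, not part of the skill.

It makes these choices:
- Empty strings and `None` count as missing.
- A column is treated as numeric only when every non-missing value parses as a finite number.
- Percentiles use linear interpolation.
- Outliers use the upper fence p99 + 3·IQR exactly as step 5 states, with no lower fence.
- Magic values are compared as strings.
- A row counts as "all zero" when every field that parses as a number equals zero. This is a heuristic reading of step 7.

```python
import math
import statistics
from collections import Counter

# Magic values named in the workflow; compared as strings. Extend per dataset.
MAGIC_VALUES = {"-1", "9999-12-31"}


def is_missing(v) -> bool:
    """Treat None (absent JSON key / null) and empty strings (blank CSV cell) as missing."""
    return v is None or v == ""


def to_number(v):
    """Return v as a float if it parses as a finite number, else None."""
    if isinstance(v, bool):
        return None
    try:
        x = float(v)
    except (TypeError, ValueError):
        return None
    return x if math.isfinite(x) else None


def percentile(sorted_vals, q):
    """Linear-interpolation percentile, q in [0, 100], over a non-empty sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def profile_rows(rows):
    columns = list(dict.fromkeys(c for r in rows for c in r))  # first-seen column order
    n = len(rows)
    rep = {"row_count": n, "schema": {}, "missing_pct": {}, "numeric": {},
           "categorical": {}, "outliers": {}, "candidate_keys": [],
           "magic_values": {}, "all_zero_rows": []}
    for col in columns:
        values = [r.get(col) for r in rows]
        present = [v for v in values if not is_missing(v)]
        nums = [to_number(v) for v in present]
        numeric = bool(present) and all(x is not None for x in nums)
        rep["schema"][col] = {"dtype": "numeric" if numeric else "string",
                              "nullable": len(present) < n,
                              "example": present[0] if present else None}
        rep["missing_pct"][col] = 100 * (n - len(present)) / n if n else 0.0
        if numeric:
            s = sorted(nums)
            iqr = percentile(s, 75) - percentile(s, 25)
            rep["numeric"][col] = {"min": s[0], "p50": percentile(s, 50),
                                   "p95": percentile(s, 95), "max": s[-1],
                                   "std": statistics.pstdev(s)}  # population std
            fence = percentile(s, 99) + 3 * iqr  # upper fence from step 5
            rep["outliers"][col] = [i for i, v in enumerate(values)
                                    if not is_missing(v) and to_number(v) > fence]
        else:
            counts = Counter(map(str, present))
            rep["categorical"][col] = {"cardinality": len(counts),
                                       "top_k": counts.most_common(5)}
        # Candidate key: no missing values and every value distinct.
        if n and len(present) == n and len(set(map(str, present))) == n:
            rep["candidate_keys"].append(col)
        hits = sum(str(v) in MAGIC_VALUES for v in present)
        if hits:
            rep["magic_values"][col] = hits
    # Heuristic: flag rows whose numeric fields are all zero.
    for i, r in enumerate(rows):
        row_nums = [x for x in map(to_number, r.values()) if x is not None]
        if row_nums and all(x == 0 for x in row_nums):
            rep["all_zero_rows"].append(i)
    return rep
```

Because p99 sits close to the maximum, this fence is deliberately conservative. It flags only extreme values, which suits a first-pass report where every flagged row is shown as an example.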

## References
- [`references/profile-template.md`](references/profile-template.md)

## Success criteria
- Every column appears in Schema + Missingness
- Outliers section includes example rows
- Gotchas section is non-empty (real datasets always have some)
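
Once the report is written, the success criteria can be checked mechanically. The sketch below makes two assumptions about the template: each section is a `## Name` heading, and column names appear in backticks in the Schema and Missingness sections. `check_profile` is a hypothetical helper that returns a list of failures, empty when every check passes. The outlier check only confirms that the section is non-empty.

```python
import re


def split_sections(markdown: str) -> dict[str, str]:
    """Map each '## Heading' to the text beneath it, up to the next level-2 heading."""
    sections: dict[str, str] = {}
    current = None
    for line in markdown.splitlines():
        m = re.match(r"^##\s+(.+?)\s*$", line)
        if m:
            current = m.group(1)
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return sections


def check_profile(markdown: str, columns: list[str]) -> list[str]:
    """Return human-readable failures against the success criteria (empty list = pass)."""
    s = split_sections(markdown)
    failures = []
    for section in ("Schema", "Missingness"):
        body = s.get(section, "")
        for col in columns:
            if f"`{col}`" not in body:  # assumes column names are written in backticks
                failures.append(f"{section}: missing column {col!r}")
    if not s.get("Outliers", "").strip():
        failures.append("Outliers: no example rows")
    if not s.get("Gotchas", "").strip():
        failures.append("Gotchas: section is empty")
    return failures
```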

## Failure modes
- File too large to read into memory → switch to streaming + sampled stats; flag this prominently in the report
- Encoding fails → try common alternative encodings; if all fail, report the error and stop
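
Both failure modes can be handled without loading the file into memory. The sketch uses two hypothetical helpers:
- `detect_encoding` tries an assumed list of candidate encodings and raises when none can decode the file.
- `sample_csv_rows` streams a CSV once and keeps a uniform reservoir sample (Algorithm R) for the sampled stats, so memory stays proportional to the sample size.

`latin-1` is left out of the candidates on purpose. It decodes every byte sequence, so it would never fail and would hide real encoding problems.

```python
import csv
import random

# Assumed candidate list; utf-8-sig also accepts plain UTF-8 without a BOM.
FALLBACK_ENCODINGS = ("utf-8-sig", "cp1252")


def detect_encoding(path: str, encodings=FALLBACK_ENCODINGS) -> str:
    """Return the first encoding that decodes the whole file; raise if none do."""
    errors = {}
    for enc in encodings:
        try:
            with open(path, encoding=enc) as f:
                for _ in f:  # decode incrementally instead of reading the file at once
                    pass
            return enc
        except UnicodeDecodeError as exc:
            errors[enc] = str(exc)
    raise UnicodeError(f"could not decode {path}: {errors}")


def sample_csv_rows(path: str, k: int, encoding: str, seed: int = 0):
    """Stream a CSV once, keeping a uniform random sample of k rows (reservoir sampling).

    Returns (total_row_count, sample). Memory use is O(k) regardless of file size.
    """
    rng = random.Random(seed)
    sample = []
    n = 0
    with open(path, newline="", encoding=encoding) as f:
        for n, row in enumerate(csv.DictReader(f), start=1):
            if len(sample) < k:
                sample.append(row)
            else:
                j = rng.randrange(n)  # keep the n-th row with probability k/n
                if j < k:
                    sample[j] = row
    return n, sample
```

The row count is exact because every row is read. Percentiles and top-k values computed on the sample are estimates and should be marked as such in the report.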

From the same repository

New Skill

adr-writer

Use when capturing an architecture decision so it survives turnover — produces an ADR-NNNN.md from context, options considered, and the chosen path.

api-design-reviewer

Use when reviewing a proposed REST or GraphQL API change before merge — checks contract clarity, backwards compatibility, errors, pagination, auth, and naming.

incident-postmortem

Use after an incident is resolved — drafts a blameless postmortem from timeline notes, alerts, and chat threads.

pr-summarizer

Use when opening a PR — produces a clean PR description (what / why / how to verify / risks) from a branch diff against base.

sprint-planner

Use when planning the next sprint — turns ticket intake + team capacity into a planned sprint with explicit non-goals.

agent-memory-curator

Use after a session to promote useful episodic notes from logs/episodic/ into distilled, dated entries in MEMORY.md and memory/semantic/.

mcp-security-reviewer

Use before connecting a new MCP server to your agent — produces a structured security review covering source, permissions, tools, network, and approvals.