ClaudeWave
Skill · 181 repo stars · updated 5d ago

data-pipeline-builder

Designs and builds ETL/ELT data pipelines. Takes data sources, a destination, and transformation requirements as input. Generates pipeline code (Python/SQL), scheduling config, error handling, monitoring setup, and data quality checks. Outputs data-pipeline-spec.md plus implementation files.

Install in Claude Code
git clone --depth 1 https://github.com/OneWave-AI/claude-skills /tmp/data-pipeline-builder && cp -r /tmp/data-pipeline-builder/data-pipeline-builder ~/.claude/skills/data-pipeline-builder
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# Data Pipeline Builder

Design and implement production-grade ETL/ELT data pipelines: take data sources, a destination, and transformation requirements, then produce a complete pipeline specification plus all implementation files needed to run it.

## Contents

- `references/project-structure.md` -- output file layout, architecture pattern selection, component selection.
- `references/python-patterns.md` -- Python code standards and base extractor/transformer/loader/retry patterns.
- `references/quality-checks.md` -- composable data quality check framework and built-in checks.
- `references/orchestration-config.md` -- Airflow DAG, pipeline config YAML, and monitoring/alerting patterns.
- `references/spec-template.md` -- the `data-pipeline-spec.md` output template.
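
The check framework itself is documented in `references/quality-checks.md`, which is not reproduced here. As a minimal sketch of what "composable" can mean, assuming each batch arrives as a list of dicts: every check is a small factory that returns a function producing a `CheckResult`, so checks can be mixed freely per pipeline. All names below (`CheckResult`, `not_null`, `unique`, `row_count_between`, `run_checks`, `enforce`, `QualityGateError`) are hypothetical, not that file's API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

Record = dict  # assumption: one row is a plain dict


@dataclass(frozen=True)
class CheckResult:
    name: str
    passed: bool
    detail: str = ""


# A check is any callable that inspects a batch and returns a CheckResult.
Check = Callable[[list[Record]], CheckResult]


def not_null(column: str) -> Check:
    def check(rows: list[Record]) -> CheckResult:
        nulls = sum(1 for r in rows if r.get(column) is None)
        return CheckResult(f"not_null({column})", nulls == 0, f"{nulls} null value(s)")
    return check


def unique(column: str) -> Check:
    def check(rows: list[Record]) -> CheckResult:
        values = [r.get(column) for r in rows]
        dupes = len(values) - len(set(values))
        return CheckResult(f"unique({column})", dupes == 0, f"{dupes} duplicate(s)")
    return check


def row_count_between(low: int, high: int) -> Check:
    def check(rows: list[Record]) -> CheckResult:
        n = len(rows)
        return CheckResult(f"row_count_between({low}, {high})", low <= n <= high, f"{n} row(s)")
    return check


def run_checks(rows: list[Record], checks: Iterable[Check]) -> list[CheckResult]:
    """Run every check and collect all results (no short-circuiting)."""
    return [check(rows) for check in checks]


class QualityGateError(Exception):
    """Raised when a batch fails one or more checks."""


def enforce(rows: list[Record], checks: Iterable[Check]) -> list[CheckResult]:
    """Quality gate: return the results if all checks pass, otherwise raise."""
    results = run_checks(rows, checks)
    failures = [r for r in results if not r.passed]
    if failures:
        raise QualityGateError("; ".join(f"{f.name}: {f.detail}" for f in failures))
    return results


if __name__ == "__main__":
    rows = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": None}]
    for result in run_checks(rows, [not_null("email"), unique("id"), row_count_between(1, 1000)]):
        print(result.name, result.passed, result.detail)
    # not_null(email) False 1 null value(s)
    # unique(id) True 0 duplicate(s)
    # row_count_between(1, 1000) True 2 row(s)
```

Because `enforce` raises on any failed check, calling it between the transform and load steps acts as a gate: a bad batch never reaches the destination.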

## Workflow

1. Gather requirements. If the user gave clear requirements, proceed to design. Otherwise, ask targeted questions about:
   - Data sources (databases, APIs, files, streams).
   - Destination (warehouse, lake, database).
   - Transformations (joins, aggregations, filters, business rules).
   - Freshness requirement (real-time, hourly, daily).
   - Technology preferences (Airflow, dbt, Spark, cloud provider).
   - Data quality and compliance requirements.

2. Analyze and design:
   - Catalog each source: connection type, auth, schema, volume, CDC availability, rate limits.
   - Define the destination: platform, schema design, partitioning, clustering, access patterns.
   - Map transformations: field mappings, business logic, type conversions, joins, aggregations, deduplication, SCD handling, derived fields.
   - Establish non-functional requirements: freshness SLA, processing window, failure tolerance, retention, compliance.
   - Select an architecture pattern and components per `references/project-structure.md`.

3. Present the design before generating code. Confirm architecture, sources, destination, schedule, key transformations, and quality gates with the user, then proceed on approval.

4. Generate implementation. Produce all files following the layout in `references/project-structure.md`, customized to the specific pipeline with no placeholder code requiring manual editing:
   - For each source, generate a concrete extractor inheriting from `BaseExtractor` (see `references/python-patterns.md`).
   - For each transformation, generate a concrete transformer class or SQL file.
   - For each destination, generate a concrete loader inheriting from `BaseLoader`.
   - Generate the Airflow DAG, with all task dependencies wired up, and the pipeline config YAML (see `references/orchestration-config.md`).
   - Generate quality checks tailored to the data, plus a monitoring config with appropriate alert thresholds (see `references/quality-checks.md` and `references/orchestration-config.md`).
   - Generate tests for all custom business logic.

5. Generate the specification last. Produce `data-pipeline-spec.md` using `references/spec-template.md`, referencing all implementation files and incorporating design decisions made during the process.
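
Step 4 generates concrete subclasses of `BaseExtractor` and `BaseLoader` from `references/python-patterns.md`, whose exact interfaces are not shown here. The sketch below assumes a simple shape (an `extract()` generator, a `transform(rows)` generator, and a `load(rows)` method returning a row count) to show how concrete classes plug into one run. `BaseTransformer`, `CsvExtractor`, `OrderTransformer`, `InMemoryLoader`, and `run_pipeline` are hypothetical names for illustration only.

```python
import csv
import io
from abc import ABC, abstractmethod
from typing import Iterator

Record = dict  # assumption: one row is a plain dict


class BaseExtractor(ABC):
    """Assumed interface: yield source rows one at a time."""

    @abstractmethod
    def extract(self) -> Iterator[Record]: ...


class BaseTransformer(ABC):
    """Hypothetical base: map a stream of rows to a new stream of rows."""

    @abstractmethod
    def transform(self, rows: Iterator[Record]) -> Iterator[Record]: ...


class BaseLoader(ABC):
    """Assumed interface: write rows to the destination, return the count written."""

    @abstractmethod
    def load(self, rows: Iterator[Record]) -> int: ...


class CsvExtractor(BaseExtractor):
    """Hypothetical source: CSV text standing in for a file or API payload."""

    def __init__(self, text: str) -> None:
        self.text = text

    def extract(self) -> Iterator[Record]:
        yield from csv.DictReader(io.StringIO(self.text))


class OrderTransformer(BaseTransformer):
    """Hypothetical business rule: drop cancelled orders and cast types."""

    def transform(self, rows: Iterator[Record]) -> Iterator[Record]:
        for row in rows:
            if row["status"] == "cancelled":
                continue
            yield {"order_id": int(row["order_id"]), "amount": float(row["amount"])}


class InMemoryLoader(BaseLoader):
    """Hypothetical destination: a dict keyed by order_id, so reruns overwrite."""

    def __init__(self) -> None:
        self.table: dict[int, Record] = {}

    def load(self, rows: Iterator[Record]) -> int:
        count = 0
        for row in rows:
            self.table[row["order_id"]] = row  # upsert by primary key
            count += 1
        return count


def run_pipeline(extractor: BaseExtractor, transformers: list[BaseTransformer], loader: BaseLoader) -> int:
    """Chain extract -> transform(s) -> load lazily; return the number of rows loaded."""
    rows = extractor.extract()
    for transformer in transformers:
        rows = transformer.transform(rows)
    return loader.load(rows)


if __name__ == "__main__":
    source = "order_id,amount,status\n1,9.50,paid\n2,3.00,cancelled\n3,12.25,paid\n"
    loader = InMemoryLoader()
    print(run_pipeline(CsvExtractor(source), [OrderTransformer()], loader))  # 2
    run_pipeline(CsvExtractor(source), [OrderTransformer()], loader)  # rerun the same batch
    print(len(loader.table))  # 2: the rerun overwrote rows instead of duplicating them
```

Chaining generators keeps memory flat for large extracts, and keying the loader on `order_id` makes a rerun overwrite rows rather than duplicate them, which is the idempotency the Operating Rules call for.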

## Operating Rules

- Design for idempotency -- make every step safely re-runnable.
- Include watermark/checkpoint tracking for incremental pipelines.
- Include dead letter handling for records that fail processing.
- Include schema evolution handling -- sources will change their schemas.
- Never hardcode credentials -- use environment variables or secret managers.
- Never skip quality checks -- they are the first line of defense against bad data.
- Prefer SQL for transformations expressible in SQL; use Python for complex logic that does not map cleanly to SQL.
- Include a backfill strategy and an operational runbook covering common failure scenarios in the spec.
- Use structured logging throughout and track data lineage at every transformation step.
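
Several of these rules meet in every incremental run. A minimal sketch, assuming an in-memory source whose rows carry an `updated_at` timestamp and a dict standing in for the destination table (`IncrementalRun` and its fields are hypothetical, not a generated file): rows newer than the stored watermark are picked up, valid rows are upserted by primary key so reruns are safe, rows that fail parsing go to a dead-letter list, and the watermark advances only after the batch has been handled.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class IncrementalRun:
    """Hypothetical loader state; real pipelines persist these in a checkpoint store and a DLQ."""

    watermark: datetime = datetime.min               # last committed high-water mark
    target: dict = field(default_factory=dict)       # destination table keyed by primary key
    dead_letter: list = field(default_factory=list)  # (raw_row, error) pairs kept for replay

    def process(self, source_rows: list) -> int:
        """Load rows newer than the watermark; return how many were upserted."""
        new_rows = [r for r in source_rows if r["updated_at"] > self.watermark]
        loaded = 0
        high_water = self.watermark
        for row in new_rows:
            try:
                clean = {"id": int(row["id"]), "amount": float(row["amount"])}
            except (KeyError, TypeError, ValueError) as exc:
                # Park the bad record instead of failing the whole batch.
                self.dead_letter.append((row, repr(exc)))
            else:
                # Upsert by primary key: replaying a batch overwrites, never duplicates.
                self.target[clean["id"]] = clean
                loaded += 1
            # Dead-lettered rows still advance the mark; they remain in dead_letter for replay.
            high_water = max(high_water, row["updated_at"])
        # Commit the watermark only after the whole batch has been handled.
        self.watermark = high_water
        return loaded


if __name__ == "__main__":
    batch = [
        {"id": "1", "amount": "10.0", "updated_at": datetime(2024, 1, 1)},
        {"id": "2", "amount": "oops", "updated_at": datetime(2024, 1, 2)},
    ]
    run = IncrementalRun()
    print(run.process(batch))    # 1 (row 2 goes to dead_letter)
    print(run.process(batch))    # 0 (nothing newer than the watermark)
    print(len(run.dead_letter))  # 1
```

The strict `>` comparison assumes no row arrives later with an `updated_at` equal to the committed watermark; sources that can backdate or reuse timestamps usually need an overlap window, which the upsert makes safe to re-read.
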

accessibility-auditor (Skill)

Audit websites for accessibility issues and WCAG compliance. Use when checking accessibility, fixing a11y issues, or ensuring WCAG compliance.

agent-army (Skill)

Deploy a 2-layer parallel agent hierarchy for large, parallelizable work — big refactors, multi-file migrations, codebase-wide audits, bulk generation. Layer 1 is 3-50+ specialist agents, each with its own full context window; Layer 2 is 2+ sub-agents per member. Includes git safety, tiered sizing, a pre-deploy gate, phantom-completion checks, and multi-wave follow-up.

agent-swarm-deployer (Skill)

Deploys swarms of sub-agents for massive parallel data processing tasks. Unlike agent-army (which is for code changes), this is for DATA tasks -- processing 1000 documents, analyzing datasets, bulk content generation. Configurable swarm size, task distribution, result aggregation, progress tracking, and error recovery.

agent-team-builder (Skill)

Designs and deploys custom agent teams for specific business workflows. Starts with interactive discovery of business processes, then generates complete team configurations with specialized agent roles, tool access, communication protocols, and handoff rules.

agent-to-agent (Skill)

Agent-to-Agent (A2A) communication protocol. Connect two or more Claude agents that pass messages, share context, delegate tasks, and collaborate. Implements structured handoffs, shared memory, and multi-agent conversations.

ai-readiness-assessment (Skill)

Assesses how ready a business is for AI adoption across six dimensions. Evaluates data maturity, tech stack, team skills, process documentation, budget, and culture. Generates a comprehensive ai-readiness-report.md with scores, gap analysis, and recommended starting points. Aligned with OneWave AI's audit methodology.

animate (Skill)

Generate animated videos and motion graphics from natural language descriptions. Creates a standalone Vite + React project with Framer Motion scenes that auto-play in the browser. Use when the user wants to create animations, motion graphics, video intros, animated presentations, or product demos.

api-documentation-writer (Skill)

Generate comprehensive API documentation including endpoint descriptions, request/response examples, authentication guides, error codes, and SDKs. Creates OpenAPI/Swagger specs, REST API docs, and developer-friendly reference materials. Use when users need to document APIs, create technical references, or write developer documentation.