Skill586 estrellas del repoactualizado 1mo ago

nw-data-architecture-patterns

This Claude Code skill provides decision frameworks and implementation patterns for selecting and designing data architectures including data warehouses, lakes, lakehouses, and mesh systems. It covers schema design approaches like star and snowflake schemas, compares implementation methodologies such as Kimball versus Inmon, details zone-based data lake organization, describes the medallion architecture for lakehouses, and outlines ETL/ELT pipeline strategies and governance considerations. Use this skill when architecting data systems, evaluating which platform suits specific organizational needs, designing schemas for analytics workloads, or establishing data governance policies.

Ver fuente Repositorio: nWave

Instalar en Claude Code

Copiar

git clone --depth 1 https://github.com/nWave-ai/nWave /tmp/nw-data-architecture-patterns && cp -r /tmp/nw-data-architecture-patterns/nWave/skills/nw-data-architecture-patterns ~/.claude/skills/nw-data-architecture-patterns

Después abre una sesión nueva de Claude Code; el skill carga automáticamente.

Definición

SKILL.md

# Data Architecture Patterns

## Architecture Selection Decision Tree

Structured only -> **Data Warehouse** | Mixed + SQL analytics -> **Data Lakehouse** | Mixed + ML-primary -> **Data Lake** | Large org + autonomous domains -> **Data Mesh**

## Data Warehouse

Schema: structured, schema-on-write | Data: tables, rows, columns | Governance: centralized | Query: SQL analytics, BI | Architecture: centralized single source of truth

### Schema Patterns

**Star Schema**: Central fact table (measures) surrounded by denormalized dimension tables. Best for BI dashboards, standard reporting.

**Snowflake Schema**: Normalized dimensions (dimensions reference other dimensions). Reduces storage, increases JOIN complexity. Best when storage cost matters more than query speed.

### Kimball vs Inmon

**Kimball (Bottom-Up)**: Build data marts first, integrate later | Star schema, business-process driven | Faster initial delivery | Best for quick wins, department-level analytics

**Inmon (Top-Down)**: Build enterprise DW first, derive data marts | Normalized 3NF enterprise model | Higher upfront effort | Best for large enterprises needing single source of truth

Technology: Snowflake | Amazon Redshift | Google BigQuery | Azure Synapse Analytics

## Data Lake

Schema-on-read, flexible | All formats (structured, semi-structured, unstructured) | Raw data in native format | Query via Athena, Spark SQL, PySpark, Pandas | Risk: "data swamp" without governance

### Organization
Zones: **raw** (landing, original format) -> **curated** (cleaned, validated) -> **processed** (transformed for use cases) -> **archive** (cold storage)

### Anti-Patterns
- No metadata catalog -> undiscoverable data
- No access controls -> security/compliance risk
- No data quality checks -> garbage in/out
- No retention policy -> unbounded cost growth

Technology: S3 + Athena/Glue | Azure Data Lake Storage + Synapse | HDFS + Hive

## Data Lakehouse

Combines warehouse reliability with lake flexibility | Schema enforcement on write with evolution support | ACID transactions on lake storage | Supports both BI/SQL and ML/data science workloads

### Medallion Architecture (Bronze / Silver / Gold)

**Bronze**: Raw data as-is, append-only for auditability, partitioned by ingestion date, schema-on-read
**Silver**: Quality rules (null checks, range validation, referential integrity) | Deduplication on business keys | Schema enforced | SCD applied
**Gold**: Business-level aggregations | Dimensional models (star/snowflake) | Pre-computed metrics/KPIs | Optimized for BI/reporting

Technology: Databricks (Delta Lake) | Apache Iceberg | Apache Hudi

## Data Mesh

### Core Principles (Martin Fowler)
1. **Domain-oriented ownership**: Data owned by domain teams, not central
2. **Data as a product**: Each domain publishes discoverable, trustworthy, self-describing data products
3. **Self-serve data platform**: Infrastructure team provides platform for domain teams
4. **Federated computational governance**: Global standards with domain autonomy

**Use when**: Large org with autonomous domain teams | Central data team is bottleneck | Domain expertise needed | Platform engineering maturity exists
**Avoid when**: Small team (<50 engineers) | Simple data needs | No platform capability | Unclear domain boundaries

## ETL vs ELT Pipeline Design

### ETL (Extract-Transform-Load)
Transform before loading via dedicated engine (Informatica, Talend, SSIS). Best for complex transforms, constrained targets, regulatory requirements. Scaling limited by transform engine.

### ELT (Extract-Load-Transform)
Load raw first, transform using target compute (dbt, Snowflake SQL, BigQuery SQL). Best for cloud DWs with elastic compute, preserving raw data. Scales with target system.

### Pipeline Design Principles
- **Idempotency**: Re-running produces same result (use MERGE/upsert, not INSERT)
- **Incremental processing**: Process only new/changed data (watermarks, CDC)
- **Schema evolution**: Handle added/removed columns gracefully (schema registry)
- **Data quality gates**: Validate between stages (null rates, row counts, value ranges)
- **Observability**: Log metrics (rows processed, duration, errors, freshness)

### Orchestration
Apache Airflow: DAG-based, Python-native, wide adoption | Prefect: modern, dynamic workflows | Dagster: software-defined assets

## Streaming Architecture

### Apache Kafka
Distributed event streaming platform. Concepts: topics, partitions, consumer groups, offsets. At-least-once delivery (exactly-once with transactions). Use as event bus, message broker, stream storage.

### Apache Flink
Stateful stream processing engine. Concepts: DataStreams, windows (tumbling, sliding, session), state management. Exactly-once with checkpointing. Common pattern: Sources -> Kafka (durable event buffer) -> Flink (stateful compute) -> Sinks.

### Architecture Selection
**Streaming**: real-time dashboards, fraud detection, IoT, event-driven | **Batch**: overnight reporting, historical analysis, ML training | **Lambda**: parallel batch + stream (complex, prefer Kappa) | **Kappa**: stream-only, reprocess from Kafka log (simpler)

## Scaling Strategies

### Vertical (Scale Up)
Add CPU/RAM/storage to existing server | Simpler ops, no app changes | Hard limit: largest hardware | Use first for moderate growth

### Horizontal (Scale Out)

**Read Replicas**: Replicate to read-only copies | Route reads to replicas, writes to primary | Trade-off: replication lag (eventual consistency) | Use for read-heavy workloads

**Partitioning (Single Server)**: Range (date, alphabetical) | List (region, category) | Hash (even distribution) | Benefits: query pruning, maintenance (drop old partitions)

**Sharding (Multiple Servers)**: Distribute data across DB instances by shard key | Strategies: range-based, hash-based, directory-based, geographic

**Shard Key Selection** (most impactful decision):
- High cardinality for even distribution
- Even access frequency to avoid hot shards
- Qu

Del mismo repositorio

nw-ab-critique-dimensionsSkill

Review dimensions for validating agent quality - template compliance, safety, testing, and priority validation

nw-abr-critique-dimensionsSkill

Review dimensions for validating agent quality - template compliance, safety, testing, and priority validation

nw-ad-critique-dimensionsSkill

Review dimensions for acceptance test quality - happy path bias, GWT compliance, business language purity, coverage completeness, walking skeleton user-centricity, priority validation, observable behavior assertions, traceability coverage, and walking skeleton boundary proof

nw-agent-creation-workflowSkill

Detailed 5-phase workflow for creating agents - from requirements analysis through validation and iterative refinement

nw-agent-testingSkill

5-layer testing approach for agent validation including adversarial testing, security validation, and prompt injection resistance

nw-architectural-styles-tradeoffsSkill

Architectural style selection decision matrices, trade-off analysis, structural enforcement rules, and combination patterns. Load when choosing or evaluating architecture styles.

nw-architecture-patternsSkill

Comprehensive architecture patterns, methodologies, quality frameworks, and evaluation methods for solution architects. Load when designing system architecture or selecting patterns.

nw-at-completeness-checkSkill

Canonical AT completeness gate — research-anchored 7-category taxonomy (C1-C7) + 15-item mechanical checklist. Paradigm-neutral. Drives acceptance-designer reviewer verdict deterministically.