ClaudeWave
Skill · 1.3k repo stars · updated yesterday

ddia-systems

The ddia-systems skill applies the "Designing Data-Intensive Applications" framework to help architects and engineers make deliberate trade-off choices when designing data systems. Use it when evaluating database selection, replication strategies, partitioning approaches, consistency models, transaction guarantees, stream versus batch processing, and distributed system trade-offs between consistency and availability. The skill scores data architectures against principled reasoning about data models, storage engines, query languages, replication, partitioning, transactions, and data pipelines to identify gaps between current designs and production-ready robustness.

Install in Claude Code
git clone --depth 1 https://github.com/wondelai/skills /tmp/ddia-systems && mkdir -p ~/.claude/skills && cp -r /tmp/ddia-systems/ddia-systems ~/.claude/skills/ddia-systems
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# Designing Data-Intensive Applications Framework

A principled approach to building reliable, scalable, and maintainable data systems. Apply these principles when choosing databases, designing schemas, architecting distributed systems, or reasoning about consistency and fault tolerance.

## Core Principle

**Data outlives code.** Applications are rewritten and frameworks come and go, but data persists for decades -- prioritize the long-term correctness, durability, and evolvability of the data layer. Most applications are data-intensive, not compute-intensive: the hard problems are data volume, complexity, and rate of change, and explicit consistency/availability/latency trade-offs separate robust systems from fragile ones.

## Scoring

**Goal: 10/10.** Rate any data architecture 0-10 against the principles below: deliberate trade-off choices for data models, storage, replication, partitioning, transactions, and pipelines score high; accidental complexity and ignored failure modes score low. Report the current score and the improvements needed to reach 10/10.

## The DDIA Framework

Seven domains for reasoning about data-intensive systems:

### 1. Data Models and Query Languages

**Core concept:** The data model shapes how you think about the problem. Relational, document, and graph models each impose different constraints and enable different query patterns.

**Why it works:** Choosing the wrong data model forces application code to compensate for representational mismatch, adding accidental complexity that compounds over time.

**Key insights:**
- Relational models excel at many-to-many relationships and ad-hoc queries; document models at one-to-many relationships and locality; graph models at recursive traversals over interconnected data
- Schema-on-write (relational) catches errors early; schema-on-read (document) offers flexibility
- Polyglot persistence -- different stores for different access patterns -- is often the right answer
- Object-relational impedance mismatch is a real cost; document models reduce it for self-contained aggregates
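
The schema-on-write versus schema-on-read distinction above can be sketched in a few lines. This is a minimal illustration, not a recommendation for any particular store: the `users` table, the `email` field, and the `read_email` helper are hypothetical names chosen for the example, and SQLite stands in for a relational database.

```python
import json
import sqlite3

# Schema-on-write: the database rejects rows that violate the declared schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
try:
    conn.execute("INSERT INTO users (id, email) VALUES (?, ?)", (1, None))
    write_rejected = False
except sqlite3.IntegrityError:
    write_rejected = True  # the error surfaces at write time

# Schema-on-read: any JSON document is stored; the reader interprets its structure.
stored_docs = [
    json.dumps({"id": 1}),  # written before the "email" field existed
    json.dumps({"id": 2, "email": "b@example.com"}),
]


def read_email(raw: str) -> str:
    """Hypothetical reader that tolerates documents lacking an email."""
    doc = json.loads(raw)
    return doc.get("email", "<unknown>")


emails = [read_email(d) for d in stored_docs]
```

The relational path fails fast on bad data, while the document path accepts everything and pushes the handling of missing fields into every reader.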

**Code applications:**

| Context | Pattern | Example |
|---------|---------|---------|
| **User profiles with nested data** | Document model for self-contained aggregates | Profile, addresses, and preferences in one MongoDB document |
| **Social network connections** | Graph model for relationship traversal | Neo4j Cypher: `MATCH (a)-[:FOLLOWS*2]->(b)` for friend-of-friend |
| **Financial ledger with joins** | Relational model for referential integrity | PostgreSQL foreign keys between accounts, transactions, entries |

See: [references/data-models.md](references/data-models.md) for relational/document/graph trade-offs and query language comparisons.
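
To make the friend-of-friend pattern concrete, here is a rough Python equivalent of a fixed two-hop traversal over an adjacency map. The `follows` data and the `two_hops` function are illustrative assumptions; a graph database would evaluate the same pattern declaratively and with indexes.

```python
# Hypothetical FOLLOWS edges: each key follows every name in its set.
follows = {
    "alice": {"bob", "carol"},
    "bob": {"dave"},
    "carol": {"dave", "alice"},
    "dave": set(),
}


def two_hops(graph: dict, start: str) -> set:
    """Return nodes reachable from start by exactly two FOLLOWS edges."""
    return {
        target
        for middle in graph.get(start, set())
        for target in graph.get(middle, set())
    }


result = two_hops(follows, "alice")
```

Expressing the same query in SQL typically needs a self-join per hop, which is why variable-length traversals favor a graph model.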

### 2. Storage Engines

**Core concept:** Storage engines trade off read performance against write performance. Log-structured engines (LSM trees) optimize writes; page-oriented engines (B-trees) balance reads and writes.

**Why it works:** Understanding your database's storage engine lets you predict performance characteristics, choose appropriate indexes, and avoid pathological workloads.

**Key insights:**
- LSM trees: append-only writes, periodic compaction, excellent write throughput, higher read amplification
- B-trees: in-place updates, predictable read latency, write amplification from rewriting whole pages (plus the write-ahead log and page splits)
- Write amplification (one logical write causing multiple physical writes) matters for SSDs with limited write cycles
- Column-oriented storage dramatically improves analytical queries through compression and vectorized processing
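- In-memory databases are fast mainly because they avoid the overhead of encoding data structures for disk, not merely because they avoid disk reads (a disk-based engine with enough RAM also serves reads from cache)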
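
The LSM-tree behavior described above (append-style writes, reads that may probe several segments, periodic compaction) can be shown with a toy model. `TinyLSM` is a hypothetical teaching sketch: it keeps everything in memory and omits the write-ahead log, Bloom filters, and tombstones that real engines rely on.

```python
import bisect


class TinyLSM:
    """Toy LSM tree: an in-memory memtable plus immutable sorted segments."""

    def __init__(self, memtable_limit: int = 2):
        self.memtable = {}
        self.segments = []  # list of sorted (key, value) lists, newest last
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: str) -> None:
        self.memtable[key] = value  # writes never modify existing segments
        if len(self.memtable) >= self.memtable_limit:
            self.segments.append(sorted(self.memtable.items()))  # flush
            self.memtable = {}

    def get(self, key: str):
        if key in self.memtable:
            return self.memtable[key]
        # Read amplification: a lookup may probe every segment, newest first.
        for segment in reversed(self.segments):
            i = bisect.bisect_left(segment, (key,))
            if i < len(segment) and segment[i][0] == key:
                return segment[i][1]
        return None

    def compact(self) -> None:
        merged = {}
        for segment in self.segments:  # oldest first, so newer values win
            merged.update(segment)
        self.segments = [sorted(merged.items())] if merged else []


db = TinyLSM()
db.put("a", "1")
db.put("b", "2")  # memtable full: flushed as segment 1
db.put("a", "3")
db.put("c", "4")  # memtable full: flushed as segment 2
segments_before = len(db.segments)
db.compact()
```

Before compaction the stale value of `a` still occupies space in the older segment; compaction rewrites the data once more, which is the write-amplification cost paid for cheap writes.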

**Code applications:**

| Context | Pattern | Example |
|---------|---------|---------|
| **High write throughput** | LSM-tree engine | Cassandra or RocksDB for time-series ingestion at 100K+ writes/sec |
| **Mixed read/write OLTP** | B-tree engine | PostgreSQL B-tree indexes for transactional point lookups |
| **Analytical queries** | Column-oriented storage | ClickHouse or Parquet for scanning billions of rows, few columns |

See: [references/storage-engines.md](references/storage-engines.md) for LSM vs B-tree internals, compaction, and column storage.
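
As a rough illustration of why column-oriented layouts help analytical queries, the sketch below pivots a few rows into columns, aggregates a single column, and run-length encodes a repetitive one. The sample rows and the `run_length_encode` helper are invented for this example; real column stores use more elaborate encodings.

```python
# Row-oriented layout: every record carries all of its fields.
rows = [
    {"date": "2024-01-01", "country": "DE", "amount": 10},
    {"date": "2024-01-01", "country": "DE", "amount": 20},
    {"date": "2024-01-02", "country": "DE", "amount": 30},
    {"date": "2024-01-02", "country": "US", "amount": 40},
]

# Column-oriented layout: one contiguous list per field.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate over one column touches only that column's values.
total = sum(columns["amount"])


def run_length_encode(values: list) -> list:
    """Collapse consecutive repeats into [value, count] pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


country_rle = run_length_encode(columns["country"])
```

Columns with few distinct values compress well, and the scan for `total` never reads `date` or `country` at all.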

### 3. Replication

**Core concept:** Replication keeps copies of data on multiple machines for fault tolerance, scalability, and latency reduction. The core challenge is handling changes consistently.

**Why it works:** Every replication strategy trades off consistency, availability, and latency. Making the trade-off explicit prevents subtle anomalies that surface only under load or failure.

**Key insights:**
- Single-leader: simple, strong consistency possible, but the leader is a write bottleneck and its failure requires failover to a follower
- Multi-leader: better write availability across data centers, but complex conflict resolution
- Leaderless: highest availability via quorum reads/writes, but needs careful conflict handling
- Replication lag causes read-your-writes, monotonic-read, and causality violations
- Synchronous replication guarantees durability but adds latency; asynchronous risks data loss on failover
- CRDTs and last-writer-wins resolve conflicts with very different correctness guarantees
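
The quorum idea behind leaderless replication can be sketched as follows: with `n` replicas, a write acknowledged by `w` nodes and a read that consults `r` nodes are guaranteed to share at least one replica when `w + r > n`. The functions below are a simplified simulation with hypothetical names; they ignore sloppy quorums, read repair, and concurrent writes.

```python
def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True when every read set must intersect every write set."""
    return w + r > n


def quorum_write(replicas: list, w: int, version: int, value: str) -> None:
    """Simulate a write that reaches only the first w replicas."""
    for i in range(w):
        replicas[i] = (version, value)


def quorum_read(replicas: list, r: int) -> str:
    """Read the last r replicas (the worst case) and keep the newest version."""
    return max(replicas[-r:])[1]


replicas = [(0, "old")] * 3  # n = 3, each entry is (version, value)
quorum_write(replicas, w=2, version=1, value="new")

fresh = quorum_read(replicas, r=2)  # w + r = 4 > 3: overlaps the write
stale = quorum_read(replicas, r=1)  # w + r = 3: may miss the write
```

Lowering `r` or `w` improves latency and availability at the price of possibly stale reads, which is the trade-off the bullets above describe.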

**Code applications:**

| Context | Pattern | Example |
|---------|---------|---------|
| **Read-heavy web app** | Single-leader with read replicas | PostgreSQL primary + streaming read replicas, with PgBouncer for connection pooling |
| **Multi-region writes** | Multi-leader replication | CouchDB or PostgreSQL BDR with explicit conflict resolution (CockroachDB and Spanner use consensus-based replication rather than multi-leader) |
| **Shopping cart availability** | Leaderless with merge | Dynamo-style store: Cassandra with last-writer-wins, or Riak siblings with application-level cart merge |

See: [references/replication.md](references/replication.md) for lag anomalies, conflict resolution, and CRDTs.
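
The difference between last-writer-wins and an application-level merge is easiest to see on the shopping cart case. In this sketch two replicas accept concurrent additions; the cart layout with `ts` and `items` fields, and both function names, are assumptions made for illustration.

```python
def last_writer_wins(a: dict, b: dict) -> dict:
    """Keep the version with the larger timestamp and discard the other."""
    return a if a["ts"] >= b["ts"] else b


def merge_carts(a: dict, b: dict) -> dict:
    """Application-level merge: union the items from both versions."""
    return {"ts": max(a["ts"], b["ts"]), "items": a["items"] | b["items"]}


# Two replicas accepted concurrent writes to the same cart.
replica_a = {"ts": 10, "items": {"book"}}
replica_b = {"ts": 11, "items": {"pen"}}

lww_cart = last_writer_wins(replica_a, replica_b)
merged_cart = merge_carts(replica_a, replica_b)
```

Last-writer-wins silently drops the book, whereas the union keeps both additions. A plain union cannot express removals, so a real cart would also need tombstones or a CRDT set.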

### 4. Partitioning

**Core concept:** Partitioning (sharding) distributes data across nodes so each handles a subset, enabling horizontal scaling beyond a single machine.
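
One common way to assign keys to partitions is to hash the key and map the hash onto a fixed number of partitions. The sketch below assumes SHA-256 as the hash and uses hypothetical names (`partition_for`, `NUM_PARTITIONS`, the `user:` key format); it is not tied to any specific database.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 8  # assumed to be fixed up front, independent of node count


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition using a stable, process-independent hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# Every node computes the same partition for the same key.
keys = [f"user:{i}" for i in range(1000)]
load = Counter(partition_for(k) for k in keys)
```

Python's built-in `hash()` is avoided because it is randomized per process for strings. Keeping the partition count fixed and moving whole partitions between nodes avoids the mass reshuffling that `hash mod number_of_nodes` causes when nodes are added or removed.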

**Why it works:** Without partit
37signals-way · Skill

Build lean, opinionated products using the 37signals philosophy from Getting Real, Rework, and Shape Up. Use when the user mentions "Getting Real", "Rework", "Shape Up", "37signals", "Basecamp method", "six-week cycles", "fixed time variable scope", "appetite vs estimates", "betting table", "breadboarding", "fat marker sketch", "build less", "underdo the competition", or "opinionated software". Also trigger when cutting scope to ship faster, running small teams, avoiding long-term roadmaps, or eliminating meetings. Covers shaping, betting, building, and the art of saying no. For MVP validation, see lean-startup. For design sprints, see design-sprint.

blue-ocean-strategy · Skill

Create uncontested market space using value innovation instead of competing head-to-head. Use when the user mentions "blue ocean", "red ocean", "strategy canvas", "ERRC framework", "value innovation", "non-customers", "buyer utility map", "eliminate-reduce-raise-create", or "uncontested market". Also trigger when comparing pricing strategies, exploring new market categories, finding underserved customer segments, or asking how to stop competing on price. Covers the Four Actions Framework, buyer utility map, and value-cost trade-offs. For tech adoption strategy, see crossing-the-chasm. For product positioning, see obviously-awesome.

clean-architecture · Skill

Structure software around the Dependency Rule: source code dependencies point inward from frameworks to use cases to entities. Use when the user mentions "architecture layers", "dependency rule", "ports and adapters", "hexagonal architecture", "use case boundary", "onion architecture", "screaming architecture", or "framework independence". Also trigger when decoupling business logic from databases or frameworks, defining module boundaries, or debating where to put business rules. Covers component principles, boundaries, and SOLID. For code quality, see clean-code. For domain modeling, see domain-driven-design.

clean-code · Skill

Write readable, maintainable code through disciplined naming, small functions, and clean error handling. Use when the user mentions "code review", "naming conventions", "function too long", "code smells", "readable code", "boy scout rule", "single responsibility", or "unit test quality". Also trigger when reviewing pull requests for readability, refactoring messy functions, debating comment styles, or improving error handling patterns. Covers SRP, comment discipline, formatting, and unit testing. For refactoring techniques, see refactoring-patterns. For architecture, see clean-architecture.

contagious · Skill

Engineer word-of-mouth and virality using the STEPPS framework (Social Currency, Triggers, Emotion, Public, Practical Value, Stories). Use when the user mentions "go viral", "word of mouth", "shareable content", "social currency", "why people share", "viral loop", "referral program", or "organic growth". Also trigger when designing shareable features, crafting social media campaigns, or building products that spread through peer recommendation. Covers environmental triggers and high-arousal emotional content. For sticky messaging, see made-to-stick. For persuasion tactics, see influence-psychology.

continuous-discovery · Skill

Build a weekly cadence of customer touchpoints using Opportunity Solution Trees, assumption mapping, and interview snapshots. Use when the user mentions "continuous discovery", "opportunity solution tree", "weekly interviews", "assumption testing", "discovery habits", "product trio", or "outcome-based roadmap". Also trigger when setting up regular customer feedback loops, prioritizing which experiments to run, or connecting discovery insights to delivery work. Covers experience mapping, co-creation, and prioritizing opportunities. For interview technique, see mom-test. For team structure, see inspired-product.

cro-methodology · Skill

Audit websites and landing pages for conversion issues and design evidence-based A/B tests. Use when the user mentions "landing page isnt converting", "conversion rate", "A/B test", "why visitors leave", "objection handling", "bounce rate", "split testing", or "conversion funnel". Also trigger when diagnosing why signups are low, designing experiment hypotheses, or auditing checkout flows for friction points. Covers funnel mapping, persuasion assets, and objection/counter-objection frameworks. For overall marketing strategy, see one-page-marketing. For usability issues, see ux-heuristics.

crossing-the-chasm · Skill

Navigate the technology adoption lifecycle from early adopters to mainstream market. Use when the user mentions "crossing the chasm", "beachhead segment", "whole product", "early adopters vs. mainstream", "tech go-to-market", "bowling pin strategy", "technology adoption lifecycle", or "pragmatist buyers". Also trigger when a startup has early traction but struggles to grow beyond initial users, or when planning go-to-market for technical products. Covers D-Day analogy, bowling-pin strategy, and positioning against incumbents. For product positioning, see obviously-awesome. For new market creation, see blue-ocean-strategy.