Skill1.4k repo starsupdated 6d ago

database-design

This skill designs database schemas, normalization strategies, indexing approaches, and migration plans for SQL and NoSQL systems. Use it when creating or modifying table structures, optimizing slow queries, addressing N+1 query problems, planning database migrations, or evaluating data modeling decisions and performance concerns.

View source Repository: claude-workflow-v2

Install in Claude Code

Copy

git clone --depth 1 https://github.com/CloudAI-X/claude-workflow-v2 /tmp/database-design && cp -r /tmp/database-design/skills/database-design ~/.claude/skills/database-design

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Database Design

### When to Load

- **Trigger**: Schema design, migrations, query optimization, indexing strategies, data modeling, N+1 fixes
- **Skip**: No database work involved in the current task

## Database Design Workflow

Copy this checklist and track progress:

```
Database Design Progress:
- [ ] Step 1: Identify entities and relationships
- [ ] Step 2: Normalize schema (3NF minimum)
- [ ] Step 3: Evaluate denormalization needs
- [ ] Step 4: Design indexes for query patterns
- [ ] Step 5: Write and optimize critical queries
- [ ] Step 6: Plan migration strategy
- [ ] Step 7: Configure connection pooling
- [ ] Step 8: Validate against anti-patterns checklist
```

## Schema Design Principles

### Normalization Forms

```
1NF: Atomic values, no repeating groups
2NF: 1NF + no partial dependencies (all non-key columns depend on full PK)
3NF: 2NF + no transitive dependencies (non-key columns don't depend on other non-key columns)
```

```sql
-- WRONG: Unnormalized
CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  customer_name TEXT,
  customer_email TEXT,        -- duplicated across orders
  product1_name TEXT,         -- repeating groups
  product1_qty INT,
  product2_name TEXT,
  product2_qty INT
);

-- CORRECT: Normalized to 3NF
CREATE TABLE customers (
  id SERIAL PRIMARY KEY,
  name TEXT NOT NULL,
  email TEXT UNIQUE NOT NULL
);

CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  customer_id INT REFERENCES customers(id),
  created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE order_items (
  id SERIAL PRIMARY KEY,
  order_id INT REFERENCES orders(id),
  product_id INT REFERENCES products(id),
  quantity INT NOT NULL CHECK (quantity > 0)
);
```

### When to Denormalize

Denormalize only when you have measured proof of performance issues:

```sql
-- Acceptable denormalization: precomputed counter to avoid COUNT(*)
ALTER TABLE posts ADD COLUMN comment_count INT DEFAULT 0;

-- Update via trigger or application code
CREATE FUNCTION update_comment_count() RETURNS TRIGGER AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    UPDATE posts SET comment_count = comment_count + 1 WHERE id = NEW.post_id;
  ELSIF TG_OP = 'DELETE' THEN
    UPDATE posts SET comment_count = comment_count - 1 WHERE id = OLD.post_id;
  END IF;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;
```

## Indexing Strategy

### Index Types and When to Use

```
B-tree (default):  Equality, range, sorting, LIKE 'prefix%'
Hash:              Equality only (rarely better than B-tree)
GIN:               Full-text search, JSONB, arrays
GiST:              Geometry, range types, full-text
BRIN:              Large tables with naturally ordered data (timestamps)
```

### Composite Indexes

```sql
-- Column order matters: leftmost prefix rule
CREATE INDEX idx_users_status_created ON users (status, created_at);

-- This index supports:
--   WHERE status = 'active'                          -- YES
--   WHERE status = 'active' AND created_at > '2024'  -- YES
--   WHERE created_at > '2024'                        -- NO (skips first column)
```

### Partial and Covering Indexes

```sql
-- Partial index: only index rows matching condition
CREATE INDEX idx_orders_pending ON orders (created_at)
  WHERE status = 'pending';  -- smaller index, faster lookups

-- Covering index: include columns to avoid table lookup
CREATE INDEX idx_users_email_covering ON users (email)
  INCLUDE (name, avatar_url);  -- index-only scan for profile lookups
```

### Index Anti-patterns

```sql
-- WRONG: Index on low-cardinality column alone
CREATE INDEX idx_users_active ON users (is_active);  -- boolean = 2 values

-- WRONG: Too many indexes (slows writes)
-- Every INSERT/UPDATE must update ALL indexes

-- CORRECT: Composite index targeting actual queries
CREATE INDEX idx_users_active_created ON users (is_active, created_at DESC)
  WHERE is_active = true;
```

## Query Optimization

### Reading EXPLAIN Plans

```sql
EXPLAIN ANALYZE SELECT u.name, COUNT(o.id)
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.status = 'active'
GROUP BY u.name;

-- Key things to look for:
-- Seq Scan         -> missing index (on large tables)
-- Nested Loop      -> fine for small sets, bad for large joins
-- Hash Join         -> good for large equi-joins
-- Sort             -> consider index to avoid sort
-- actual time      -> real execution time
-- rows             -> if estimated vs actual differ wildly, run ANALYZE
```

### N+1 Query Detection and Prevention

```python
# WRONG: N+1 queries (1 query for users + N queries for orders)
users = db.query(User).all()
for user in users:
    orders = db.query(Order).filter(Order.user_id == user.id).all()  # N queries!

# CORRECT: Eager loading with SQLAlchemy
users = db.query(User).options(joinedload(User.orders)).all()

# CORRECT: Batch query
user_ids = [u.id for u in users]
orders = db.query(Order).filter(Order.user_id.in_(user_ids)).all()
orders_by_user = defaultdict(list)
for order in orders:
    orders_by_user[order.user_id].append(order)
```

```javascript
// WRONG: N+1 with Prisma
const users = await prisma.user.findMany();
for (const user of users) {
  const orders = await prisma.order.findMany({ where: { userId: user.id } }); // N+1!
}

// CORRECT: Include relation
const users = await prisma.user.findMany({
  include: { orders: true },
});

// CORRECT: Batch with findMany + in
const userIds = users.map((u) => u.id);
const orders = await prisma.order.findMany({
  where: { userId: { in: userIds } },
});
```

### Pagination

```sql
-- WRONG: OFFSET pagination (rescans all skipped rows)
SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 10000;

-- CORRECT: Cursor-based pagination (keyset)
SELECT * FROM posts
WHERE created_at < '2024-01-15T10:30:00Z'
ORDER BY created_at DESC
LIMIT 20;
```

## Migration Patterns

### Safe Migration Rules

```
1. Never rename a column in one step (add new, migrate data, drop old)
2. Never drop a column that's still read by running code
3. Add columns as nullable or with defaults
4. Create indexes CONCURRENT