database-design
This skill designs database schemas, normalization strategies, indexing approaches, and migration plans for SQL and NoSQL systems. Use it when creating or modifying table structures, optimizing slow queries, addressing N+1 query problems, planning database migrations, or evaluating data modeling decisions and performance concerns.
git clone --depth 1 https://github.com/CloudAI-X/claude-workflow-v2 /tmp/database-design && cp -r /tmp/database-design/skills/database-design ~/.claude/skills/database-designSKILL.md
# Database Design
### When to Load
- **Trigger**: Schema design, migrations, query optimization, indexing strategies, data modeling, N+1 fixes
- **Skip**: No database work involved in the current task
## Database Design Workflow
Copy this checklist and track progress:
```
Database Design Progress:
- [ ] Step 1: Identify entities and relationships
- [ ] Step 2: Normalize schema (3NF minimum)
- [ ] Step 3: Evaluate denormalization needs
- [ ] Step 4: Design indexes for query patterns
- [ ] Step 5: Write and optimize critical queries
- [ ] Step 6: Plan migration strategy
- [ ] Step 7: Configure connection pooling
- [ ] Step 8: Validate against anti-patterns checklist
```
## Schema Design Principles
### Normalization Forms
```
1NF: Atomic values, no repeating groups
2NF: 1NF + no partial dependencies (all non-key columns depend on full PK)
3NF: 2NF + no transitive dependencies (non-key columns don't depend on other non-key columns)
```
```sql
-- WRONG: Unnormalized
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_name TEXT,
customer_email TEXT, -- duplicated across orders
product1_name TEXT, -- repeating groups
product1_qty INT,
product2_name TEXT,
product2_qty INT
);
-- CORRECT: Normalized to 3NF
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
email TEXT UNIQUE NOT NULL
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INT REFERENCES customers(id),
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE TABLE order_items (
id SERIAL PRIMARY KEY,
order_id INT REFERENCES orders(id),
product_id INT REFERENCES products(id),
quantity INT NOT NULL CHECK (quantity > 0)
);
```
### When to Denormalize
Denormalize only when you have measured proof of performance issues:
```sql
-- Acceptable denormalization: precomputed counter to avoid COUNT(*)
ALTER TABLE posts ADD COLUMN comment_count INT DEFAULT 0;
-- Update via trigger or application code
CREATE FUNCTION update_comment_count() RETURNS TRIGGER AS $$
BEGIN
IF TG_OP = 'INSERT' THEN
UPDATE posts SET comment_count = comment_count + 1 WHERE id = NEW.post_id;
ELSIF TG_OP = 'DELETE' THEN
UPDATE posts SET comment_count = comment_count - 1 WHERE id = OLD.post_id;
END IF;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
```
## Indexing Strategy
### Index Types and When to Use
```
B-tree (default): Equality, range, sorting, LIKE 'prefix%'
Hash: Equality only (rarely better than B-tree)
GIN: Full-text search, JSONB, arrays
GiST: Geometry, range types, full-text
BRIN: Large tables with naturally ordered data (timestamps)
```
### Composite Indexes
```sql
-- Column order matters: leftmost prefix rule
CREATE INDEX idx_users_status_created ON users (status, created_at);
-- This index supports:
-- WHERE status = 'active' -- YES
-- WHERE status = 'active' AND created_at > '2024' -- YES
-- WHERE created_at > '2024' -- NO (skips first column)
```
### Partial and Covering Indexes
```sql
-- Partial index: only index rows matching condition
CREATE INDEX idx_orders_pending ON orders (created_at)
WHERE status = 'pending'; -- smaller index, faster lookups
-- Covering index: include columns to avoid table lookup
CREATE INDEX idx_users_email_covering ON users (email)
INCLUDE (name, avatar_url); -- index-only scan for profile lookups
```
### Index Anti-patterns
```sql
-- WRONG: Index on low-cardinality column alone
CREATE INDEX idx_users_active ON users (is_active); -- boolean = 2 values
-- WRONG: Too many indexes (slows writes)
-- Every INSERT/UPDATE must update ALL indexes
-- CORRECT: Composite index targeting actual queries
CREATE INDEX idx_users_active_created ON users (is_active, created_at DESC)
WHERE is_active = true;
```
## Query Optimization
### Reading EXPLAIN Plans
```sql
EXPLAIN ANALYZE SELECT u.name, COUNT(o.id)
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.status = 'active'
GROUP BY u.name;
-- Key things to look for:
-- Seq Scan -> missing index (on large tables)
-- Nested Loop -> fine for small sets, bad for large joins
-- Hash Join -> good for large equi-joins
-- Sort -> consider index to avoid sort
-- actual time -> real execution time
-- rows -> if estimated vs actual differ wildly, run ANALYZE
```
### N+1 Query Detection and Prevention
```python
# WRONG: N+1 queries (1 query for users + N queries for orders)
users = db.query(User).all()
for user in users:
orders = db.query(Order).filter(Order.user_id == user.id).all() # N queries!
# CORRECT: Eager loading with SQLAlchemy
users = db.query(User).options(joinedload(User.orders)).all()
# CORRECT: Batch query
user_ids = [u.id for u in users]
orders = db.query(Order).filter(Order.user_id.in_(user_ids)).all()
orders_by_user = defaultdict(list)
for order in orders:
orders_by_user[order.user_id].append(order)
```
```javascript
// WRONG: N+1 with Prisma
const users = await prisma.user.findMany();
for (const user of users) {
const orders = await prisma.order.findMany({ where: { userId: user.id } }); // N+1!
}
// CORRECT: Include relation
const users = await prisma.user.findMany({
include: { orders: true },
});
// CORRECT: Batch with findMany + in
const userIds = users.map((u) => u.id);
const orders = await prisma.order.findMany({
where: { userId: { in: userIds } },
});
```
### Pagination
```sql
-- WRONG: OFFSET pagination (rescans all skipped rows)
SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 10000;
-- CORRECT: Cursor-based pagination (keyset)
SELECT * FROM posts
WHERE created_at < '2024-01-15T10:30:00Z'
ORDER BY created_at DESC
LIMIT 20;
```
## Migration Patterns
### Safe Migration Rules
```
1. Never rename a column in one step (add new, migrate data, drop old)
2. Never drop a column that's still read by running code
3. Add columns as nullable or with defaults
4. Create indexes CONCURRENTExpert code review specialist. Use PROACTIVELY after writing or modifying code, before commits, when asked to review changes, PR review, code quality check, lint, or standards audit. Focuses on quality, security, performance, and maintainability.
Expert debugging specialist for errors, test failures, crashes, segmentation faults, memory leaks, timeouts, race conditions, deadlocks, and unexpected behavior. Use PROACTIVELY when encountering any error, exception, or failing test. Performs systematic root cause analysis.
Technical documentation specialist. Use for creating README files, API documentation, architecture docs, inline comments, user guides, changelogs, migration guides, release notes, FAQs, and troubleshooting docs. MUST BE USED when documentation is needed or when code changes require doc updates.
Master coordinator for complex multi-step tasks. Use PROACTIVELY when a task involves 2+ modules, requires delegation to specialists, needs architectural planning, or involves GitHub PR workflows. MUST BE USED for open-ended requests like "improve", "enhance", "build", "scale", "refactor", "add feature", "system design", "architecture", "complex task", or when implementing features from GitHub issues.
Code refactoring specialist for improving code quality, reducing technical debt, eliminating code smells, reducing complexity, and applying design patterns. Use PROACTIVELY when code needs restructuring, simplification, tech debt reduction, or when applying DRY/SOLID principles.
Security specialist for vulnerability detection, secure coding review, and security hardening. Use PROACTIVELY when handling authentication, authorization, encryption, secrets, credentials, OAuth, JWT, CORS, headers, user input, API keys, or sensitive data. Checks for OWASP Top 10 and common vulnerabilities.
Testing strategy specialist for designing test suites, writing tests, and ensuring comprehensive coverage. Use PROACTIVELY when adding new features, fixing bugs, improving test coverage, creating test plans, mocking strategies, handling flaky tests, or writing integration/E2E tests.
Add tests for recently changed files or specified code