Skill66 repo starsupdated 29d ago

database-architect

Designs relational and NoSQL database schemas, indexing strategies, migration plans, and data modeling patterns. Use when designing a database or when the user mentions database architecture, schema design, or data modeling.

View source Repository: software_development_department

Install in Claude Code

Copy

git clone --depth 1 https://github.com/tranhieutt/software_development_department /tmp/database-architect && cp -r /tmp/database-architect/.claude/skills/database-architect ~/.claude/skills/database-architect

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Database Architect

## Workflow

1. **Understand domain**: Access patterns, scale targets, consistency needs, compliance requirements
2. **Select technology**: Match DB type to workload (see matrix below)
3. **Design schema**: Normalization level, relationships, constraints, temporal data strategy
4. **Plan indexing**: Query-pattern-driven index design (not speculative)
5. **Design caching**: Layer strategy with invalidation
6. **Plan migration**: Zero-downtime approach, rollback procedures
7. **Document decisions**: ADR with rationale and trade-offs

## Technology selection matrix

| Workload | Primary choice | Alternative |
|---|---|---|
| OLTP / relational | PostgreSQL | MySQL |
| Flexible documents | MongoDB | Firestore |
| Key-value / cache | Redis | DynamoDB |
| Time-series / IoT | TimescaleDB | InfluxDB |
| Analytical / OLAP | ClickHouse | BigQuery |
| Graph relationships | Neo4j | Amazon Neptune |
| Full-text search | Elasticsearch | Meilisearch |
| Globally distributed | CockroachDB | Google Spanner |
| Multi-tenant SaaS | PostgreSQL (row-level security) | Schema-per-tenant |

**Decision rule**: Choose PostgreSQL by default; deviate only when access patterns demand it with documented rationale.

## Non-obvious rules

- **Normalize first, denormalize with evidence** — premature denormalization creates update anomalies; measure before optimizing
- **Index on access patterns, not columns** — index the query, not the table; one slow-query explain plan is worth more than any speculation
- **Foreign keys always** — letting the application enforce referential integrity is a data corruption waiting to happen
- **JSONB for flexible attributes, not as a schema escape hatch** — use JSONB when fields are genuinely variable; not to avoid schema discipline
- **Partition late** — partition tables only once you have row counts >50M or explicit I/O pressure; early partitioning adds complexity with zero benefit
- **UUID v7 over v4** — v7 is time-ordered (k-sortable), avoids index fragmentation, same uniqueness guarantees

## Schema design patterns

```sql
-- Multi-tenancy: row-level security (best for <1000 tenants, shared infra)
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON orders
  USING (tenant_id = current_setting('app.current_tenant_id')::uuid);

-- Soft delete + audit trail (never DELETE production data)
ALTER TABLE users ADD COLUMN deleted_at TIMESTAMPTZ;
ALTER TABLE users ADD COLUMN updated_by UUID REFERENCES users(id);
CREATE INDEX idx_users_active ON users(id) WHERE deleted_at IS NULL;

-- Temporal / slowly-changing dimensions
CREATE TABLE product_prices (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  product_id UUID NOT NULL REFERENCES products(id),
  price NUMERIC(10,2) NOT NULL,
  valid_from TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  valid_until TIMESTAMPTZ  -- NULL = current price
);
```

## Indexing rules

```sql
-- Composite index: most selective column FIRST
CREATE INDEX idx_orders_user_status ON orders(user_id, status, created_at DESC);

-- Partial index: filter out the 95% noise
CREATE INDEX idx_orders_pending ON orders(created_at) WHERE status = 'pending';

-- Covering index: index-only scan (no heap access)
CREATE INDEX idx_users_email_name ON users(email) INCLUDE (name, avatar_url);

-- JSONB GIN index for flexible attribute queries
CREATE INDEX idx_metadata_gin ON events USING gin(metadata jsonb_path_ops);
```

## Migration strategy (non-negotiable steps)

```sql
-- 1. Expand: add new column nullable (no lock)
ALTER TABLE users ADD COLUMN phone VARCHAR(20);

-- 2. Backfill: batch update (never one giant UPDATE)
UPDATE users SET phone = '' WHERE phone IS NULL AND id BETWEEN x AND y;

-- 3. Constrain: add NOT NULL only after backfill complete
ALTER TABLE users ALTER COLUMN phone SET NOT NULL;

-- 4. Switch: deploy code using new column
-- 5. Contract: drop old column in separate release
ALTER TABLE users DROP COLUMN old_phone;
```

**Zero-downtime rule**: Never add a NOT NULL column without a default in a single migration on a live table — it acquires an ACCESS EXCLUSIVE lock.

## Caching architecture

| Layer | Tool | Strategy | Invalidation |
|---|---|---|---|
| Hot data | Redis | Cache-aside | TTL + event-driven |
| Query results | PostgreSQL materialized views | Refresh on schedule | `REFRESH MATERIALIZED VIEW CONCURRENTLY` |
| Session data | Redis | Write-through | TTL |
| Static references | App memory | Eager load on startup | Deploy |

## Scope

- Query tuning on existing system → `database-optimizer`
- Database operations, backups, maintenance → `database-admin`
- System-wide performance → `performance-engineer`
- ORM-specific patterns → `prisma-expert` / `drizzle-orm-expert`