Skill · 110 repo stars · updated 7mo ago

scalability-advisor

Scalability Advisor provides systematic guidance for scaling systems across growth stages from startup to enterprise, helping teams identify performance bottlenecks and design horizontally scalable architectures. Use when planning for 10x to 1000x growth, diagnosing performance issues, evaluating scaling strategies, or capacity planning for systems handling increasing user loads.

Repository: claude-cto-team

Install in Claude Code

git clone --depth 1 https://github.com/alirezarezvani/claude-cto-team /tmp/scalability-advisor && cp -r /tmp/scalability-advisor/skills/scalability-advisor ~/.claude/skills/scalability-advisor

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Scalability Advisor

Provides systematic guidance for scaling systems at different growth stages, identifying bottlenecks, and designing for horizontal scalability.

## When to Use

- Planning for 10x, 100x, or 1000x growth
- Diagnosing current performance bottlenecks
- Designing new systems for scale
- Evaluating scaling strategies (vertical vs. horizontal)
- Capacity planning and infrastructure sizing

## Scaling Stages Framework

### Stage Overview

```
┌─────────────────────────────────────────────────────────────────────┐
│                    SCALING JOURNEY                                  │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  Stage 1        Stage 2         Stage 3         Stage 4             │
│  Startup        Growth          Scale           Enterprise          │
│  0-10K users    10K-100K        100K-1M         1M+ users           │
│                                                                     │
│  Single         Add caching,    Horizontal      Global,             │
│  server         read replicas   scaling         multi-region        │
│                                                                     │
│  $100/mo        $1K/mo          $10K/mo         $100K+/mo           │
└─────────────────────────────────────────────────────────────────────┘
```
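The stage boundaries above can be expressed as a simple lookup. This is a minimal sketch that assumes raw user count is the only input. The names `stage_for_users`, `STAGE_BOUNDS`, and `STAGE_NAMES` are hypothetical. The diagram's ranges share their endpoints, so treating exactly 10K users as Stage 2 is also an assumption.

```python
from bisect import bisect_right

# Exclusive upper bounds of Stages 1-3, from the diagram; Stage 4 is open-ended.
STAGE_BOUNDS = [10_000, 100_000, 1_000_000]
STAGE_NAMES = ["Startup", "Growth", "Scale", "Enterprise"]


def stage_for_users(users: int) -> tuple[int, str]:
    """Return (stage number, stage name) for a given user count."""
    if users < 0:
        raise ValueError("user count cannot be negative")
    # bisect_right counts how many bounds are <= users, which is the stage index.
    idx = bisect_right(STAGE_BOUNDS, users)
    return idx + 1, STAGE_NAMES[idx]


if __name__ == "__main__":
    for n in (500, 10_000, 250_000, 5_000_000):
        print(n, stage_for_users(n))
```

In practice, traffic shape, write volume, and data size shift these boundaries at least as much as user count does. Treat the result as a starting point for the checklists below, not a verdict.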

---

## Stage 1: Startup (0-10K Users)

### Architecture

```
┌────────────────────────────────────────┐
│           Single Server                │
│  ┌──────────────────────────────────┐  │
│  │  App Server (Node/Python/etc)    │  │
│  │  + Database (PostgreSQL)         │  │
│  │  + File Storage (local/S3)       │  │
│  └──────────────────────────────────┘  │
└────────────────────────────────────────┘
```

### Key Metrics

| Metric | Target | Warning |
|--------|--------|---------|
| Response time (P95) | < 500ms | > 1s |
| Database queries/request | < 10 | > 20 |
| Server CPU | < 70% | > 85% |
| Database connections | < 50% pool | > 80% pool |
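The Target and Warning columns split each metric into three bands: healthy, a watch zone in between, and alert. This minimal sketch checks a measured value against those bands. It assumes lower is better for every row, which holds for all four metrics here, and it expresses connection usage as a fraction of the pool. `THRESHOLDS` and `metric_status` are hypothetical names.

```python
# (target upper bound, warning lower bound) from the Stage 1 table.
# Units: milliseconds, queries per request, percent CPU, fraction of pool.
THRESHOLDS = {
    "p95_response_ms": (500, 1000),
    "queries_per_request": (10, 20),
    "server_cpu_pct": (70, 85),
    "db_pool_usage": (0.50, 0.80),
}


def metric_status(name: str, value: float) -> str:
    """Classify a metric as 'ok' (meets target), 'warning', or 'watch' (in between)."""
    target, warning = THRESHOLDS[name]
    if value < target:
        return "ok"
    if value > warning:
        return "warning"
    return "watch"
```

For example, a P95 of 750 ms lands in the watch band. It misses the target but has not yet crossed the 1 s warning line.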

### What to Focus On

**DO**:
- Write clean, maintainable code
- Use database indexes on frequently queried columns
- Implement basic monitoring (uptime, errors)
- Keep architecture simple (monolith is fine)

**DON'T**:
- Over-engineer for scale you don't have
- Add caching before you need it
- Split into microservices prematurely
- Worry about multi-region yet

### When to Move to Stage 2

- Database CPU consistently > 70%
- P95 response time regularly above the 500ms target
- Individual queries taking > 100ms
- Server CPU or memory consistently maxed out
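These triggers are partly qualitative. One way to make them checkable over an observation window is sketched below, under several assumptions. "Consistently" is taken to mean every sample in the window. Degrading response times are read as P95 above the 500ms Stage 1 target. "Maxed" is read as above the 85% CPU warning level. The function names are hypothetical.

```python
def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile; integer arithmetic avoids float rounding."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def stage2_triggers(
    db_cpu_pct: list[float],
    latencies_ms: list[float],
    slowest_query_ms: float,
    server_cpu_pct: list[float],
) -> list[str]:
    """Return the Stage 2 triggers that fired during one observation window."""
    fired = []
    if db_cpu_pct and min(db_cpu_pct) > 70:
        fired.append("database CPU consistently > 70%")
    if latencies_ms and p95(latencies_ms) > 500:
        fired.append("P95 response time above 500 ms target")
    if slowest_query_ms > 100:
        fired.append("single queries taking > 100 ms")
    if server_cpu_pct and min(server_cpu_pct) > 85:
        fired.append("server CPU maxed (> 85%)")
    return fired
```

Using `min` over the window means a single quiet sample suppresses a trigger. That makes the check conservative, and it fits the intent of "consistently" better than a one-off spike would.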

---

## Stage 2: Growth (10K-100K Users)

### Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                                                             │
│    ┌─────────┐      ┌─────────────────────────────────┐     │
│    │   CDN   │      │      Load Balancer              │     │
│    └────┬────┘      └──────────────┬──────────────────┘     │
│         │                          │                        │
│         │           ┌──────────────┼──────────────┐         │
│         │           │              │              │         │
│         ▼           ▼              ▼              ▼         │
│    ┌─────────┐ ┌─────────┐   ┌─────────┐   ┌─────────┐      │
│    │ Static  │ │ App 1   │   │ App 2   │   │ App 3   │      │
│    │ Assets  │ └────┬────┘   └────┬────┘   └────┬────┘      │
│    └─────────┘      │             │             │           │
│                     └──────────────┼────────────┘           │
│                                    │                        │
│                     ┌──────────────┼──────────────┐         │
│                     │              │              │         │
│                     ▼              ▼              ▼         │
│               ┌─────────┐   ┌─────────┐   ┌─────────┐       │
│               │ Primary │   │  Read   │   │  Redis  │       │
│               │   DB    │───│ Replica │   │  Cache  │       │
│               └─────────┘   └─────────┘   └─────────┘       │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

### Key Additions

| Component | Purpose | When to Add |
|-----------|---------|-------------|
| **CDN** | Static asset caching | Images, JS, and CSS taking > 20% of bandwidth |
| **Load Balancer** | Distribute traffic | Single server CPU > 70% |
| **Read Replicas** | Offload reads | > 80% of database operations are reads |
| **Redis Cache** | Application caching | Same queries repeated frequently |
| **Job Queue** | Async processing | Background tasks blocking requests |
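The "When to Add" column works as a set of independent rules. This minimal sketch assumes the inputs have already been measured, for example from bandwidth reports, server monitoring, and database statistics. It treats the two qualitative rows as yes/no flags. `Observations` and `components_to_add` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Observations:
    static_bandwidth_share: float  # fraction of bandwidth spent on images/JS/CSS
    server_cpu_pct: float          # sustained CPU on the single app server
    read_ratio: float              # fraction of database operations that are reads
    hot_repeated_queries: bool     # same queries repeated frequently
    tasks_block_requests: bool     # background work running inside request handlers


def components_to_add(obs: Observations) -> list[str]:
    """Apply each 'When to Add' rule from the table independently."""
    add = []
    if obs.static_bandwidth_share > 0.20:
        add.append("CDN")
    if obs.server_cpu_pct > 70:
        add.append("Load Balancer")
    if obs.read_ratio > 0.80:
        add.append("Read Replicas")
    if obs.hot_repeated_queries:
        add.append("Redis Cache")
    if obs.tasks_block_requests:
        add.append("Job Queue")
    return add
```

Because each rule fires on its own, several components can be indicated at once. For example, `Observations(0.35, 65, 0.9, True, False)` yields a CDN, read replicas, and a Redis cache, but no load balancer yet.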

### Caching Strategy

```
Request Flow with Caching:

1. Check CDN (static assets)         ─► HIT: Return cached
                                           │
2. Check Application Cache (Redis)   ─► HIT: Return cached
                                           │
3. Check Database                    ─► Return + Cache result
```
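Steps 2 and 3 describe the cache-aside (lazy-loading) pattern: read from the cache, and on a miss, read from the database and write the result back. This minimal sketch uses an in-process dictionary with expiry as a stand-in for Redis, so it runs without a server. `TTLCache` and `cache_aside` are hypothetical names. With redis-py, the equivalent calls would be `r.get(key)` and `r.set(key, value, ex=ttl_seconds)`, with values serialized (for example, to JSON) first.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """In-memory stand-in for Redis: each value expires ttl seconds after it is set."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._data: dict[str, tuple[float, Any]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily evict the expired entry
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._data[key] = (self._clock() + ttl, value)


def cache_aside(cache: TTLCache, key: str, loader: Callable[[], Any], ttl: float) -> Any:
    """Return the cached value, or load it from the source of truth and cache it."""
    value = cache.get(key)
    if value is not None:
        return value            # cache HIT
    value = loader()            # cache MISS: fall through to the database
    cache.set(key, value, ttl)  # populate the cache for subsequent requests
    return value
```

Because `None` signals a miss, a lookup whose real answer is `None` is never cached and reaches the database every time. Caching a sentinel value avoids this if that access pattern matters.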

**What to Cache**:
- Session data (TTL: session duration)
- User profile data (TTL: 5-15 minutes)
- API responses (TTL: varies by freshness needs)
- Database query results (TTL: 1-5 minutes)
- Computed values (TTL: based on computation cost)
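The list above translates naturally into configuration. In this minimal sketch, the chosen values (10 minutes for profiles, 2 minutes for query results, 30-minute sessions) are assumptions picked from inside the suggested ranges. API responses and computed values have no fixed range in the list, so they are left out rather than guessed. `TTL_RANGES`, `TTLS`, and `check_ttls` are hypothetical names.

```python
from datetime import timedelta

# Recommended ranges from the list above (only the kinds that give one).
TTL_RANGES = {
    "user_profile": (timedelta(minutes=5), timedelta(minutes=15)),
    "query_result": (timedelta(minutes=1), timedelta(minutes=5)),
}

# Chosen TTLs: assumed values, to be tuned per workload.
SESSION_LENGTH = timedelta(minutes=30)  # assumption: match the app's session timeout
TTLS = {
    "session": SESSION_LENGTH,
    "user_profile": timedelta(minutes=10),
    "query_result": timedelta(minutes=2),
}


def check_ttls(ttls: dict[str, timedelta]) -> list[str]:
    """Return the cache kinds whose TTL falls outside the recommended range."""
    out_of_range = []
    for kind, ttl in ttls.items():
        bounds = TTL_RANGES.get(kind)
        if bounds and not (bounds[0] <= ttl <= bounds[1]):
            out_of_range.append(kind)
    return out_of_range
```

Redis expiries are set in seconds, so `int(ttl.total_seconds())` converts each entry for use with `SET ... EX`.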

### Database Optimization

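```sql
-- Find slow queries (requires the pg_stat_statements extension).
-- Column names are for PostgreSQL 13+; on 12 and earlier use mean_time / total_time.
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

-- Find likely missing indexes: tables read mostly by sequential scans
SELECT schemaname, relname, seq_scan, idx_scan
FROM pg_stat_user_tables
WHERE seq_scan > 1000 AND seq_scan > COALESCE(idx_scan, 0)
ORDER BY seq_scan DESC;

-- Find unused indexes (candidates for removal)
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0;
```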

### When to Move to Stage 3

- Write traffic overwhelming the single primary
- Cache hit rate plateauing despite optimization
- Replication lag on read replicas growing beyond what the application can tolerate
- Components needing to scale independently of each other
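To tell whether the cache hit rate has actually plateaued, compute it over an interval rather than from lifetime totals. Redis reports cumulative `keyspace_hits` and `keyspace_misses` counters in the `stats` section of `INFO`. This sketch assumes two snapshots of those counters taken some time apart and computes the hit rate between them. The function name and the snapshot dictionaries are illustrative.

```python
def interval_hit_rate(before: dict[str, int], after: dict[str, int]) -> float:
    """Cache hit rate between two snapshots of Redis INFO 'stats' counters.

    keyspace_hits and keyspace_misses are cumulative since the server started
    (or since CONFIG RESETSTAT), so the interval rate comes from the deltas.
    """
    hits = after["keyspace_hits"] - before["keyspace_hits"]
    misses = after["keyspace_misses"] - before["keyspace_misses"]
    if hits < 0 or misses < 0:
        raise ValueError("counters went backwards; Redis restarted or stats were reset")
    total = hits + misses
    if total == 0:
        raise ValueError("no key lookups between the two snapshots")
    return hits / total
```

With redis-py, `r.info("stats")` returns a dictionary that includes both counters. Recording this interval rate daily or weekly shows whether further cache tuning is still moving it or whether it has flattened out.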

---

## Stage 3: Scale (100K-1M Users)

### Architecture

```
┌──────────────────────────────────────────────────────────────────────┐
│                           CDN / Edge                                 │
└─────────────────────────────────────────────────