scalability-advisor
Guidance for scaling systems from startup to enterprise scale. Use when planning for growth, diagnosing bottlenecks, or designing systems that need to handle 10x-1000x current load.
git clone --depth 1 https://github.com/alirezarezvani/claude-cto-team /tmp/scalability-advisor && cp -r /tmp/scalability-advisor/skills/scalability-advisor ~/.claude/skills/scalability-advisor
# Scalability Advisor
Provides systematic guidance for scaling systems at different growth stages, identifying bottlenecks, and designing for horizontal scalability.
## When to Use
- Planning for 10x, 100x, or 1000x growth
- Diagnosing current performance bottlenecks
- Designing new systems for scale
- Evaluating scaling strategies (vertical vs. horizontal)
- Capacity planning and infrastructure sizing
## Scaling Stages Framework
### Stage Overview
```
┌─────────────────────────────────────────────────────────────────────┐
│                           SCALING JOURNEY                           │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│ Stage 1          Stage 2          Stage 3          Stage 4          │
│ Startup          Growth           Scale            Enterprise       │
│ 0-10K users      10K-100K         100K-1M          1M+ users        │
│                                                                     │
│ Single           Add caching,     Horizontal       Global,          │
│ server           read replicas    scaling          multi-region     │
│                                                                     │
│ $100/mo          $1K/mo           $10K/mo          $100K+/mo        │
└─────────────────────────────────────────────────────────────────────┘
```
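As a rough illustration, the stage boundaries in the diagram can be written as a lookup table. This is a minimal sketch: `Stage` and `stage_for_users` are hypothetical names, and treating each range as half-open (10K users counts as Stage 2) is an assumption, since the diagram does not say which side a boundary value falls on.

```python
from typing import NamedTuple


class Stage(NamedTuple):
    number: int
    name: str
    max_users: float  # exclusive upper bound (assumption: half-open ranges)
    approach: str
    monthly_cost: str


# Values copied from the Stage Overview diagram.
STAGES = [
    Stage(1, "Startup", 10_000, "Single server", "$100/mo"),
    Stage(2, "Growth", 100_000, "Add caching, read replicas", "$1K/mo"),
    Stage(3, "Scale", 1_000_000, "Horizontal scaling", "$10K/mo"),
    Stage(4, "Enterprise", float("inf"), "Global, multi-region", "$100K+/mo"),
]


def stage_for_users(users: int) -> Stage:
    """Return the first stage whose user ceiling is above `users`."""
    if users < 0:
        raise ValueError("users must be non-negative")
    for stage in STAGES:
        if users < stage.max_users:
            return stage
    return STAGES[-1]


print(stage_for_users(25_000).name)  # Growth
```

User count is only a proxy; the "When to Move" criteria in each stage are the better signal for when to change architecture.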
---
## Stage 1: Startup (0-10K Users)
### Architecture
```
┌────────────────────────────────────────┐
│             Single Server              │
│  ┌──────────────────────────────────┐  │
│  │  App Server (Node/Python/etc)    │  │
│  │  + Database (PostgreSQL)         │  │
│  │  + File Storage (local/S3)       │  │
│  └──────────────────────────────────┘  │
└────────────────────────────────────────┘
```
### Key Metrics
| Metric | Target | Warning |
|--------|--------|---------|
| Response time (P95) | < 500ms | > 1s |
| Database queries/request | < 10 | > 20 |
| Server CPU | < 70% | > 85% |
| Database connections | < 50% pool | > 80% pool |
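
The table can be turned into a simple health check. The sketch below is illustrative only: the metric keys and the `classify` function are hypothetical names, and the middle "watch" band for values between the target and the warning level is an assumption, since the table defines only the two ends.

```python
# Thresholds from the Key Metrics table: (target_below, warning_above).
THRESHOLDS = {
    "p95_response_ms": (500, 1000),
    "db_queries_per_request": (10, 20),
    "server_cpu_pct": (70, 85),
    "db_pool_usage_pct": (50, 80),
}


def classify(metric: str, value: float) -> str:
    """Return 'ok', 'watch', or 'warning' for one metric reading."""
    target, warning = THRESHOLDS[metric]
    if value > warning:
        return "warning"
    if value < target:
        return "ok"
    return "watch"  # between target and warning: an assumed middle band


readings = {
    "p95_response_ms": 620,
    "server_cpu_pct": 91,
    "db_queries_per_request": 4,
}
for name, value in readings.items():
    print(f"{name}: {classify(name, value)}")
```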
### What to Focus On
**DO**:
- Write clean, maintainable code
- Use database indexes on frequently queried columns
- Implement basic monitoring (uptime, errors)
- Keep architecture simple (monolith is fine)
**DON'T**:
- Over-engineer for scale you don't have
- Add caching before you need it
- Split into microservices prematurely
- Worry about multi-region yet
### When to Move to Stage 2
- Database CPU consistently > 70%
- P95 response times trending toward the 1s warning threshold
- Individual queries taking > 100ms
- Server CPU consistently above the 85% warning level
---
## Stage 2: Growth (10K-100K Users)
### Architecture
```
┌────────────────────────────────────────────────────────┐
│                                                        │
│   ┌─────────┐  ┌───────────────────────────────────┐   │
│   │   CDN   │  │           Load Balancer           │   │
│   └────┬────┘  └─────────────────┬─────────────────┘   │
│        │                         │                     │
│        │            ┌────────────┼────────────┐        │
│        │            │            │            │        │
│        ▼            ▼            ▼            ▼        │
│   ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐   │
│   │ Static  │  │  App 1  │  │  App 2  │  │  App 3  │   │
│   │ Assets  │  └────┬────┘  └────┬────┘  └────┬────┘   │
│   └─────────┘       │            │            │        │
│                     └────────────┼────────────┘        │
│                                  │                     │
│                     ┌────────────┼────────────┐        │
│                     │            │            │        │
│                     ▼            ▼            ▼        │
│                ┌─────────┐  ┌─────────┐  ┌─────────┐   │
│                │ Primary │  │  Read   │  │  Redis  │   │
│                │   DB    │──│ Replica │  │  Cache  │   │
│                └─────────┘  └─────────┘  └─────────┘   │
│                                                        │
└────────────────────────────────────────────────────────┘
```
### Key Additions
| Component | Purpose | When to Add |
|-----------|---------|-------------|
| **CDN** | Static asset caching | Images, JS, CSS account for > 20% of bandwidth |
| **Load Balancer** | Distribute traffic | Single server CPU > 70% |
| **Read Replicas** | Offload reads | > 80% database ops are reads |
| **Redis Cache** | Application caching | Same queries repeated frequently |
| **Job Queue** | Async processing | Background tasks blocking requests |
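
The "When to Add" column reads naturally as a set of rules. The following is a minimal sketch under stated assumptions: `Snapshot` and `recommend_additions` are hypothetical names, and because the table gives no numeric threshold for "repeated frequently" or "blocking requests", those two triggers are modelled as plain booleans set by whoever reviews the system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snapshot:
    """Hypothetical measurements; field names are illustrative only."""
    static_bandwidth_share: float         # fraction of bandwidth used by images, JS, CSS
    server_cpu_pct: float                 # single-server CPU utilisation, 0-100
    read_share: float                     # fraction of database operations that are reads
    queries_repeated_frequently: bool     # operator judgment; the table gives no number
    background_tasks_block_requests: bool


def recommend_additions(s: Snapshot) -> List[str]:
    """Apply the 'When to Add' triggers from the Key Additions table."""
    additions = []
    if s.static_bandwidth_share > 0.20:
        additions.append("CDN")
    if s.server_cpu_pct > 70:
        additions.append("Load Balancer")
    if s.read_share > 0.80:
        additions.append("Read Replicas")
    if s.queries_repeated_frequently:
        additions.append("Redis Cache")
    if s.background_tasks_block_requests:
        additions.append("Job Queue")
    return additions


snapshot = Snapshot(0.35, 55.0, 0.9, False, True)
print(recommend_additions(snapshot))  # ['CDN', 'Read Replicas', 'Job Queue']
```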
### Caching Strategy
```
Request Flow with Caching:
1. Check CDN (static assets)        ─► HIT: Return cached
   │
2. Check Application Cache (Redis)  ─► HIT: Return cached
   │
3. Check Database                   ─► Return + Cache result
```
**What to Cache**:
- Session data (TTL: session duration)
- User profile data (TTL: 5-15 minutes)
- API responses (TTL: varies by freshness needs)
- Database query results (TTL: 1-5 minutes)
- Computed values (TTL: based on computation cost)
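
Steps 2 and 3 of the request flow, combined with per-item TTLs, amount to the cache-aside pattern. The sketch below uses a small in-memory class as a stand-in for Redis so it runs without any server; `TTLCache`, `get_or_load`, and `load_profile_from_db` are hypothetical names, and the 300-second TTL is just one value inside the 5-15 minute range suggested for user profiles.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-memory stand-in for Redis; not thread-safe."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._data[key] = (self._clock() + ttl_seconds, value)


def get_or_load(cache: TTLCache, key: str,
                loader: Callable[[], Any], ttl_seconds: float) -> Any:
    """Cache-aside: return the cached value, else load it and cache it."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = loader()  # cache miss: fall through to the database
    cache.set(key, value, ttl_seconds)
    return value


db_calls = 0


def load_profile_from_db() -> dict:
    """Hypothetical loader standing in for a real database query."""
    global db_calls
    db_calls += 1
    return {"id": 42, "name": "Ada"}


cache = TTLCache()
get_or_load(cache, "user:42", load_profile_from_db, ttl_seconds=300)
get_or_load(cache, "user:42", load_profile_from_db, ttl_seconds=300)
print(db_calls)  # 1: the second call was served from the cache
```

One limitation of this sketch: a loader that returns `None` is never cached, so lookups for missing rows would hit the database every time. A sentinel value would be needed to cache those.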
### Database Optimization
```sql
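-- Find slow queries (requires the pg_stat_statements extension).
-- PostgreSQL 13+ names these columns mean_exec_time / total_exec_time;
-- on older versions use mean_time / total_time instead.
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
-- Find tables that may be missing indexes:
-- many sequential scans and comparatively few index scans.
-- (seq_scan lives in pg_stat_user_tables, not pg_stat_user_indexes.)
SELECT schemaname, relname, seq_scan, COALESCE(idx_scan, 0) AS idx_scan
FROM pg_stat_user_tables
WHERE seq_scan > 1000 AND COALESCE(idx_scan, 0) < seq_scan
ORDER BY seq_scan DESC;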
```
### When to Move to Stage 3
- Write traffic overwhelming single primary
- Cache hit rate plateauing despite optimization
- Read replicas can't keep up with the primary (replication lag keeps growing)
- Need independent scaling of components
---
## Stage 3: Scale (100K-1M Users)
### Architecture
```
┌──────────────────────────────────────────────────────────────────────┐
│                              CDN / Edge                              │
└──────────────────────────────────────────────────────────────────────┘
```