load-balancing-patterns
This skill guides selection and implementation of load balancing solutions across infrastructure layers, covering Layer 4 and Layer 7 routing approaches, health check configuration, session management strategies, and deployment patterns. Use it when distributing traffic across multiple servers, implementing high availability and failover mechanisms, routing based on application-level criteria like URLs or headers, managing session persistence, deploying to Kubernetes, or configuring multi-region traffic management and zero-downtime deployment strategies.
git clone --depth 1 https://github.com/ancoleman/ai-design-components /tmp/load-balancing-patterns && cp -r /tmp/load-balancing-patterns/skills/load-balancing-patterns ~/.claude/skills/load-balancing-patternsSKILL.md
# Load Balancing Patterns
Distribute traffic across infrastructure using the appropriate load balancing approach, from simple round-robin to global multi-region failover.
## When to Use This Skill
Use load-balancing-patterns when:
- Distributing traffic across multiple application servers
- Implementing high availability and failover
- Routing traffic based on URLs, headers, or geographic location
- Managing session persistence across stateless backends
- Deploying applications to Kubernetes clusters
- Configuring global traffic management across regions
- Implementing zero-downtime deployments (blue-green, canary)
- Selecting between cloud-managed and self-managed load balancers
## Core Load Balancing Concepts
### Layer 4 vs Layer 7
**Layer 4 (L4) - Transport Layer:**
- Routes based on IP address and port (TCP/UDP packets)
- No application data inspection, lower latency, higher throughput
- Protocol agnostic, preserves client IP addresses
- Use for: Database connections, video streaming, gaming, financial transactions, non-HTTP protocols
**Layer 7 (L7) - Application Layer:**
- Routes based on HTTP URLs, headers, cookies, request body
- Full application data visibility, SSL/TLS termination, caching, WAF integration
- Content-based routing capabilities
- Use for: Web applications, REST APIs, microservices, GraphQL endpoints, complex routing logic
For detailed comparison including performance benchmarks and hybrid approaches, see `references/l4-vs-l7-comparison.md`.
### Load Balancing Algorithms
| Algorithm | Distribution Method | Use Case |
|-----------|-------------------|----------|
| **Round Robin** | Sequential | Stateless, similar servers |
| **Weighted Round Robin** | Capacity-based | Different server specs |
| **Least Connections** | Fewest active connections | Long-lived connections |
| **Least Response Time** | Fastest server | Performance-sensitive |
| **IP Hash** | Client IP-based | Session persistence |
| **Resource-Based** | CPU/memory metrics | Varying workloads |
### Health Check Types
**Shallow (Liveness):** Is the process alive?
- Endpoint: `/health/live` or `/live`
- Returns: 200 if process running
- Use for: Process monitoring, container health
**Deep (Readiness):** Can the service handle requests?
- Endpoint: `/health/ready` or `/ready`
- Validates: Database, cache, external API connectivity
- Use for: Load balancer routing decisions
**Health Check Hysteresis:** Different thresholds for marking up vs down to prevent flapping
- Example: 3 failures to mark down, 2 successes to mark up
For complete health check implementation patterns, see `references/health-check-strategies.md`.
## Cloud Load Balancers
### AWS Load Balancing
**Application Load Balancer (ALB) - Layer 7:**
- Use for: HTTP/HTTPS applications, microservices, WebSocket
- Features: Path/host/header routing, AWS WAF integration, Lambda targets
- Choose when: Content-based routing needed
**Network Load Balancer (NLB) - Layer 4:**
- Use for: Ultra-low latency (<1ms), TCP/UDP, static IPs, millions RPS
- Features: Preserves source IP, TLS termination
- Choose when: Non-HTTP protocols, performance critical
**Global Accelerator - Layer 4 Global:**
- Use for: Multi-region applications, global users, DDoS protection
- Features: Anycast IPs, automatic regional failover
### GCP Load Balancing
**Application LB (L7):** Global HTTPS LB, Cloud CDN integration, Cloud Armor (WAF/DDoS)
**Network LB (L4):** Regional TCP/UDP, pass-through balancing, session affinity
**Cloud Load Balancing:** Single anycast IP, global distribution, backend buckets
### Azure Load Balancing
**Application Gateway (L7):** WAF integration, URL-based routing, SSL termination, autoscaling
**Load Balancer (L4):** Basic and Standard SKUs, health probes, HA ports
**Traffic Manager (Global):** DNS-based routing (priority, weighted, performance, geographic)
For complete cloud provider configurations and Terraform examples, see `references/cloud-load-balancers.md`.
## Self-Managed Load Balancers
### NGINX
**Best for:** General-purpose HTTP/HTTPS load balancing, web application stacks
**Capabilities:**
- HTTP reverse proxy with multiple algorithms
- TCP/UDP stream load balancing
- SSL/TLS termination
- Passive health checks (open source), active health checks (NGINX Plus)
- Cookie-based sticky sessions (NGINX Plus)
**Basic configuration:**
```nginx
upstream backend {
least_conn;
server backend1.example.com:8080 weight=3;
server backend2.example.com:8080 weight=2;
keepalive 32;
}
server {
listen 80;
location / {
proxy_pass http://backend;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
}
```
For complete NGINX patterns and advanced configurations, see `references/nginx-patterns.md`.
### HAProxy
**Best for:** Maximum performance, database load balancing, resource efficiency
**Capabilities:**
- Highest raw throughput, lowest memory footprint
- 10+ load balancing algorithms
- Sophisticated health checks (HTTP, TCP, Redis, MySQL, etc.)
- Cookie or IP-based persistence
**Basic configuration:**
```haproxy
frontend http_front
bind *:80
default_backend web_servers
backend web_servers
balance roundrobin
option httpchk GET /health
server web1 192.168.1.101:8080 check
server web2 192.168.1.102:8080 check
```
For complete HAProxy patterns, see `references/haproxy-patterns.md`.
### Envoy
**Best for:** Microservices, Kubernetes, service mesh integration
**Capabilities:**
- Cloud-native design with dynamic configuration (xDS APIs)
- Circuit breakers, retries, timeouts
- Advanced health checks (TCP, HTTP, gRPC)
- Excellent observability
For complete Envoy patterns, see `references/envoy-patterns.md`.
### Traefik
**Best for:** Docker/Kubernetes environments, dynamic configuration, ease of use
**Capabilities:**
- Automatic service discovery
- Native Kubernetes integration
- Built-in Let's Encrypt support
- Middleware system (auth, rateManage Linux systems covering systemd services, process management, filesystems, networking, performance tuning, and troubleshooting. Use when deploying applications, optimizing server performance, diagnosing production issues, or managing users and security on Linux servers.
Data pipelines, feature stores, and embedding generation for AI/ML systems. Use when building RAG pipelines, ML feature serving, or data transformations. Covers feature stores (Feast, Tecton), embedding pipelines, chunking strategies, orchestration (Dagster, Prefect, Airflow), dbt transformations, data versioning (LakeFS), and experiment tracking (MLflow, W&B).
Strategic guidance for designing modern data platforms, covering storage paradigms (data lake, warehouse, lakehouse), modeling approaches (dimensional, normalized, data vault, wide tables), data mesh principles, and medallion architecture patterns. Use when architecting data platforms, choosing between centralized vs decentralized patterns, selecting table formats (Iceberg, Delta Lake), or designing data governance frameworks.
Design cloud network architectures with VPC patterns, subnet strategies, zero trust principles, and hybrid connectivity. Use when planning VPC topology, implementing multi-cloud networking, or establishing secure network segmentation for cloud workloads.
Design comprehensive security architectures using defense-in-depth, zero trust principles, threat modeling (STRIDE, PASTA), and control frameworks (NIST CSF, CIS Controls, ISO 27001). Use when designing security for new systems, auditing existing architectures, or establishing security governance programs.
Assembles component outputs from AI Design Components skills into unified, production-ready component systems with validated token integration, proper import chains, and framework-specific scaffolding. Use as the capstone skill after running theming, layout, dashboard, data-viz, or feedback skills to wire components into working React/Next.js, Python, or Rust projects.
Builds AI chat interfaces and conversational UI with streaming responses, context management, and multi-modal support. Use when creating ChatGPT-style interfaces, AI assistants, code copilots, or conversational agents. Handles streaming text, token limits, regeneration, feedback loops, tool usage visualization, and AI-specific error patterns. Provides battle-tested components from leading AI products with accessibility and performance built in.
Constructs secure, efficient CI/CD pipelines with supply chain security (SLSA), monorepo optimization, caching strategies, and parallelization patterns for GitHub Actions, GitLab CI, and Argo Workflows. Use when setting up automated testing, building, or deployment workflows.