architecting-networks
This Claude Code skill guides architects through designing secure, scalable cloud network topologies across AWS, GCP, and Azure platforms. It offers decision frameworks for VPC design patterns, subnet segmentation strategies, zero trust implementation, and hybrid connectivity scenarios. Use this skill when planning new VPC architectures, implementing network segmentation and security controls, establishing multi-cloud or hybrid connectivity, or migrating from flat to sophisticated network designs.
git clone --depth 1 https://github.com/ancoleman/ai-design-components /tmp/architecting-networks && cp -r /tmp/architecting-networks/skills/architecting-networks ~/.claude/skills/architecting-networks
SKILL.md
# Network Architecture
Design secure, scalable cloud network architectures using proven patterns across AWS, GCP, and Azure. This skill provides decision frameworks for VPC design, subnet strategy, zero trust implementation, and hybrid connectivity.
## When to Use This Skill
Invoke this skill when:
- Designing VPC/VNet topology for new cloud environments
- Implementing network segmentation and security controls
- Planning multi-VPC or multi-cloud connectivity
- Establishing hybrid cloud connectivity (on-premises to cloud)
- Migrating from a flat network to a segmented, multi-VPC architecture
- Implementing zero trust network principles
- Optimizing network costs and performance
## Core Network Architecture Patterns
### Pattern 1: Flat (Single VPC) Architecture
**Use When:** Small applications, single environment, simple security requirements, team < 10 engineers
**Characteristics:**
- All resources in one VPC with subnet-level segmentation
- Public, private, and database subnet tiers
- Simplest to understand and manage
- No inter-VPC routing complexity
**Tradeoffs:**
- ✓ Lowest cost, fastest to set up
- ✗ Poor isolation, difficult to scale, the entire VPC is the blast radius
### Pattern 2: Multi-VPC (Isolated) Architecture
**Use When:** Multiple environments (dev/staging/prod), strong isolation requirements, compliance mandates separation
**Characteristics:**
- Separate VPCs per environment or workload
- No direct connectivity without explicit setup
- Independent CIDR ranges
**Tradeoffs:**
- ✓ Strong blast radius containment, clear security boundaries
- ✗ Management overhead, duplicate infrastructure, higher costs
### Pattern 3: Hub-and-Spoke (Transit Gateway) Architecture
**Use When:** 5+ VPCs need communication, centralized security inspection required, hybrid connectivity, multi-account setup
**Characteristics:**
- Central hub VPC/Transit Gateway
- Spoke VPCs connect to hub
- All inter-VPC traffic routes through hub
**Tradeoffs:**
- ✓ Simplified routing, centralized security, scales easily (100+ VPCs)
- ✗ Transit Gateway costs (~$0.05/hour per attachment + $0.02/GB processed), increased latency (extra hop through the hub)
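To make the cost tradeoff concrete, here is a rough estimator built on the approximate rates quoted above. It assumes the hourly rate is charged per VPC attachment and that a month has 730 hours; the function name and parameters are illustrative, and actual pricing varies by region.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def tgw_monthly_cost(attachments: int, gb_processed: float,
                     hourly_rate: float = 0.05,
                     per_gb_rate: float = 0.02) -> float:
    """Rough monthly Transit Gateway estimate in USD.

    Rates default to the approximate figures quoted in this section and
    are assumed to apply per attachment and per GB processed.
    """
    attachment_cost = attachments * hourly_rate * HOURS_PER_MONTH
    data_cost = gb_processed * per_gb_rate
    return round(attachment_cost + data_cost, 2)


if __name__ == "__main__":
    # Five spoke VPCs moving 1 TB (1,000 GB) per month through the hub.
    print(tgw_monthly_cost(attachments=5, gb_processed=1000))
```

Under these assumptions the attachment hours dominate the bill at low traffic volumes, which is why the hub only pays off once several VPCs need to communicate.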
### Pattern 4: Full Mesh (VPC Peering) Architecture
**Use When:** Small number of VPCs (< 5), low latency critical, no centralized inspection needed
**Characteristics:**
- Every VPC directly connected via peering
- Direct VPC-to-VPC communication
**Tradeoffs:**
- ✓ Lowest latency, no Transit Gateway costs
- ✗ Management complexity scales as O(n²), doesn't scale beyond ~10 VPCs
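The O(n²) growth follows from the fact that a full mesh needs one peering connection per pair of VPCs, that is n(n-1)/2 connections. A small sketch (the function name is illustrative) shows how quickly that climbs:

```python
def peering_connections(vpc_count: int) -> int:
    """Number of peering connections needed to fully mesh vpc_count VPCs."""
    if vpc_count < 0:
        raise ValueError("vpc_count must be non-negative")
    return vpc_count * (vpc_count - 1) // 2


if __name__ == "__main__":
    for n in (3, 5, 10, 20):
        print(f"{n} VPCs -> {peering_connections(n)} peering connections")
```

Three VPCs need 3 connections, five need 10, ten need 45, and twenty need 190, each with its own route table entries to maintain. A hub-and-spoke design needs only one attachment per VPC.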
### Pattern 5: Hybrid (Multi-Pattern) Architecture
**Use When:** Large enterprise with diverse requirements, balancing cost/performance/security
**Characteristics:**
- Hub-spoke for most VPCs + direct peering for latency-sensitive pairs
- Combination based on workload requirements
**Tradeoffs:**
- ✓ Optimized for specific needs
- ✗ More complex to design and manage
## Pattern Selection Framework
```
Number of VPCs?
│
├─► 1 VPC → Flat (Single VPC)
├─► 2-4 VPCs + No inter-VPC communication → Multi-VPC (Isolated)
├─► 2-5 VPCs + Low latency critical → Full Mesh (VPC Peering)
├─► 5+ VPCs + Centralized inspection → Hub-and-Spoke (Transit Gateway)
└─► 10+ VPCs + Mixed requirements → Hybrid (Multi-Pattern)
Additional Considerations:
├─► Hybrid connectivity required? → Hub-and-Spoke preferred
├─► Centralized egress/inspection? → Hub-and-Spoke with Inspection VPC
├─► Multi-account environment? → Hub-and-Spoke with AWS RAM sharing
└─► Cost optimization priority? → Flat or Multi-VPC (avoid TGW fees)
```
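The main branch of the tree above can be sketched as a small function. This is only an illustration: the `Requirements` fields and the rule that the first matching branch wins (top to bottom) are assumptions, and the "Additional Considerations" are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical input record; field names are illustrative."""
    vpc_count: int
    inter_vpc_communication: bool = False
    low_latency_critical: bool = False
    centralized_inspection: bool = False
    mixed_requirements: bool = False


def recommend_pattern(req: Requirements) -> str:
    """Walk the selection tree top to bottom; the first match wins."""
    if req.vpc_count < 1:
        raise ValueError("vpc_count must be at least 1")
    if req.vpc_count == 1:
        return "Flat (Single VPC)"
    if 2 <= req.vpc_count <= 4 and not req.inter_vpc_communication:
        return "Multi-VPC (Isolated)"
    if 2 <= req.vpc_count <= 5 and req.low_latency_critical:
        return "Full Mesh (VPC Peering)"
    if req.vpc_count >= 5 and req.centralized_inspection:
        return "Hub-and-Spoke (Transit Gateway)"
    if req.vpc_count >= 10 and req.mixed_requirements:
        return "Hybrid (Multi-Pattern)"
    # The tree does not cover every combination; flag it for manual review.
    return "No direct match"


if __name__ == "__main__":
    print(recommend_pattern(Requirements(vpc_count=1)))
    print(recommend_pattern(Requirements(vpc_count=8,
                                         inter_vpc_communication=True,
                                         centralized_inspection=True)))
```

Inputs that fall between branches (for example, six VPCs with no inspection requirement) return "No direct match" rather than a guess, so the additional considerations can be weighed by hand.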
## Subnet Strategy
### Standard Three-Tier Design
**Public Subnets:**
- Route to Internet Gateway
- Use for load balancers, bastion hosts, NAT Gateways
- CIDR: /24 to /27 (256 to 32 IPs)
**Private Subnets:**
- Route to NAT Gateway for outbound
- Use for application servers, containers, compute workloads
- CIDR: /20 to /22 (4,096 to 1,024 IPs)
**Database Subnets:**
- No direct internet route
- Use for RDS, ElastiCache, managed databases
- CIDR: /24 to /26 (256 to 64 IPs)
### Multi-AZ Distribution
**Production:** Distribute each tier across 3 Availability Zones minimum
**Dev/Test:** 1-2 AZs acceptable for cost savings
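As an illustration of the three-tier, multi-AZ layout, the sketch below carves a VPC range into one subnet per tier per AZ using Python's standard `ipaddress` module. The allocation order (private /20 blocks first, then /24 blocks for the public and database tiers), the AZ names, and the function name are all assumptions chosen for the example.

```python
import ipaddress


def plan_subnets(vpc_cidr: str, az_names: list[str]) -> dict[str, dict[str, str]]:
    """Return {tier: {az: cidr}} for a three-tier layout.

    Private subnets get /20 blocks; public and database subnets get /24
    blocks carved from the next unused /20. Assumes the VPC is /16 or
    larger and that there are at most 8 AZs.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    n = len(az_names)
    large_blocks = list(vpc.subnets(new_prefix=20))
    if n > 8 or n + 1 > len(large_blocks):
        raise ValueError("VPC range too small for this layout")

    private = large_blocks[:n]
    small_blocks = list(large_blocks[n].subnets(new_prefix=24))
    public = small_blocks[:n]
    database = small_blocks[n:2 * n]

    return {
        "private": {az: str(net) for az, net in zip(az_names, private)},
        "public": {az: str(net) for az, net in zip(az_names, public)},
        "database": {az: str(net) for az, net in zip(az_names, database)},
    }


if __name__ == "__main__":
    plan = plan_subnets("10.0.0.0/16", ["az-a", "az-b", "az-c"])
    for tier, subnets in plan.items():
        print(tier, subnets)
```

Allocating the largest blocks first keeps every subnet aligned on its own boundary and leaves the upper part of the VPC range free for later growth.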
### CIDR Block Planning
**VPC Sizing:**
- /16 (65,536 IPs) - Large production environments
- /20 (4,096 IPs) - Medium environments
- /24 (256 IPs) - Small/dev environments
**Critical Rules:**
- Non-overlapping CIDR ranges across VPCs
- Coordinate with on-premises network team for hybrid connectivity
- Reserve address space for future expansion
For detailed subnet planning, see `references/subnet-strategy.md`
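The non-overlap rule is easy to check mechanically before any VPC is created. A minimal sketch using the standard `ipaddress` module follows; the network names and ranges are made up for illustration.

```python
import ipaddress
from itertools import combinations


def find_overlaps(cidrs: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of named networks whose CIDR ranges overlap."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


if __name__ == "__main__":
    # Hypothetical address plan, including an on-premises range.
    plan = {
        "prod": "10.0.0.0/16",
        "staging": "10.1.0.0/20",
        "dev": "10.0.128.0/24",      # falls inside the prod range
        "on-prem": "192.168.0.0/16",
    }
    print(find_overlaps(plan))
```

Here the check reports `prod` and `dev` as a conflicting pair, because 10.0.128.0/24 sits inside 10.0.0.0/16. Including on-premises ranges in the same plan catches hybrid-connectivity conflicts early.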
## NAT Gateway Strategy
### Decision Framework
```
Cost vs Resilience?
│
├─► Cost Priority (Dev/Test)
│   └─► Single NAT Gateway (~$32/month)
│       └─► Risk: Single point of failure
│
├─► Balanced (Most Production)
│   └─► One NAT Gateway per AZ (~$96/month for 3 AZs)
│       └─► Resilience: AZ failure doesn't break connectivity
│
└─► Maximum Resilience
    └─► Multiple NAT Gateways per AZ + monitoring
        └─► Critical workloads, SLA-dependent

Alternative: Centralized Egress Pattern
└─► Hub-and-Spoke: Single egress VPC with NAT
    └─► Reduces NAT Gateway count, centralized logging
```
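The monthly figures in the tree are simple multiples of the approximate per-gateway price. The sketch below reproduces them, assuming ~$32 per gateway per month and ignoring per-GB data processing charges; the function and parameter names are illustrative.

```python
NAT_GATEWAY_MONTHLY_USD = 32  # approximate per-gateway figure quoted above


def nat_monthly_cost(az_count: int, gateways_per_az: int = 1,
                     shared_single_gateway: bool = False) -> int:
    """Approximate fixed monthly NAT Gateway cost in USD.

    Data processing charges are not included (assumption: fixed cost only).
    """
    if az_count < 1 or gateways_per_az < 1:
        raise ValueError("az_count and gateways_per_az must be at least 1")
    gateways = 1 if shared_single_gateway else az_count * gateways_per_az
    return gateways * NAT_GATEWAY_MONTHLY_USD


if __name__ == "__main__":
    print(nat_monthly_cost(az_count=3, shared_single_gateway=True))  # cost priority
    print(nat_monthly_cost(az_count=3))                              # balanced
    print(nat_monthly_cost(az_count=3, gateways_per_az=2))           # maximum resilience
```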
**No Outbound Internet Needed?**
- Skip NAT Gateway entirely (cost savings)
- Use VPC Endpoints for AWS service access
## Security Controls
### Security Groups (Recommended)
**Characteristics:**
- Stateful (return traffic auto-allowed)
- Instance-level control
- Allow rules only (implicit deny)
- Can reference other security groups
**Use For:**
- Service-to-service communication
- Instance-level security
- Most common use case
**Best Practices:**
- Use descriptive names (app-alb-sg, app-backend-sg)
- Reference other security groups instead of CIDR blocks
- Keep rules minimal and specific
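To illustrate why referencing security groups is preferable to CIDR blocks, here is a toy model (not a cloud API) in which an ingress rule names the source security group rather than an address range. The classes and the `is_allowed` helper are hypothetical, and the port is chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IngressRule:
    port: int
    source_group: str  # name of the security group allowed to connect


@dataclass
class SecurityGroup:
    name: str
    ingress: list[IngressRule] = field(default_factory=list)


def is_allowed(target: SecurityGroup, source_groups: set[str], port: int) -> bool:
    """Allow only if some rule matches; otherwise the implicit deny applies."""
    return any(
        rule.port == port and rule.source_group in source_groups
        for rule in target.ingress
    )


if __name__ == "__main__":
    backend = SecurityGroup("app-backend-sg", [IngressRule(8080, "app-alb-sg")])
    print(is_allowed(backend, {"app-alb-sg"}, 8080))   # load balancer reaches backend
    print(is_allowed(backend, {"bastion-sg"}, 8080))   # no rule for this source group
```

Because the rule follows group membership, new load balancer instances gain access as soon as they join `app-alb-sg`, with no IP ranges to update.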
### Network ACLs (Optional)
**Characteristics:**
- Stateless (must allow both request and response)
- Subnet-level control
- Allow and deny rules
- Processes rules in order (lowest number first)
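The ordered evaluation can be sketched in a few lines: rules are checked from the lowest number upward, the first match decides, and anything unmatched is denied. This is a simplified model that matches only on source address and a single port; the `AclRule` class and `evaluate` function are illustrative, not a cloud API.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class AclRule:
    number: int
    action: str   # "allow" or "deny"
    cidr: str
    port: int


def evaluate(rules: list[AclRule], source_ip: str, port: int) -> str:
    """Return the action of the lowest-numbered matching rule."""
    address = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port == port and address in ipaddress.ip_network(rule.cidr):
            return rule.action
    return "deny"  # nothing matched: fall through to the default deny


if __name__ == "__main__":
    rules = [
        AclRule(200, "allow", "0.0.0.0/0", 443),
        AclRule(100, "deny", "203.0.113.0/24", 443),  # evaluated first
    ]
    print(evaluate(rules, "203.0.113.7", 443))
    print(evaluate(rules, "198.51.100.5", 443))
```

The blocked range is denied even though a broader allow rule exists, because rule 100 is evaluated before rule 200. Since NACLs are stateless, a real configuration would also need matching rules for the return traffic.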
**Use For:**
- Explicit deny rules