multi-cloud-architecture
This skill provides a decision framework and service comparison tables for architecting applications across AWS, Azure, GCP, and OCI. Use it when designing multi-cloud strategies, migrating between providers, selecting optimal services for specific workloads, or building cloud-agnostic systems to avoid vendor lock-in and optimize costs.
git clone --depth 1 https://github.com/wshobson/agents /tmp/multi-cloud-architecture && cp -r /tmp/multi-cloud-architecture/plugins/cloud-infrastructure/skills/multi-cloud-architecture ~/.claude/skills/multi-cloud-architectureSKILL.md
# Multi-Cloud Architecture
Decision framework and patterns for architecting applications across AWS, Azure, GCP, and OCI.
## Purpose
Design cloud-agnostic architectures and make informed decisions about service selection across cloud providers.
## When to Use
- Design multi-cloud strategies
- Migrate between cloud providers
- Select cloud services for specific workloads
- Implement cloud-agnostic architectures
- Optimize costs across providers
## Cloud Service Comparison
### Compute Services
| AWS | Azure | GCP | OCI | Use Case |
| ------- | ------------------- | --------------- | ------------------- | ------------------ |
| EC2 | Virtual Machines | Compute Engine | Compute | IaaS VMs |
| ECS | Container Instances | Cloud Run | Container Instances | Containers |
| EKS | AKS | GKE | OKE | Kubernetes |
| Lambda | Functions | Cloud Functions | Functions | Serverless |
| Fargate | Container Apps | Cloud Run | Container Instances | Managed containers |
### Storage Services
| AWS | Azure | GCP | OCI | Use Case |
| ------- | --------------- | --------------- | -------------- | -------------- |
| S3 | Blob Storage | Cloud Storage | Object Storage | Object storage |
| EBS | Managed Disks | Persistent Disk | Block Volumes | Block storage |
| EFS | Azure Files | Filestore | File Storage | File storage |
| Glacier | Archive Storage | Archive Storage | Archive Storage | Cold storage |
### Database Services
| AWS | Azure | GCP | OCI | Use Case |
| ----------- | ---------------- | ------------- | ------------------- | --------------- |
| RDS | SQL Database | Cloud SQL | MySQL HeatWave | Managed SQL |
| DynamoDB | Cosmos DB | Firestore | NoSQL Database | NoSQL |
| Aurora | PostgreSQL/MySQL | Cloud Spanner | Autonomous Database | Distributed SQL |
| ElastiCache | Cache for Redis | Memorystore | OCI Cache | Caching |
**Reference:** See `references/service-comparison.md` for complete comparison
## Multi-Cloud Patterns
### Pattern 1: Single Provider with DR
- Primary workload in one cloud
- Disaster recovery in another
- Database replication across clouds
- Automated failover
### Pattern 2: Best-of-Breed
- Use best service from each provider
- AI/ML on GCP
- Enterprise apps on Azure
- Regulated data platforms on OCI
- General compute on AWS
### Pattern 3: Geographic Distribution
- Serve users from nearest cloud region
- Data sovereignty compliance
- Global load balancing
- Regional failover
### Pattern 4: Cloud-Agnostic Abstraction
- Kubernetes for compute
- PostgreSQL for database
- S3-compatible storage (MinIO)
- Open source tools
## Cloud-Agnostic Architecture
### Use Cloud-Native Alternatives
- **Compute:** Kubernetes (EKS/AKS/GKE/OKE)
- **Database:** PostgreSQL/MySQL (RDS/SQL Database/Cloud SQL/MySQL HeatWave)
- **Message Queue:** Apache Kafka or managed streaming (MSK/Event Hubs/Confluent/OCI Streaming)
- **Cache:** Redis (ElastiCache/Azure Cache/Memorystore/OCI Cache)
- **Object Storage:** S3-compatible API
- **Monitoring:** Prometheus/Grafana
- **Service Mesh:** Istio/Linkerd
### Abstraction Layers
```
Application Layer
↓
Infrastructure Abstraction (Terraform)
↓
Cloud Provider APIs
↓
AWS / Azure / GCP / OCI
```
## Cost Comparison
### Compute Pricing Factors
- **AWS:** On-demand, Reserved, Spot, Savings Plans
- **Azure:** Pay-as-you-go, Reserved, Spot
- **GCP:** On-demand, Committed use, Preemptible
- **OCI:** Pay-as-you-go, annual commitments, burstable/flexible shapes, preemptible instances
### Cost Optimization Strategies
1. Use reserved/committed capacity (30-70% savings)
2. Leverage spot/preemptible instances
3. Right-size resources
4. Use serverless for variable workloads
5. Optimize data transfer costs
6. Implement lifecycle policies
7. Use cost allocation tags
8. Monitor with cloud cost tools
**Reference:** See `references/multi-cloud-patterns.md`
## Migration Strategy
### Phase 1: Assessment
- Inventory current infrastructure
- Identify dependencies
- Assess cloud compatibility
- Estimate costs
### Phase 2: Pilot
- Select pilot workload
- Implement in target cloud
- Test thoroughly
- Document learnings
### Phase 3: Migration
- Migrate workloads incrementally
- Maintain dual-run period
- Monitor performance
- Validate functionality
### Phase 4: Optimization
- Right-size resources
- Implement cloud-native services
- Optimize costs
- Enhance security
## Best Practices
1. **Use infrastructure as code** (Terraform/OpenTofu)
2. **Implement CI/CD pipelines** for deployments
3. **Design for failure** across clouds
4. **Use managed services** when possible
5. **Implement comprehensive monitoring**
6. **Automate cost optimization**
7. **Follow security best practices**
8. **Document cloud-specific configurations**
9. **Test disaster recovery** procedures
10. **Train teams** on multiple clouds
## Related Skills
- `terraform-module-library` - For IaC implementation
- `cost-optimization` - For cost management
- `hybrid-cloud-networking` - For connectivityTest web applications with screen readers including VoiceOver, NVDA, and JAWS. Use when validating screen reader compatibility, debugging accessibility issues, or ensuring assistive technology support.
Conduct WCAG 2.2 accessibility audits with automated testing, manual verification, and remediation guidance. Use when auditing websites for accessibility, fixing WCAG violations, or implementing accessible design patterns.
Coordinate parallel code reviews across multiple quality dimensions with finding deduplication, severity calibration, and consolidated reporting. Use this skill when organizing multi-reviewer code reviews, calibrating finding severity, or consolidating review results.
Debug complex issues using competing hypotheses with parallel investigation, evidence collection, and root cause arbitration. Use this skill when debugging bugs with multiple potential causes, performing root cause analysis, or organizing parallel investigation workflows.
Coordinate parallel feature development with file ownership strategies, conflict avoidance rules, and integration patterns for multi-agent implementation. Use this skill when decomposing a large feature into independent work streams, when two or more agents need to implement different layers of the same system simultaneously, when establishing file ownership to prevent merge conflicts in a shared codebase, when designing interface contracts so parallel implementers can build against each other's APIs before they are ready, or when deciding whether to use vertical slices versus horizontal layers for a full-stack feature.
Decompose complex tasks, design dependency graphs, and coordinate multi-agent work with proper task descriptions and workload balancing. Use this skill when breaking down work for agent teams, managing task dependencies, or monitoring team progress.
Structured messaging protocols for agent team communication including message type selection, plan approval, shutdown procedures, and anti-patterns to avoid. Use this skill when establishing communication norms for a newly spawned team, when deciding whether to send a direct message or a broadcast, when a team-lead needs to review and approve an implementer's plan before work begins, when orchestrating a graceful team shutdown after all tasks are complete, or when debugging why teammates are not coordinating correctly at integration points.
Design optimal agent team compositions with sizing heuristics, preset configurations, and agent type selection. Use this skill when deciding how many agents to spawn for a task, when choosing between a review team versus a feature team versus a debug team, when selecting the correct subagent_type for each role to ensure agents have the tools they need, when configuring display modes (tmux, iTerm2, in-process) for a CI or local environment, or when building a custom team composition for a non-standard workflow such as a migration or security audit.