devops-deploy
Designs and executes CI/CD pipelines, GitOps workflows, deployment automation, and cloud infrastructure deployment including Docker, AWS Lambda, SAM, Terraform, and GitHub Actions. Use when building or improving CI/CD pipelines, containerizing applications, creating deployment runbooks, or deploying to cloud infrastructure.
git clone --depth 1 https://github.com/tranhieutt/software_development_department /tmp/devops-deploy && cp -r /tmp/devops-deploy/.claude/skills/devops-deploy ~/.claude/skills/devops-deploy
# DevOps Deploy
## Production checklist (always verify)
- [ ] Env vars via Secrets Manager — never hardcoded
- [ ] Health check endpoint responding
- [ ] Structured JSON logs with `request_id`
- [ ] Rate limiting configured
- [ ] CORS restricted to authorized domains
- [ ] Lambda timeout appropriate (10–30s)
- [ ] CloudWatch alarms for errors and latency
- [ ] Rollback plan documented
- [ ] Load test before launch
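
The "Structured JSON logs with `request_id`" item can be sketched with the standard library alone. This is a minimal illustration, not a prescribed implementation: `JsonFormatter`, `build_logger`, and `new_request_id` are hypothetical names, and the chosen JSON fields are an assumption.

```python
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # request_id is attached per call via `extra=`; None if absent.
            "request_id": getattr(record, "request_id", None),
        }
        return json.dumps(payload)


def build_logger(name: str = "app") -> logging.Logger:
    """Return a logger that writes JSON lines to stderr (hypothetical helper)."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def new_request_id() -> str:
    """Generate an opaque per-request identifier."""
    return uuid.uuid4().hex
```

A handler would typically create one ID per incoming request and pass it on every call, for example `build_logger().info("order created", extra={"request_id": new_request_id()})`.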
## Docker: multi-stage Python
```dockerfile
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --user -r requirements.txt
FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /root/.local /root/.local
COPY . .
ENV PATH=/root/.local/bin:$PATH
ENV PYTHONUNBUFFERED=1
EXPOSE 8000
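# python:3.11-slim does not ship curl, so probe with the standard library instead
HEALTHCHECK --interval=30s --timeout=3s CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health', timeout=2)" || exit 1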
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```
## Docker Compose (local dev)
```yaml
services:
  app:
    build: .
    ports: ["8000:8000"]
    environment:
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
    volumes:
      - .:/app
    depends_on: [db, redis]
  db:
    image: postgres:15
    environment:
      POSTGRES_DB: app
      POSTGRES_USER: app
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - pgdata:/var/lib/postgresql/data
  redis:
    image: redis:7-alpine
volumes:
  pgdata:
```
## SAM template (Lambda + DynamoDB)
```yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Globals:
  Function:
    Timeout: 30
    Runtime: python3.11
    Environment:
      Variables:
        DYNAMODB_TABLE: !Ref AppTable
Resources:
  AppFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src/
      Handler: lambda_function.handler
      MemorySize: 512
      Policies:
        - DynamoDBCrudPolicy:
            TableName: !Ref AppTable
  AppTable:
    Type: AWS::DynamoDB::Table
    Properties:
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions:
        - AttributeName: userId
          AttributeType: S
      KeySchema:
        - AttributeName: userId
          KeyType: HASH
      TimeToLiveSpecification:
        AttributeName: ttl
        Enabled: true
```
```bash
# SAM commands
sam build
sam deploy --guided # first time (creates samconfig.toml)
sam deploy # subsequent
sam deploy --no-confirm-changeset --no-fail-on-empty-changeset
sam logs -n AppFunction --tail
```
## GitHub Actions: test + security + deploy
```yaml
name: Deploy
on:
  push:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.11" }
      - run: pip install -r requirements.txt
      - run: pytest tests/ -v --cov=src --cov-report=xml
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install bandit safety
      - run: bandit -r src/ -ll
      - run: safety check -r requirements.txt
  deploy:
    needs: [test, security]
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/setup-sam@v2
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
      - run: sam build
      - run: sam deploy --no-confirm-changeset
```
## Health check endpoint (FastAPI)
```python
import os
import time

from fastapi import FastAPI

app = FastAPI()
START_TIME = time.time()  # recorded once at import, used to report uptime


@app.get("/health")
async def health():
    return {
        "status": "healthy",
        "uptime_seconds": time.time() - START_TIME,
        "version": os.environ.get("APP_VERSION", "unknown"),
    }
```
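
A deploy script can poll this endpoint before declaring a release good. The sketch below assumes the response shape shown above (`"status": "healthy"`). The function names, retry count, and delay are illustrative choices, not part of any library.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_health(url: str, timeout: float = 3.0) -> dict:
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_healthy(
    fetch: Callable[[], dict],
    attempts: int = 5,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once `fetch` reports status == "healthy", else False."""
    for attempt in range(attempts):
        try:
            if fetch().get("status") == "healthy":
                return True
        except (OSError, ValueError):
            # Connection refused, timeout, HTTP error, or invalid JSON: retry.
            pass
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

Typical use would be `wait_until_healthy(lambda: fetch_health("http://localhost:8000/health"))`. Passing `fetch` and `sleep` as parameters keeps the retry loop testable without a running server.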
## Pipeline Design
### Standard Pipeline Stages
```
[Build] -> [Test] -> [Security Scan] -> [Package] -> [Deploy Staging] -> [Integration Test] -> [Approval] -> [Deploy Prod] -> [Verify]
```
| Stage | Actions | Failure Policy |
|-------|---------|----------------|
| Build | Compile, lint, type-check | Block |
| Test | Unit + integration tests | Block |
| Security | SAST, dependency scan, container scan | Block on Critical/High |
| Package | Docker build, push to registry, sign image | Block |
| Deploy Staging | Apply manifests/Helm, run smoke tests | Block |
| Approval | Manual gate for production | Require approval |
| Deploy Prod | Progressive rollout | Auto-rollback on failure |
| Verify | Health checks, metrics validation | Auto-rollback |
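
The "Block on Critical/High" policy for the Security stage amounts to a small gate over scanner findings. This sketch assumes findings have already been parsed into an ID and a severity string. The `Finding` type and `security_gate` function are hypothetical, and real scanners each emit their own report formats.

```python
from dataclasses import dataclass

# Severities that stop the pipeline, per the failure policy in the table.
BLOCKING_SEVERITIES = {"CRITICAL", "HIGH"}


@dataclass(frozen=True)
class Finding:
    id: str
    severity: str


def security_gate(findings: list[Finding]) -> tuple[bool, list[Finding]]:
    """Return (passed, blocking_findings) for a list of scan findings."""
    blocking = [f for f in findings if f.severity.upper() in BLOCKING_SEVERITIES]
    return (not blocking, blocking)
```

A CI step could call `security_gate` and exit non-zero when `passed` is false, printing the blocking findings so the failure is actionable.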
### Deployment Strategy Selection
| Strategy | Zero-downtime | Rollback Speed | Resource Cost | Use When |
|----------|---------------|----------------|---------------|----------|
| Rolling Update | Yes | Slow (redeploy) | Low | Default for most services |
| Blue/Green | Yes | Instant (switch) | 2x | Critical services whose releases do not depend on breaking DB schema changes |
| Canary | Yes | Fast (shift) | 1.1x | High-traffic, need real-user validation |
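
The canary row's "Fast (shift)" rollback can be illustrated as a loop that raises the canary's traffic weight in steps and shifts everything back to the stable version when the error rate crosses a threshold. The step sizes, the 1% threshold, and the `set_weight` / `error_rate` callables are assumptions for illustration. A real rollout would also wait for a soak period at each step before reading metrics.

```python
from typing import Callable, Sequence


def run_canary(
    set_weight: Callable[[int], None],
    error_rate: Callable[[], float],
    steps: Sequence[int] = (10, 25, 50, 100),
    max_error_rate: float = 0.01,
) -> bool:
    """Shift traffic to the canary step by step; roll back on high error rate.

    Returns True if the canary reached the final step, False if rolled back.
    """
    for weight in steps:
        set_weight(weight)  # percentage of traffic sent to the canary
        if error_rate() > max_error_rate:
            set_weight(0)  # send all traffic back to the stable version
            return False
    return True
```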
### GitOps Repository Structure
```
app-repo/          # Application source code + Dockerfile
env-repo/          # Environment configs
  base/            # Base manifests
  overlays/
    dev/
    staging/
    prod/
```
Tools: ArgoCD or Flux v2 · Kustomize or Helm · External Secrets Operator
### Security Scanning in Pipeline
- SAST: CodeQL, Semgrep, SonarQube
- Dependency: Snyk, Dependabot, npm audit
- Container: Trivy, Grype
- Secrets: GitLeaks, TruffleHog
- SBOM: Syft · Image signing: Cosign
### DORA Metrics to Track
- Deployment frequency
- Lead time for changes
- Change failure rate
- Mean time to recovery (MTTR)
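
As a rough sketch, the four metrics above can be computed from a list of deployment records. The `Deployment` record shape, the use of a median for lead time, and the reporting units are assumptions made for this example rather than a standard definition.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean, median
from typing import Optional


@dataclass(frozen=True)
class Deployment:
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deploys: list[Deployment], window_days: int) -> dict:
    """Summarize DORA metrics for deployments observed over `window_days`."""
    if not deploys or window_days <= 0:
        raise ValueError("need at least one deployment and a positive window")
    lead_hours = [
        (d.deployed_at - d.committed_at).total_seconds() / 3600 for d in deploys
    ]
    failures = [d for d in deploys if d.failed]
    recovery_hours = [
        (d.restored_at - d.deployed_at).total_seconds() / 3600
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "deployments_per_day": len(deploys) / window_days,
        "median_lead_time_hours": median(lead_hours),
        "change_failure_rate": len(failures) / len(deploys),
        "mttr_hours": mean(recovery_hours) if recovery_hours else None,
    }
```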
---
## Deployment Runbook Principles
### Platform Selection
```
What are you deploying?
├── Static site → Vercel, Netlify, Cloudflare Pages
├── Simple web app → Railway, Render, Fly.io / VPS + PM2
├── Microservices → Container orchestration
└── Serverless → Edge functions, Lambda
```
| Platform | Deployment Method | Rol