ClaudeWave
Skill · 1.4k repo stars · updated 4d ago

devops-infrastructure

The devops-infrastructure skill provides guidance on containerization, CI/CD pipeline configuration, deployment strategies, infrastructure as code tools, and observability setup. Use this skill when writing Dockerfiles, configuring GitHub Actions workflows, planning deployment architectures, setting up monitoring systems, or answering questions about containers, Terraform, Kubernetes, production infrastructure, or related DevOps concerns.

Install in Claude Code

git clone --depth 1 https://github.com/CloudAI-X/claude-workflow-v2 /tmp/devops-infrastructure && cp -r /tmp/devops-infrastructure/skills/devops-infrastructure ~/.claude/skills/devops-infrastructure

Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# DevOps & Infrastructure

## When to Load

- **Trigger**: Docker, CI/CD pipelines, deployment configuration, monitoring, infrastructure as code
- **Skip**: Application logic only with no infrastructure or deployment concerns

## DevOps Workflow

Copy this checklist and track progress:

```
DevOps Setup Progress:
- [ ] Step 1: Containerize application (Dockerfile)
- [ ] Step 2: Set up CI/CD pipeline
- [ ] Step 3: Define deployment strategy
- [ ] Step 4: Configure monitoring & alerting
- [ ] Step 5: Set up environment management
- [ ] Step 6: Document runbooks
- [ ] Step 7: Validate against anti-patterns checklist
```

## Docker Best Practices

### Multi-Stage Build

```dockerfile
# WRONG: Single stage, bloated image
FROM node:20
WORKDIR /app
COPY . .
RUN npm install
RUN npm run build
CMD ["node", "dist/index.js"]
# Result: 1.2GB image with devDependencies and source code

# CORRECT: Multi-stage build
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
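RUN npm run build
# Drop devDependencies so only production packages are copied to the runner
RUN npm prune --omit=dev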

FROM node:20-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production
RUN addgroup -g 1001 appgroup && adduser -u 1001 -G appgroup -s /bin/sh -D appuser
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package.json ./
USER appuser
EXPOSE 3000
CMD ["node", "dist/index.js"]
# Result: ~150MB image, no devDependencies, non-root user
```

### Python Multi-Stage

```dockerfile
FROM python:3.12-slim AS builder
WORKDIR /app
RUN pip install uv
COPY pyproject.toml uv.lock ./
# Install dependencies only; the project source is copied in afterwards
RUN uv sync --frozen --no-dev --no-install-project
COPY . .

FROM python:3.12-slim AS runner
WORKDIR /app
RUN useradd -r -s /bin/false appuser
COPY --from=builder /app/.venv /app/.venv
COPY --from=builder /app/src ./src
ENV PATH="/app/.venv/bin:$PATH"
USER appuser
CMD ["python", "-m", "src.main"]
```

### Layer Caching

```dockerfile
# WRONG: Cache busted on every code change
COPY . .
RUN npm ci

# CORRECT: Dependencies cached separately
COPY package*.json ./
# Cached unless package.json or package-lock.json changes
RUN npm ci
# Only source code changes bust this layer
COPY . .
```
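
Why instruction order matters can be illustrated with a toy model in which each layer's cache key depends on its parent's key plus its own inputs. The sketch below is an assumption-laden simplification (the `Layer` type, the FNV-1a digest, and the helper names are all hypothetical, and this is not how BuildKit computes keys internally); it only shows that a change invalidates every layer from the first mismatch onward.

```typescript
// Toy model of layer caching: a layer's key is derived from its parent's key,
// its instruction, and a stand-in for the content it reads.
type Layer = { instruction: string; input: string };

// FNV-1a 32-bit hash, used only to get a short deterministic digest.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash = Math.imul(hash ^ text.charCodeAt(i), 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

function layerKeys(layers: Layer[]): string[] {
  const keys: string[] = [];
  let parent = "base";
  for (const layer of layers) {
    parent = fnv1a(`${parent}|${layer.instruction}|${layer.input}`);
    keys.push(parent);
  }
  return keys;
}

// Every layer from the first key mismatch onward has to be rebuilt.
function rebuiltLayers(before: Layer[], after: Layer[]): string[] {
  const oldKeys = layerKeys(before);
  const newKeys = layerKeys(after);
  const firstMiss = newKeys.findIndex((key, i) => key !== oldKeys[i]);
  return firstMiss === -1 ? [] : after.slice(firstMiss).map((l) => l.instruction);
}

const wrongOrder = (source: string, manifest: string): Layer[] => [
  { instruction: "COPY . .", input: source + manifest },
  { instruction: "RUN npm ci", input: "" },
];

const correctOrder = (source: string, manifest: string): Layer[] => [
  { instruction: "COPY package*.json ./", input: manifest },
  { instruction: "RUN npm ci", input: "" },
  { instruction: "COPY . .", input: source + manifest },
];

// Only the application source changes between the two builds.
console.log(rebuiltLayers(wrongOrder("v1", "deps"), wrongOrder("v2", "deps")));
// [ 'COPY . .', 'RUN npm ci' ]
console.log(rebuiltLayers(correctOrder("v1", "deps"), correctOrder("v2", "deps")));
// [ 'COPY . .' ]
```

In the second ordering the dependency install stays cached because nothing above it in the chain changed.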

### .dockerignore

```
node_modules
.git
.env
*.md
.vscode
coverage
dist
.venv
__pycache__
.pytest_cache
*.pyc
```

### Security

```dockerfile
# Always pin versions (avoid node:latest)
FROM node:20.11.0-alpine

# Don't run as root
USER appuser

# Read-only filesystem where possible
# docker run --read-only --tmpfs /tmp myapp

# Scan images
# docker scout cves myimage:latest
# trivy image myimage:latest
```

## CI/CD Pipeline Design

### GitHub Actions Structure

```yaml
name: CI/CD
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: "npm"
      - run: npm ci
      - run: npm run lint

  test:
    runs-on: ubuntu-latest
    needs: lint
    services:
      postgres:
        image: postgres:16
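        env:
          POSTGRES_DB: testdb
          # Test-only credential; the postgres image will not start without a password
          POSTGRES_PASSWORD: postgres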
        ports: ["5432:5432"]
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: "npm"
      - run: npm ci
      - run: npm test

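  build:
    runs-on: ubuntu-latest
    needs: test
    permissions:
      contents: read
      packages: write # needed to push to ghcr.io with GITHUB_TOKEN
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        if: github.event_name == 'push'
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v5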
        with:
          push: ${{ github.event_name == 'push' }}
          tags: ghcr.io/${{ github.repository }}:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  deploy:
    runs-on: ubuntu-latest
    needs: build
    if: github.ref == 'refs/heads/main'
    environment: production
    steps:
      - run: echo "Deploy to production"
```
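
The `needs` keys in the workflow above form a dependency graph: `lint`, then `test`, then `build`, then `deploy`. As a rough sketch of how such a graph resolves, the TypeScript below derives an execution order and lists the jobs that would be skipped when one job fails. The `Pipeline` type and both helper names are hypothetical, and this is a toy model rather than GitHub's scheduler; it assumes the default behavior in which dependents of a failed job do not run.

```typescript
// Job name -> names of the jobs it needs (mirrors the `needs:` keys).
type Pipeline = Record<string, string[]>;

// Depth-first topological sort; throws on unknown jobs and on cycles.
function executionOrder(pipeline: Pipeline): string[] {
  const order: string[] = [];
  const state = new Map<string, "visiting" | "done">();

  const visit = (job: string): void => {
    if (state.get(job) === "done") return;
    if (state.get(job) === "visiting") {
      throw new Error(`Cycle detected at job "${job}"`);
    }
    const needs = pipeline[job];
    if (needs === undefined) throw new Error(`Unknown job "${job}"`);
    state.set(job, "visiting");
    needs.forEach((dependency) => visit(dependency));
    state.set(job, "done");
    order.push(job);
  };

  Object.keys(pipeline).forEach((job) => visit(job));
  return order;
}

// Jobs that transitively depend on the failed job are skipped.
function skippedAfterFailure(pipeline: Pipeline, failed: string): string[] {
  const skipped = new Set<string>();
  for (const job of executionOrder(pipeline)) {
    const needs = pipeline[job] ?? [];
    if (needs.some((n) => n === failed || skipped.has(n))) skipped.add(job);
  }
  return Array.from(skipped);
}

const pipeline: Pipeline = {
  lint: [],
  test: ["lint"],
  build: ["test"],
  deploy: ["build"],
};

console.log(executionOrder(pipeline)); // [ 'lint', 'test', 'build', 'deploy' ]
console.log(skippedAfterFailure(pipeline, "test")); // [ 'build', 'deploy' ]
```

A strictly linear chain like this one is simple to reason about but serializes everything; jobs that do not depend on each other could share the same `needs` target and run in parallel.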

### Caching Strategies

```yaml
# Node modules
- uses: actions/setup-node@v4
  with:
    cache: "npm"

# Python with uv
- name: Cache uv
  uses: actions/cache@v4
  with:
    path: ~/.cache/uv
    key: uv-${{ runner.os }}-${{ hashFiles('uv.lock') }}

# Docker layer caching
- uses: docker/build-push-action@v5
  with:
    cache-from: type=gha
    cache-to: type=gha,mode=max
```

## Deployment Strategies

### Blue-Green Deployment

```
1. Run two identical environments: Blue (live) and Green (idle)
2. Deploy new version to Green
3. Run smoke tests on Green
4. Switch load balancer to Green
5. Green is now live, Blue is idle
6. Rollback: switch back to Blue

Pros: Instant rollback, zero downtime
Cons: 2x infrastructure cost during deploy
```
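
The switch-and-rollback steps above can be sketched as a small state machine. This is a minimal in-memory model, assuming a hypothetical `BlueGreenRouter` class and a synchronous `smokeTest` callback; a real setup would flip a load balancer target or DNS record instead of a field.

```typescript
type Color = "blue" | "green";

class BlueGreenRouter {
  private live: Color = "blue";
  private versions: Record<Color, string>;
  private rollbackAvailable = false;

  constructor(initialVersion: string) {
    this.versions = { blue: initialVersion, green: initialVersion };
  }

  get liveColor(): Color {
    return this.live;
  }

  get liveVersion(): string {
    return this.versions[this.live];
  }

  private get idle(): Color {
    return this.live === "blue" ? "green" : "blue";
  }

  // Deploy to the idle environment, smoke-test it, then switch traffic.
  deploy(version: string, smokeTest: (version: string) => boolean): boolean {
    const target = this.idle;
    this.versions[target] = version;
    // The idle environment no longer holds the previously live version.
    this.rollbackAvailable = false;
    if (!smokeTest(version)) return false; // live traffic is never touched
    this.live = target;
    // The old environment still runs the previous version, so rollback works.
    this.rollbackAvailable = true;
    return true;
  }

  // Rollback is just switching back to the environment that was live before.
  rollback(): void {
    if (!this.rollbackAvailable) {
      throw new Error("No previous live environment to roll back to");
    }
    this.live = this.idle;
    this.rollbackAvailable = false;
  }
}

const router = new BlueGreenRouter("1.0.0");
router.deploy("1.1.0", () => true);
console.log(router.liveColor, router.liveVersion); // green 1.1.0
router.rollback();
console.log(router.liveColor, router.liveVersion); // blue 1.0.0
```

A failed smoke test leaves the live environment untouched, which is the property that makes the pattern safe.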

### Canary Deployment

```
1. Deploy new version to small subset (5% of traffic)
2. Monitor error rates and latency
3. Gradually increase: 5% -> 25% -> 50% -> 100%
4. Rollback: route all traffic back to old version

Pros: Limited blast radius, real-world testing
Cons: More complex routing, longer rollout
```
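
One way to picture the traffic split is deterministic bucketing: hash each user into a bucket from 0 to 99 and send buckets below the current percentage to the canary. The sketch below assumes hypothetical helper names and an illustrative 1% error-rate threshold; real rollouts usually delegate this to the load balancer or service mesh.

```typescript
// Rollout stages taken from the steps above.
const STAGES = [5, 25, 50, 100];

// FNV-1a hash reduced to a bucket in [0, 99]; stable for a given user ID.
function bucket(userId: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < userId.length; i++) {
    hash = Math.imul(hash ^ userId.charCodeAt(i), 0x01000193) >>> 0;
  }
  return hash % 100;
}

// A user stays in the canary group as the percentage grows.
function routeToCanary(userId: string, canaryPercent: number): boolean {
  return bucket(userId) < canaryPercent;
}

// Advance to the next stage, or drop to 0 (full rollback) on a bad error rate.
function nextCanaryPercent(
  current: number,
  errorRate: number,
  maxErrorRate: number,
): number {
  if (errorRate > maxErrorRate) return 0;
  const next = STAGES.find((stage) => stage > current);
  return next ?? 100;
}

const MAX_ERROR_RATE = 0.01; // assumed threshold, tune per service

console.log(nextCanaryPercent(5, 0.002, MAX_ERROR_RATE)); // 25
console.log(nextCanaryPercent(25, 0.05, MAX_ERROR_RATE)); // 0
console.log(routeToCanary("user-42", 100)); // true
```

Hashing the user ID rather than picking randomly per request keeps each user on one version, which avoids flip-flopping between old and new behavior mid-session.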

### Rolling Deployment

```
1. Replace instances one at a time
2. Each new instance passes health checks before next starts
3. Continue until all instances updated

Pros: No extra infrastructure, gradual rollout
Cons: Mixed versions during deploy, slower rollback
```
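
The one-at-a-time loop above can be sketched as follows. This is a simplified in-memory model with a hypothetical `Instance` type and a synchronous `isHealthy` callback; orchestrators such as Kubernetes implement the same idea with readiness probes and configurable batch sizes.

```typescript
interface Instance {
  id: string;
  version: string;
}

interface RolloutResult {
  updated: number;
  completed: boolean;
}

// Replace instances one at a time; stop at the first failed health check.
function rollingUpdate(
  instances: Instance[],
  newVersion: string,
  isHealthy: (instance: Instance) => boolean,
): RolloutResult {
  let updated = 0;
  for (const instance of instances) {
    const previousVersion = instance.version;
    instance.version = newVersion;
    if (!isHealthy(instance)) {
      // Restore the failing instance and halt; earlier instances stay updated.
      instance.version = previousVersion;
      return { updated, completed: false };
    }
    updated++;
  }
  return { updated, completed: true };
}

const fleet: Instance[] = [
  { id: "web-1", version: "v1" },
  { id: "web-2", version: "v1" },
  { id: "web-3", version: "v1" },
];

// Simulate a health check that fails on the second instance.
const result = rollingUpdate(fleet, "v2", (instance) => instance.id !== "web-2");
console.log(result); // { updated: 1, completed: false }
console.log(fleet.map((instance) => instance.version)); // [ 'v2', 'v1', 'v1' ]
```

The halted run ends with `v2` and `v1` serving side by side, which is the mixed-version state listed under the cons; rolling back means running the same loop again with the old version.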

### Feature Flags

```typescript
// Simple feature flag implementation
const features = {
  NEW_CHECKOUT: process.env.FF_NEW_CHECKOUT === "true",
  DARK_MODE: process.env.FF_DARK_MODE === "true",
};

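// Minimal shape assumed for illustration; the checkout flows are defined elsewhere
interface User {
  betaGroup: boolean;
}

function getCheckoutFlow(user: User) {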
  if (features.NEW_CHECKOUT && user.betaGroup) {
    return newCheckoutFlow(user);
  }
  return legacyCheckoutFlow(user);
}

// Use a proper service for production: LaunchDarkly, Unleash, Flagsmith
```
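
Reading `process.env` at module load makes flags awkward to test. A possible variation, sketched below with hypothetical names (`FLAG_NAMES`, `loadFlags`), passes the environment in as a plain object and derives the flag type from a single list, so a misspelled flag name becomes a compile-time error. It keeps the same convention as above: a flag is on only when its `FF_`-prefixed variable is exactly `"true"`.

```typescript
// Single source of truth for flag names; the type is derived from it.
const FLAG_NAMES = ["NEW_CHECKOUT", "DARK_MODE"] as const;
type FlagName = (typeof FLAG_NAMES)[number];
type Flags = Record<FlagName, boolean>;

// Build the flag map from any env-like object (e.g. process.env in Node).
function loadFlags(env: Record<string, string | undefined>): Flags {
  const flags = {} as Flags;
  for (const name of FLAG_NAMES) {
    flags[name] = env[`FF_${name}`] === "true";
  }
  return flags;
}

const flags = loadFlags({ FF_NEW_CHECKOUT: "true", FF_DARK_MODE: "false" });
console.log(flags.NEW_CHECKOUT); // true
console.log(flags.DARK_MODE); // false
```

In application code the call would typically be `loadFlags(process.env)`; tests can pass a literal object instead.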

## Infrastructure as Code

### Terraform Basics

```hcl
# main.tf
terraform {
  required_version = ">= 1.5"
  backend "s3" {
    bucket = "myapp-terraform-state"
    key    = "prod/terraform.tfstate"
    region = "us-east-1"
  }
}
```