Skill · 586 repo stars · updated 1mo ago

nw-cicd-and-deployment

This Claude Code skill provides CI/CD pipeline design methodology covering local quality gates, multi-stage pipeline architecture, and deployment strategies. Use it when designing new CI/CD workflows, establishing GitHub Actions patterns, defining branch strategies, or implementing deployment gates to ensure fast feedback loops and production safety.

Repository: nWave

Install in Claude Code

git clone --depth 1 https://github.com/nWave-ai/nWave /tmp/nw-cicd-and-deployment && cp -r /tmp/nw-cicd-and-deployment/nWave/skills/nw-cicd-and-deployment ~/.claude/skills/nw-cicd-and-deployment

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# CI/CD Pipeline Design and Deployment Strategies

## Local Quality Gates

Catch issues on the developer's machine before they reach CI. Local gates mirror the remote commit stage, so feedback arrives in seconds rather than minutes.

### Gate Taxonomy

| Gate | Trigger | Checks | Tools |
|------|---------|--------|-------|
| Pre-commit | `git commit` | Formatting, linting, unit tests, secrets scan | pre-commit, husky, lefthook |
| Pre-push | `git push` | Integration tests, acceptance tests, coverage threshold | pre-commit (push stage), git hooks |
| Local CI | Manual | Full pipeline locally | act (GitHub Actions), gitlab-runner exec (deprecated in newer GitLab Runner releases) |

### Design Principles

- **Mirror, not duplicate**: local gates run the same checks as the remote commit stage, not additional ones. Keeps developer experience consistent with CI.
- **Fast by default**: pre-commit gates target < 30 seconds. Move slow checks (integration, acceptance) to pre-push.
- **Escapable with audit trail**: allow `--no-verify` for emergencies but log skips. CI remains the authoritative gate.
- **Framework selection**: prefer `pre-commit` (Python ecosystem) or `lefthook` (polyglot, fast parallel execution) over raw git hooks. Husky for JS/TS-heavy projects.

### Hook Stage Assignment

```
pre-commit (< 30s):     formatting | linting | unit tests (fast subset) | secrets scan
pre-push   (< 5 min):   full unit suite | integration tests | coverage check | type checking
```
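
The stage assignment above could be expressed with the `pre-commit` framework roughly as follows. This is a minimal sketch, not a recommended baseline: the pinned `rev`, the `slow` pytest marker, and the `tests/integration` path are assumptions for illustration, and `detect-private-key` is only a narrow stand-in for a full secrets scanner.

```yaml
# .pre-commit-config.yaml (illustrative sketch)
# Install both hook types with a plain `pre-commit install`.
default_install_hook_types: [pre-commit, pre-push]
# Hooks without an explicit `stages` key run only at commit time.
default_stages: [pre-commit]

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0              # assumption: pin to whichever release you have vetted
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: detect-private-key   # narrow secrets check

  - repo: local
    hooks:
      - id: unit-fast
        name: unit tests (fast subset)
        entry: pytest -m "not slow" -q   # assumption: slow tests carry a `slow` marker
        language: system
        pass_filenames: false
        stages: [pre-commit]

      - id: integration
        name: integration tests
        entry: pytest tests/integration  # hypothetical test directory
        language: system
        pass_filenames: false
        stages: [pre-push]
```

Keeping the slow hook on the `pre-push` stage is what keeps the commit path inside the 30-second budget.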

## Pipeline Stages

### Commit Stage (target: < 10 minutes)
Compile/build | Run unit tests (fast, isolated) | Static code analysis (linting, formatting) | Security scanning (SAST, secrets detection) | Generate build artifacts.
Quality gates: build success | 100% unit test pass rate | coverage threshold (e.g., >= 80%) | no critical vulnerabilities | no secrets in code.

### Acceptance Stage (target: < 30 minutes)
Deploy to test environment | Run acceptance/integration/contract tests | Security scanning (DAST).
Quality gates: 100% acceptance/integration pass rate | no high/critical security findings | API contracts validated.

### Capacity Stage (target: < 60 minutes, can run in parallel)
Performance, load, and stress testing | Chaos engineering experiments.
Quality gates: performance within SLO thresholds | load test pass (expected traffic + margin) | resilience under failure.

### Production Stage
Progressive deployment (canary/blue-green) | Health checks and smoke tests | SLO monitoring during rollout | Automatic rollback on degradation.
Quality gates: health checks pass | SLOs maintained | no error rate increase | latency within bounds.

## Quality Gate Classification

Every quality gate has a category (where it runs), a type (what happens on failure), and a scope (what it protects).

### Gate Taxonomy

| Category | Stage | Type | Examples |
|----------|-------|------|----------|
| Local | Pre-commit, pre-push | Blocking (developer) | Format, lint, unit tests, secrets scan |
| PR | Pull request | Blocking (merge) | Status checks, review approvals, coverage diff |
| CI | Commit stage | Blocking (pipeline) | Build, unit tests, SAST, coverage threshold |
| CI | Acceptance stage | Blocking (pipeline) | Integration, acceptance, contract tests, DAST |
| Deploy | Environment promotion | Blocking (approval) | Manual approval, change advisory board |
| Deploy | Canary/progressive | Automatic (rollback) | Error rate, latency, SLO breach |
| Production | Post-deploy | Advisory (monitoring) | Smoke tests, SLO monitoring window, business metrics |

### Gate Types

- **Blocking**: pipeline halts on failure. Merge/deploy/promotion is prevented until resolved.
- **Automatic (rollback)**: no human intervention -- system rolls back on threshold breach. Requires pre-defined thresholds and rollback automation.
- **Advisory**: failure is reported but does not block. Used for post-deploy monitoring where rollback is a separate decision.
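
One way the three gate types might map onto GitHub Actions steps is sketched below. The job name, the `make test` target, and the scripts under `./scripts/` are hypothetical placeholders; only `if`, `failure()`, `steps.<id>.conclusion`, and `continue-on-error` are GitHub Actions features.

```yaml
jobs:
  gates:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Blocking: a non-zero exit fails the job and stops later steps.
      - name: Unit tests (blocking)
        run: make test                    # hypothetical Makefile target

      - name: Canary health check
        id: canary
        run: ./scripts/check-canary.sh    # hypothetical script

      # Automatic (rollback): runs only when the canary check failed.
      - name: Roll back (automatic)
        if: failure() && steps.canary.conclusion == 'failure'
        run: ./scripts/rollback.sh        # hypothetical script

      # Advisory: a failure is reported but does not fail the job.
      - name: Smoke tests (advisory)
        continue-on-error: true
        run: ./scripts/smoke.sh           # hypothetical script
```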

### Design Checklist

When designing quality gates for a pipeline, verify:
1. Every remote CI gate has a local equivalent (pre-commit or pre-push)
2. PR gates include both automated checks (status checks) and human review (approvals)
3. Deployment gates distinguish blocking (promotion) from automatic (canary rollback)
4. Post-deploy gates have clear escalation paths (advisory -> manual rollback decision)
5. Gate thresholds are documented and versioned (not hardcoded in pipeline YAML)
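
For item 5, a coverage gate might read its threshold from a versioned file instead of embedding the number in the workflow. The sketch below assumes a hypothetical `.ci/thresholds.json` containing `{"coverage_min": 80}` and a `coverage.json` report produced by coverage.py.

```yaml
# Assumption: .ci/thresholds.json is committed to the repo and holds
# {"coverage_min": 80}; changing the gate is then a reviewed change to that file.
- name: Quality Gate (threshold from versioned file)
  run: |
    MIN=$(jq -r '.coverage_min' .ci/thresholds.json)
    COVERAGE=$(jq -r '.totals.percent_covered' coverage.json)
    # bc prints 1 when the comparison holds, which makes (( ... )) succeed
    if (( $(echo "$COVERAGE < $MIN" | bc -l) )); then
      echo "Coverage ${COVERAGE}% is below the ${MIN}% threshold"
      exit 1
    fi
```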

## GitHub Actions Patterns

### Workflow Structure
Triggers: push to main/develop | pull_request | release tags | manual workflow_dispatch.
Jobs flow: build -> security -> deploy_staging -> deploy_production. Each with appropriate `needs` dependencies and environment gates.
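
A skeleton of that job flow might look like the following. It is a sketch under assumptions: the `make` targets and `./scripts/deploy.sh` are hypothetical, the `v*` tag pattern is one possible release convention, and the `staging` and `production` environments are assumed to be configured in the repository settings (with required reviewers if a manual approval gate is wanted).

```yaml
name: pipeline
on:
  push:
    branches: [main, develop]
    tags: ['v*']          # assumption: release tags start with "v"
  pull_request:
  workflow_dispatch:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make build test              # hypothetical targets

  security:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make security-scan           # hypothetical target

  deploy_staging:
    needs: security
    if: github.ref == 'refs/heads/main'   # deploy only from main
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/deploy.sh staging  # hypothetical script

  deploy_production:
    needs: deploy_staging                 # skipped whenever staging is skipped
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/deploy.sh production
```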

### Quality Gate Pattern
```yaml
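- name: Quality Gate
  run: |
    # coverage.json is the report written by `coverage json` (coverage.py)
    COVERAGE=$(jq -r '.totals.percent_covered' coverage.json)
    # bc prints 1 when the comparison holds, which makes (( ... )) succeed
    if (( $(echo "$COVERAGE < 80" | bc -l) )); then
      echo "Coverage ${COVERAGE}% is below the 80% threshold"
      exit 1
    fi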
```

### Caching Pattern
```yaml
- uses: actions/cache@v4
  with:
    path: ~/.cache/pip
    key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
    restore-keys: |
      ${{ runner.os }}-pip-
```
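
If the job already uses `actions/setup-python`, its built-in `cache` input may be a simpler alternative to a separate cache step. The Python version shown is an arbitrary choice for illustration.

```yaml
- uses: actions/setup-python@v5
  with:
    python-version: '3.12'   # arbitrary version for illustration
    cache: pip               # caches pip downloads, keyed on the dependency files
```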

### Matrix Testing Pattern
```yaml
strategy:
  matrix:
    python-version: ['3.10', '3.11', '3.12']
    os: [ubuntu-latest, macos-latest]
```
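
In context, the matrix values are consumed through the `matrix` context. The sketch below assumes a pytest-based suite with dependencies listed in `requirements.txt`; the job name is hypothetical.

```yaml
jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false       # let the other combinations finish when one fails
      matrix:
        python-version: ['3.10', '3.11', '3.12']
        os: [ubuntu-latest, macos-latest]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - run: pip install -r requirements.txt   # assumption: pytest is listed here
      - run: python -m pytest
```

The version strings are quoted because an unquoted `3.10` would be parsed as the number 3.1.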

## Deployment Strategies

### Rolling Deployment
Gradual replacement of instances. Kubernetes config: `type: RollingUpdate`, `maxSurge: 25%`, `maxUnavailable: 0`.
- Pros: zero downtime | simple | efficient resources
- Cons: slow rollback | mixed versions during deployment
- Use when: stateless services | no breaking API changes | low-risk changes
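
The rolling update settings above sit under `spec.strategy` in a Deployment manifest. The sketch below uses a hypothetical `web` service, image, port, and health endpoint; a readiness probe is included because the rollout only proceeds as new pods report ready.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                    # hypothetical service name
spec:
  replicas: 4
  selector:
    matchLabels:
      app: web
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%            # at most one extra pod at 4 replicas
      maxUnavailable: 0        # never drop below the desired replica count
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: registry.example.com/web:1.2.3   # hypothetical image
          readinessProbe:
            httpGet:
              path: /healthz   # hypothetical health endpoint
              port: 8080
```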

### Blue-Green Deployment
Two identical environments, instant switch: Blue (current) serves traffic -> Deploy new to Green -> Smoke tests on Green -> Switch load balancer -> Blue becomes standby/rollback.
- Pros: instant rollback | easy pre-switch testing | clean version separation
- Cons: requires 2x resources | database migrations need care
- Use when: instant rollback needed | crit