Skill1.2k repo starsupdated today

cicd-playbook

# CI/CD Playbook Skill Generates a comprehensive, structured CI/CD playbook document for a service or team covering pipeline stages, environment definitions, deployment gates, rollback procedures, and on-call responsibilities. Use this skill when documenting an existing CI/CD pipeline, defining a new deployment process, establishing release gates, or creating an operational guide for engineers to safely build, test, and deploy services.

View source Repository: pm-claude-skills

Install in Claude Code

Copy

git clone --depth 1 https://github.com/mohitagw15856/pm-claude-skills /tmp/cicd-playbook && cp -r /tmp/cicd-playbook/plugins/pm-engineering/skills/cicd-playbook ~/.claude/skills/cicd-playbook

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# CI/CD Playbook Skill

Produce a complete, actionable CI/CD playbook for a service or team — covering everything a new engineer needs to understand, contribute to, and operate the pipeline safely.

A good playbook is not a diagram. It is a document that answers: what runs, when, why, who owns it, and what to do when it breaks.

## Required Inputs

Ask for these if not already provided:
- **Service name** and brief description
- **Tech stack** — language, framework, containerisation (Docker, etc.)
- **Source control** — GitHub / GitLab / Bitbucket, branching strategy
- **CI platform** — GitHub Actions / CircleCI / Jenkins / BuildKite / other
- **CD platform / deployment target** — Kubernetes, ECS, Lambda, Heroku, VMs, etc.
- **Environments** — e.g. dev, staging, production (and any canary / feature environments)
- **Deployment frequency** — how often does the team ship?
- **Any existing gates** — manual approvals, smoke tests, feature flags
- **On-call setup** — who's responsible during deploys?

## Output Format

---

# CI/CD Playbook: [Service Name]

**Service:** [Name] | **Team:** [Team name]
**Last updated:** [Date] | **Owner:** [Name / role]
**Pipeline platform:** [CI tool] → [CD tool / platform]

---

## Overview

[2–3 sentences describing what this service does and why the CI/CD pipeline is structured the way it is. Include the deployment target and how frequently the team ships.]

**Deployment frequency:** [Multiple times per day / Daily / Weekly / On-demand]
**Average pipeline duration:** [X minutes]
**Rollback time (p95):** [X minutes]

---

## Pipeline Stages

```
[Branch push]
    │
    ▼
[1. Build & Lint] ──fail──▶ ❌ Block PR
    │
    ▼
[2. Unit Tests] ──fail──▶ ❌ Block PR
    │
    ▼
[3. Integration Tests] ──fail──▶ ❌ Block PR
    │
    ▼
[4. Security Scan] ──fail──▶ ⚠️ [Block / Warn — specify]
    │
    ▼
[5. Build Artefact / Container Image]
    │
    ▼
[6. Deploy to Staging] ──fail──▶ ❌ Block promotion
    │
    ▼
[7. Smoke Tests (Staging)]
    │
    ▼
[8. Manual Approval Gate] ──(if required)
    │
    ▼
[9. Deploy to Production] ──fail──▶ 🔁 Auto-rollback (if configured)
    │
    ▼
[10. Post-deploy checks]
```

---

## Stage Definitions

### Stage 1 — Build & Lint

**What runs:** [Build command] + [Linter — e.g. ESLint, golangci-lint, flake8]
**Trigger:** Every commit to any branch
**Blocking:** Yes — PR cannot be merged if this fails
**Typical duration:** [X minutes]
**Owner if it fails:** PR author

**Common failure causes:**
- [e.g. Missing dependency — run `npm install` locally before pushing]
- [e.g. Lint rule violation — run `npm run lint --fix` to auto-fix most issues]

---

### Stage 2 — Unit Tests

**What runs:** [Test command — e.g. `npm test`, `go test ./...`, `pytest`]
**Coverage gate:** [X]% minimum — pipeline fails below this threshold
**Trigger:** Every commit
**Blocking:** Yes
**Typical duration:** [X minutes]

**Coverage report:** [Where to find it — e.g. uploaded to Codecov, available in CI artifacts]

---

### Stage 3 — Integration Tests

**What runs:** [Test suite description — e.g. "API integration tests against a test database using Docker Compose"]
**Environment:** [Ephemeral test environment / shared test DB / etc.]
**Trigger:** Every commit to `main` and feature branches targeting `main`
**Blocking:** Yes
**Typical duration:** [X minutes]

**If slow:** [e.g. "Integration tests can be skipped locally with `SKIP_INTEGRATION=true` — never skip in CI"]

---

### Stage 4 — Security Scan

**Tools:** [e.g. Snyk, Trivy, OWASP Dependency Check, Semgrep]
**What it checks:** [Dependency vulnerabilities / SAST / secrets detection — list what applies]
**Blocking on:** Critical and High severity findings
**Non-blocking on:** Medium and Low (flagged, not blocking)
**Trigger:** Every commit to `main`

**How to handle a flagged vulnerability:**
1. Check if a fix is available — upgrade the dependency
2. If no fix available, open a security ticket and add a suppression with justification
3. Never suppress without a ticket and owner

---

### Stage 5 — Build Artefact

**What is produced:** [Docker image / binary / zip — be specific]
**Registry:** [ECR / GCR / Docker Hub / Artifactory — URL]
**Tagging convention:** `[service-name]:[git-sha]` (also tagged `:latest` on `main`)
**Trigger:** Commits to `main` only (not feature branches)

---

### Stage 6 — Deploy to Staging

**Deployment method:** [e.g. Helm upgrade / kubectl apply / ecs deploy / Terraform apply]
**Staging URL:** [URL]
**Trigger:** Automatic on successful artefact build from `main`
**Who can deploy to staging:** Any engineer (automatic)

**Environment variables:** Managed in [Vault / AWS SSM / GitHub Secrets / etc.]
**Staging is not production:** [Any differences in config, scale, or data — state them here]

---

### Stage 7 — Smoke Tests (Staging)

**What runs:** [Description — e.g. "10 critical path tests covering login, core API endpoints, and payment flow"]
**Tool:** [e.g. Playwright / Postman / custom script]
**Pass criteria:** All smoke tests pass within [X seconds] timeout
**Blocking:** Yes — production deploy will not proceed if smoke tests fail

**Smoke test suite location:** [Link to test files or folder]

---

### Stage 8 — Manual Approval Gate

**Required for:** [Production deploys / deploys affecting >X% of traffic / deploys to specific regions]
**Who can approve:** [e.g. Any engineer on the team / Lead engineer / On-call engineer]
**Approval timeout:** [e.g. 24 hours — auto-cancelled if no approval]
**How to approve:** [GitHub Actions approve step / Slack command / other — with link]

**When to withhold approval:**
- Active incident in production
- Deploy is outside the deployment window (see below)
- On-call engineer has not been notified

---

### Stage 9 — Deploy to Production

**Deployment method:** [Same as staging or different — specify]
**Deployment window:** [e.g. Monday–Thursday 09:00–16:00 UTC — no deploys on Fridays or before bank holidays]
**Canary / progressive rollout:**