# load-testing-plan

This Claude Code skill generates a complete load and performance testing plan document for a service, including test objectives, scenario definitions, performance thresholds, k6 or Locust script templates, and CI integration steps. Use it when asked to create a performance test plan, document load testing procedures, define stress or soak test scenarios, or establish performance regression gates for continuous integration pipelines.
git clone --depth 1 https://github.com/mohitagw15856/pm-claude-skills /tmp/load-testing-plan && cp -r /tmp/load-testing-plan/plugins/pm-engineering/skills/load-testing-plan ~/.claude/skills/load-testing-plan

SKILL.md
# Load Testing Plan Skill

Produce a complete load and performance testing plan for a service, covering test objectives, scenario definitions, tooling configuration, success thresholds, and CI integration. A good load testing plan eliminates ambiguity about what "performance is acceptable" means, so engineers can run tests and get a pass/fail answer without having to interpret raw numbers themselves.

## Required Inputs

Ask for these if not already provided:

- **Service name and key endpoints**: which endpoints are under test (path, method, typical request/response shape)
- **Current traffic baseline**: current requests/sec, p50/p99 latency, error rate under normal load
- **Peak traffic expectations**: expected peak RPS (e.g. 10× baseline for flash sales, or seasonality peak)
- **SLO targets**: latency SLOs (p99 < X ms), error rate SLO (< Y%), availability target
- **Preferred testing tool**: k6, Locust, JMeter, Gatling, or no preference
- **Test environment availability**: dedicated load test environment, staging, or production (with traffic shaping)

## Output Format

---

# Load Testing Plan: [Service Name]

**Author:** [Name] | **Team:** [Team name]
**Date:** [Date] | **Review cycle:** Before each major release and quarterly
**Testing tool:** [k6 / Locust / JMeter / Gatling]
**Test environment:** [Environment name and URL]

---

## 1. Objectives and Scope

**What we are testing:**
[Service name] handles [describe function, e.g. "user authentication requests from the mobile and web clients"]. This plan validates that the service meets its SLOs under expected and elevated traffic conditions.

**In scope:**

- [Endpoint 1: METHOD /path - description]
- [Endpoint 2: METHOD /path - description]
- [Endpoint 3: METHOD /path - description]

**Out of scope:**

- [Any endpoints explicitly excluded and why, e.g. "admin APIs: low traffic, excluded from load test"]
- [Third-party integrations that cannot be load-tested; mock them instead]

---

## 2. Performance Targets (Success Criteria)

Every scenario has explicit pass/fail thresholds. A test run FAILS if any threshold is breached.

| Metric | Baseline scenario | Stress scenario | Spike scenario | Soak scenario |
|---|---|---|---|---|
| p50 latency | < [X] ms | < [X × 1.5] ms | < [X × 2] ms | < [X] ms |
| p95 latency | < [Y] ms | < [Y × 1.5] ms | < [Y × 2] ms | < [Y] ms |
| p99 latency | < [Z] ms | < [Z × 2] ms | < [Z × 3] ms | < [Z] ms |
| Error rate | < [0.1]% | < [1]% | < [2]% | < [0.1]% |
| Throughput | ≥ [N] RPS | ≥ [N × 3] RPS | N/A | ≥ [N] RPS |
| Failed requests | 0 (5xx) | < [threshold] | < [threshold] | 0 (5xx) |

**SLO reference:** These thresholds are derived from the service SLOs: p99 < [Z ms], error rate < [0.1]%, availability [99.9]%.

---

## 3. Traffic Model

**Baseline traffic (current production):**

- Average RPS: [N] req/sec
- Peak RPS (observed): [N] req/sec
- Request distribution by endpoint:
  - [Endpoint 1]: [X]% of traffic
  - [Endpoint 2]: [Y]% of traffic
  - [Endpoint 3]: [Z]% of traffic

**Simulated user behaviour:**

- Think time between requests: [X–Y] seconds (randomised)
- Session duration: [N] minutes average
- Authenticated vs anonymous ratio: [X]%/[Y]%
- Geographic distribution: [Region 1 X]%, [Region 2 Y]%

---

## 4. Test Scenarios

### Scenario 1: Baseline (Steady-State)

**Purpose:** Confirm the service performs acceptably under normal production load.
**Duration:** 10 minutes
**Load profile:** Ramp to [N] RPS over 2 minutes, hold for 8 minutes.
**Concurrency:** [N] virtual users
**Pass criteria:** All thresholds in the Baseline column of the targets table above.

---

### Scenario 2: Stress Test

**Purpose:** Find the breaking point: how much load can the service handle before SLOs are breached?
**Duration:** 20–30 minutes
**Load profile:** Ramp from [N] RPS (baseline) to [N × 5] RPS in 5-minute steps. Hold each step for 5 minutes. Stop at first SLO breach.
**Concurrency:** Scales with RPS target
**What to record:**

- RPS at which p99 latency first exceeds SLO
- RPS at which error rate first exceeds SLO
- Whether the service recovers when load drops back to baseline

---

### Scenario 3: Spike Test

**Purpose:** Simulate a sudden traffic surge (flash sale, viral event, bot attack).
**Duration:** 15 minutes
**Load profile:** Hold at [N] RPS (baseline) for 3 minutes, spike to [N × 10] RPS instantly, hold for 5 minutes, drop back to baseline for 7 minutes.
**What to record:**

- Latency during spike and recovery
- Whether the service sheds load gracefully (rate limiting, queue depth)
- Time to recover to baseline latency after spike ends

---

### Scenario 4: Soak / Endurance Test

**Purpose:** Detect memory leaks, connection pool exhaustion, and slow degradation over time.
**Duration:** 4–8 hours (run overnight)
**Load profile:** Steady [N × 1.5] RPS (50% above baseline) for entire duration.
**What to watch:**

- Memory usage trend over time (should not grow unboundedly)
- Error rate trend (should be flat, not creeping up)
- GC pause frequency (JVM/Go services)
- Database connection pool utilisation
- p99 latency trend (should not creep up over hours)

---

## 5. Test Environment Requirements

### Infrastructure

| Component | Requirement | Notes |
|---|---|---|
| Service under test | Isolated from production | [N] replicas, matching prod resource limits |
| Database | Separate instance with production-scale data | Seed script in section 7 |
| Cache (Redis/Memcached) | Empty at test start | Ensures cold-start conditions are tested |
| Load generator | Separate from service under test | [N] vCPUs, [N] GB RAM minimum |
| Network | Low-latency path to service | Do not run generator on same host |

### Data Seeding

Before every test run, ensure the environment has:

```bash
# Seed test users (needed for authenticated endpoint tests)
[seed command or script path, e.g. python scripts/seed_load_test_users.py --count 10000]

# Seed test data for read endpoints
[seed command, e.g. ./scripts/seed_pro
```
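
The pass/fail rule from section 2 (a run fails if any threshold is breached) can be sketched in Python as a small checker. This is a minimal illustration, not part of the skill itself: the names `Thresholds`, `scenario_thresholds`, and `evaluate` are hypothetical, the error-rate percentages reuse the bracketed placeholder defaults from the targets table, and the "Failed requests" row is left out because its thresholds are unspecified in the template.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Thresholds:
    """Limits for one scenario column of the targets table (hypothetical type)."""
    p50_ms: float
    p95_ms: float
    p99_ms: float
    error_rate_pct: float
    min_rps: Optional[float]  # None where the table says N/A (spike scenario)


def scenario_thresholds(x, y, z, n):
    """Derive all four columns from baseline X, Y, Z (ms) and N (RPS)."""
    return {
        "baseline": Thresholds(x, y, z, 0.1, n),
        "stress": Thresholds(x * 1.5, y * 1.5, z * 2, 1.0, n * 3),
        "spike": Thresholds(x * 2, y * 2, z * 3, 2.0, None),
        "soak": Thresholds(x, y, z, 0.1, n),
    }


def evaluate(measured, limits):
    """Return the names of breached metrics; an empty list means the run passes."""
    breaches = []
    # Latency and error-rate targets are strict upper bounds ("< limit").
    for name in ("p50_ms", "p95_ms", "p99_ms", "error_rate_pct"):
        if measured[name] >= getattr(limits, name):
            breaches.append(name)
    # Throughput is a lower bound ("≥ limit") and is skipped when N/A.
    if limits.min_rps is not None and measured["rps"] < limits.min_rps:
        breaches.append("rps")
    return breaches


if __name__ == "__main__":
    # Illustrative numbers only; real values come from the service's SLOs.
    limits = scenario_thresholds(x=50, y=120, z=250, n=200)
    run = {"p50_ms": 70, "p95_ms": 150, "p99_ms": 510,
           "error_rate_pct": 0.4, "rps": 640}
    print(evaluate(run, limits["stress"]))  # ['p99_ms']
```

Returning the list of breached metrics rather than a bare boolean keeps the pass/fail answer unambiguous while still telling the engineer which threshold to investigate.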