octoperf-scenario-diagnosis

This Claude Code skill diagnoses why an OctoPerf load-test scenario underperformed, failed, or behaved unexpectedly by reading global metrics, narrowing the root cause to one of four classes, and recommending the next fix step. Use it when a test completes or is running and the user reports high error rates, slow response times, low throughput, or unexpected behavior. The skill requires a `benchResultId` and the OctoPerf MCP server to investigate.

Install in Claude Code
git clone --depth 1 https://github.com/OctoPerf/octoperf-claude-plugins /tmp/octoperf-scenario-diagnosis && cp -r /tmp/octoperf-scenario-diagnosis/plugins/octoperf/skills/octoperf-scenario-diagnosis ~/.claude/skills/octoperf-scenario-diagnosis
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# OctoPerf — Scenario / bench-result diagnosis

A scenario run produced metrics that look bad — high error rate, high
response times, low throughput, premature stop. This skill walks the
diagnosis: read metrics → narrow down → match the symptom to one of
four root-cause classes → surface the right fix.

## Inputs

You need a `benchResultId` from one of:

- A user-supplied id (often from a Slack / email link they paste).
- The return of `mcp__octoperf__run_scenario(scenarioId)`.
- `mcp__octoperf__list_bench_reports_by_project(projectId)` filtered on `benchResultIds` for the UI deep-link.

If `mcp__octoperf__get_bench_result(benchResultId)` shows
`state ∉ {FINISHED, ABORTED, ERROR}`, the test is still running.
Either wait and re-check, or surface what *has* been measured so far
with the caveat that it may change. When you do wait, follow
`octoperf-async-polling` — bounded `Bash sleep` between polls, cadence
sized to the scenario's expected duration. Use `get_bench_result.state`
as the terminal check; `get_bench_status` returns elapsed-% and is for
progress display only.
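
The wait logic above can be sketched in Python. This is a minimal illustration, not the cadence defined by `octoperf-async-polling`: `fetch_state` is a hypothetical callable standing in for reading `get_bench_result.state`, and the 5% / 10-60 s interval rule and `max_polls` budget are assumptions chosen for the example.

```python
import time
from typing import Callable

# Terminal states, taken from the text above.
TERMINAL_STATES = {"FINISHED", "ABORTED", "ERROR"}


def is_terminal(state: str) -> bool:
    """True once the bench result can no longer change state."""
    return state in TERMINAL_STATES


def poll_interval_seconds(expected_duration_s: float) -> float:
    """Assumed cadence: about 5% of the expected duration, clamped to 10-60 s."""
    return max(10.0, min(60.0, expected_duration_s * 0.05))


def wait_for_terminal(
    fetch_state: Callable[[], str],  # hypothetical: returns get_bench_result.state
    expected_duration_s: float,
    max_polls: int = 50,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until a terminal state is seen or the poll budget is spent."""
    interval = poll_interval_seconds(expected_duration_s)
    state = fetch_state()
    for _ in range(max_polls):
        if is_terminal(state):
            break
        sleep(interval)  # bounded sleep between polls, never a tight loop
        state = fetch_state()
    return state
```

The returned state may still be non-terminal when the budget runs out; in that case report what has been measured so far, with the caveat that it may change.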

## Steps

### 0. Did the run even start?

Before reading metrics, confirm the run actually produced samples.
`run_scenario` can fail **before** any HTTP traffic is generated —
infrastructure error, no matching plan, deserialisation issue,
configuration rejected. A diagnosis built on metrics from a run that
never started will mislead the user.

```
mcp__octoperf__get_bench_result(benchResultId)
```

The exhaustive state machine is `CREATED → PENDING → SCALING →
PREPARING → INITIALIZING → (ERROR | RUNNING) → (FINISHED | ABORTED)`.
Any other label is a transport / UI artefact.

- `state = FINISHED` → proceed to step 1.
- `state = ABORTED` → either manual stop or stall-abort; jump to
  the [jmeter.log signature catalogue](#jmeterlog-signature-catalogue).
- `state = ERROR` → the run errored during provisioning or startup,
  no samples to read. Pull the orchestration logs:

  ```
  mcp__octoperf__list_bench_docker_logs(benchResultId)
  ```

  Common ERROR-state causes:

  - *No matching plan / capacity exhausted* → run pre-flight on the
    scenario to see why: `get_scenario_matching_plans(scenarioId)`
    (empty result) + `list_active_subscriptions()` (lists caps).
    The binding cap is usually `maxRealBrowserUsers=0` on basic
    plans rejecting a Playwright UserProfile, or
    `maxProfilesPerScenario` rejecting a multi-VU hybrid.
  - *Image pull / provider not available* → docker log surfaces it.
  - *Validation pre-flight failed* (some on-prem setups force a
    sanity check before run) → handle as a validation issue, hand
    off to `octoperf-validation-triage`.
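
The branching in this step can be summarised as a small lookup. The sketch below only encodes the routing described above; `next_step` and its return strings are illustrative names, not part of the OctoPerf MCP server.

```python
# States that precede RUNNING in the state machine listed above.
PRE_RUN_STATES = ("CREATED", "PENDING", "SCALING", "PREPARING", "INITIALIZING")


def next_step(state: str) -> str:
    """Map a bench-result state to the next diagnostic action from step 0."""
    if state == "FINISHED":
        return "read global metrics (step 1)"
    if state == "ABORTED":
        return "check the jmeter.log signature catalogue"
    if state == "ERROR":
        return "pull orchestration logs with list_bench_docker_logs"
    if state == "RUNNING" or state in PRE_RUN_STATES:
        return "still in progress: wait, or report partial metrics with a caveat"
    # Anything outside the state machine is not a real bench-result state.
    return "unknown label: treat as a transport / UI artefact"
```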

### 1. Read global metrics first

```
mcp__octoperf__list_bench_reports_by_project(projectId)
# pick the report tied to your benchResultId, then
mcp__octoperf__get_bench_report(reportId)
# locate the SummaryReportItem in the returned items list, then
mcp__octoperf__get_report_summary_values(reportId, summaryItemId)
```

The default report's SummaryReportItem aggregates the test-wide values:
average response time, percentiles (p50/p90/p95/p99), hits per second,
total error rate, error count by type, total transactions, throughput.
**Don't dive into per-action data yet** — the global view tells you
which class of problem you're in.

**Trust caveat — load-generator overload.** Before reading any response
time, check whether the bench report has a `MonitoringAlarmsReportItem`
firing on the load generators (CPU / memory / load average). If it
fires, the response times in the report are **skewed** (typically inflated): the
load generator itself was the bottleneck, and JMeter's internal timing
becomes unreliable. Surface this as a confidence caveat ("response
times are suspect — LG was overloaded") before drawing conclusions,
and suggest re-running on the cloud or on a larger LG.

**Trust caveat — cache hits skew the global numbers.** JMeter's
`CacheManager` is enabled by default. When a recorded VU hits the
same URL repeatedly (typical on a session that revisits pages), the
server returns HTTP 304 Not Modified and JMeter records the sample —
but the response time / throughput then reflect a *cache check*, not
real load on the SUT. If `get_report_pie_values` on the response-codes
widget shows more than ~40% 304s, flag it: the visible numbers are an
optimistic floor, the real SUT cost lives in the 200 samples. To
diagnose the SUT, filter to status=200 when drilling into per-action
metrics.
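
As a worked example of the ~40% rule, the sketch below computes the share of 304s from a response-code breakdown. The input shape (a mapping from status code to sample count) is an assumption for illustration; the actual payload returned by `get_report_pie_values` may differ.

```python
from typing import Optional


def share_of_304(code_counts: dict[str, int]) -> float:
    """Fraction of all samples that came back as HTTP 304."""
    total = sum(code_counts.values())
    if total == 0:
        return 0.0
    return code_counts.get("304", 0) / total


def cache_skew_warning(
    code_counts: dict[str, int], threshold: float = 0.40
) -> Optional[str]:
    """Return a caveat string when 304s exceed the threshold, else None."""
    share = share_of_304(code_counts)
    if share > threshold:
        return (
            f"{share:.0%} of samples are 304s: global numbers are an "
            "optimistic floor, filter to status=200 to diagnose the SUT"
        )
    return None
```

For instance, a run with 500 `200` samples and 500 `304` samples yields a 50% share and triggers the caveat, while 900 / 100 does not.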

**Trust caveat — fail-fast peaks.** When the response-codes pie shows
a peak of errors **correlated with the hit-rate peak**, the server is
failing fast (errors return short, cheap responses). The apparent
throughput spike is illusory — read the error rate **before** the hit
rate when a chart shows a sudden bump.
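
One rough way to check for this pattern, assuming you have extracted two aligned time series (hits per second and errors per second) from the report: find the hit-rate peak and look at the error share at that instant. The function name and the 50% `min_error_share` threshold are hypothetical choices for the sketch, not OctoPerf heuristics.

```python
from typing import Sequence


def looks_fail_fast(
    hits_per_sec: Sequence[float],
    errors_per_sec: Sequence[float],
    min_error_share: float = 0.5,  # assumed threshold, tune to taste
) -> bool:
    """Heuristic: at the hit-rate peak, are most of the hits errors?"""
    if not hits_per_sec or len(hits_per_sec) != len(errors_per_sec):
        raise ValueError("series must be non-empty and the same length")
    # Index of the highest hit rate in the run.
    peak = max(range(len(hits_per_sec)), key=lambda i: hits_per_sec[i])
    if hits_per_sec[peak] == 0:
        return False
    return errors_per_sec[peak] / hits_per_sec[peak] >= min_error_share
```

When this returns `True`, treat the throughput bump as a symptom of cheap error responses rather than as real capacity.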

**LG monitoring caveats.**

- Recommended ceiling per LG: ~1000 hits/sec on a 4-8 CPU LG.
  Persistent CPU alerts above that volume usually mean the test
  exceeds a single LG's headroom — add LGs, don't blame the SUT.
- High CPU **after G1 Old collections start** is heap pressure, not
  CPU starvation. Check `G1 Old / collectionCount` on the LG-JVMs
  widget before recommending more LGs.
- On cloud LGs, `%UsedMemory` alerts essentially never fire (OctoPerf
  pre-provisions). When they fire on an on-prem agent, **another
  process on the host** is the cause — the JVM alone won't trigger it.
- An empty `LoadGeneratorsChartReportItem` (hosts) on an on-prem run
  usually means **IP Spoofing is enabled** on that LG, which disables
  agent monitoring entirely — not "no data".
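
The per-LG ceiling translates into simple sizing arithmetic. The sketch below assumes the ~1000 hits/sec figure from the first bullet as a default; `min_load_generators` is an illustrative helper, and the real headroom depends on the LG's CPU count and the test's payloads.

```python
import math


def min_load_generators(
    target_hits_per_sec: float, ceiling_per_lg: float = 1000.0
) -> int:
    """Smallest LG count that keeps each LG at or under the ceiling."""
    if ceiling_per_lg <= 0:
        raise ValueError("ceiling_per_lg must be positive")
    # At least one LG is always needed, even for a tiny test.
    return max(1, math.ceil(target_hits_per_sec / ceiling_per_lg))
```

A test targeting 2500 hits/sec would therefore need at least three LGs under this rule of thumb; a single LG with persistent CPU alerts at that volume points at LG headroom, not the SUT.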

### 1b. Run the insights heuristics

OctoPerf ships an `InsightsReportItem` in the default report: call
`get_report_insights` and let the platform classify the run for you.
One call returns up to ~15 insights tagged by severity (`ERROR` /
`WARN` / `INFO` / `PASSED`) with the heuristic's numeric value. This
is the fastest path to a classification, and it often lets you skip
the manual table lookup in step 2.
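
Once the insights are in hand, read them worst-first. The sketch below assumes each insight is a dict with `name`, `severity`, and `value` keys; that shape is a guess for illustration, so adapt the field names to what `get_report_insights` actually returns.

```python
# Severity labels from the text above, most urgent first.
SEVERITY_ORDER = {"ERROR": 0, "WARN": 1, "INFO": 2, "PASSED": 3}


def rank_insights(insights: list[dict]) -> list[dict]:
    """Sort insights by severity; unknown labels sort last, ties keep input order."""
    return sorted(
        insights,
        key=lambda item: SEVERITY_ORDER.get(item["severity"], len(SEVERITY_ORDER)),
    )


def needs_attention(insights: list[dict]) -> list[dict]:
    """Keep only the ERROR and WARN insights, ranked worst-first."""
    return [i for i in rank_insights(insights) if i["severity"] in ("ERROR", "WARN")]
```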

```
mcp__octoperf__get_rep
```

octoperf-async-polling (Skill)

Use whenever an OctoPerf operation runs asynchronously and the LLM has to wait for it to settle — `validate_virtual_user`, `run_scenario`, `export_bench_report_pdf`, the async correlation tasks behind `apply_correlations_to_virtual_user`, or any tool that returns a `taskId` / `benchResultId` instead of the final result. Defines the cadence, the terminal conditions, and the anti-patterns so the LLM does not tight-loop the MCP server or sleep blindly for the full expected duration.

octoperf-auto-correlation (Skill)

Use when an OctoPerf Virtual User imported from a HAR/Postman/JMX recording fails its validation run because dynamic values (session tokens, CSRF, signed URLs, anti-forgery inputs, auth challenges) captured at recording time are stale on replay. Triggers on requests for "auto-correlation", "correlate the VU", "fix replay errors", "401/403 on replay after import", "tokens don't match", "signature mismatch in load test". Walks the LLM through framework preset selection, async polling, and regex-rule fallback. Requires the OctoPerf MCP server to be connected.

octoperf-bench-reports (Skill)

Use when reading or interpreting an OctoPerf bench report — picking the right `get_report_*_values` tool for a given widget, understanding the difference between flat and trend reports, decoding semantic gotchas (Hits vs Hits CONTAINER, 304 cache hits skewing throughput, Playwright per-step row types, etc.). Triggers on "what's the right tool for this widget", "explain this metric", "how do I read this trend report", "what does parallelRunsSupported mean", "why is the Network row 24ms while page.goto is 364ms", "DELTA computeType". Complements `octoperf-scenario-diagnosis` — that skill walks the diagnosis workflow, this one is the widget-by-widget reading guide. Requires the OctoPerf MCP server.

octoperf-export-bench-report-pdf (Skill)

Use when the user asks to "export the report as PDF", "print the bench report", "get a PDF of report X", "share a PDF with stakeholders", or any variation that calls for a static artefact of an OctoPerf benchReport. Walks the LLM through the three-step async chain (submit print task → poll → download presigned URL). Requires the OctoPerf MCP server to be connected.

octoperf-real-browser-probe (Skill)

Use when the user wants to run a real-browser probe alongside a JMeter HTTP load test to capture user-perceived metrics (page load time, render time, JS execution, Core Web Vitals) while JMeter generates the bulk HTTP load. Triggers on "real browser monitoring during load test", "EUM probe", "playwright probe", "synthetic monitor during bench", "convert my JMeter VU to Playwright", "RealBrowser user", "TruClient equivalent", "hybrid load test (HTTP + browser)". Walks the LLM through JMeter→Playwright VU conversion (direct translation or codegen capture) and hybrid scenario composition (N×JMeter for load + 1×Playwright probe for UX measurement). Requires the OctoPerf MCP server.

octoperf-scheduling (Skill)

Use when scheduling an OctoPerf scenario to run at a specific time (one-shot) or on a recurring cadence (cron), or when listing / pausing / resuming / deleting an existing schedule. Triggers on "schedule the scenario for tomorrow morning", "run this every weekday at 8am", "every night at midnight", "pause the cron job", "delete the schedule", "show scheduled jobs". Covers the unusual cron format (Unix 5-field UTC, NOT Quartz), the timezone conversion gymnastics, the pre-flight rule (a misconfigured scenario will fire failing runs forever until disabled), and the full job lifecycle. Requires the OctoPerf MCP server.

octoperf-validation-triage (Skill)

Use when an OctoPerf Virtual User validation run has produced many failing actions and the user needs to diagnose them efficiently without reading every single failure serially. Triggers on "the validation is red", "lots of errors after import", "VU validation failed, what's wrong", "triage these failures", "why is my virtual user failing". Groups failures by category, drills into one representative per group, and proposes the matching MCP-tool fix. Requires the OctoPerf MCP server.