octoperf-validation-triage
The octoperf-validation-triage skill diagnoses failures in OctoPerf Virtual User validation runs by grouping failures into categories based on root cause, then examining one representative failure per group and recommending the appropriate fix. Use this when a validation run completes with many failing actions and you need efficient diagnosis without reviewing each failure individually. Requires the OctoPerf MCP server.
`git clone --depth 1 https://github.com/OctoPerf/octoperf-claude-plugins /tmp/octoperf-validation-triage && cp -r /tmp/octoperf-validation-triage/plugins/octoperf/skills/octoperf-validation-triage ~/.claude/skills/octoperf-validation-triage`
# OctoPerf — Validation-failure triage
When a Virtual User validation finishes with many failing actions,
reading each failure detail one by one wastes context window and time.
This skill groups failures by **root cause**, drills into one
representative per group, and maps each group to the correct fix.
## When this applies
- A `validate_virtual_user` run finished with `finished=true` and at least one failing action.
- The user wants to know *what's wrong*, not just retry blindly.
## Steps
### 1. Get the failure index — no bodies yet
```
mcp__octoperf__get_virtual_user_validation_index(virtualUserId)
```
The index returns one entry per **validated** HTTP action — `actionId`,
`success`/`total` counts, `successTimestamps` and `failedTimestamps`. For
triage, focus on the entries whose `failedTimestamps` is non-empty (the
actions that failed at least one run); entries with only `successTimestamps`
passed every run. **Do not fetch failure details yet** — the index alone is
usually enough to classify.
(A `successTimestamps` entry is also a handle to read a *passing* run's
body: pass it to `fetch_validation_http_body` with
`kind=VALIDATION_RESPONSE` — e.g. to inspect what a Debug-style action
actually captured on a green run.)
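As a minimal sketch of that filtering step, assume the index has already been fetched and deserialized into a list of dicts carrying the fields named above. The `failing_entries` helper and the sample values are hypothetical illustrations, not part of the OctoPerf API.

```python
from typing import Any


def failing_entries(index: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only the entries that failed at least one run."""
    return [entry for entry in index if entry.get("failedTimestamps")]


# Hypothetical index payload; only the field names come from the text above.
index = [
    {"actionId": "a1", "success": 3, "total": 3,
     "successTimestamps": [1000, 2000, 3000], "failedTimestamps": []},
    {"actionId": "a2", "success": 0, "total": 3,
     "successTimestamps": [], "failedTimestamps": [1100, 2100, 3100]},
    {"actionId": "a3", "success": 2, "total": 3,
     "successTimestamps": [1200, 2200], "failedTimestamps": [3200]},
]

print([entry["actionId"] for entry in failing_entries(index)])  # ['a2', 'a3']
```

Working from this reduced list keeps response bodies out of the context until a representative failure has been chosen.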
OctoPerf's "KO" rule isn't just "non-2XX". The result depends on both the live and the recorded response code:
| Live response code | Recorded response code | Result |
|--------------------|------------------------|------------------------------------------------------------------|
| 2XX | none | ✅ OK |
| 2XX | 2XX | ✅ OK |
| 2XX | 3XX | ❌ KO (recording expected a redirect, got a body — usually wrong) |
| 3XX | 3XX | ✅ OK |
| 3XX | 2XX | ❌ KO (recording expected a body, got a redirect) |
| Any 4XX / 5XX | anything | ❌ KO |
| Unknown code | anything | ❌ KO |
| Any code | 4XX / 5XX / unknown | ❌ KO (recording was already broken; re-record) |
This matters when classifying: a "successful" 200 against a recorded 302 is a
real bug, not a false positive — the VU is hitting a different code path than
when it was recorded.
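The matrix can be written down as a small lookup, which helps when reasoning about a specific live/recorded pair. This is a sketch of the table as stated here, not OctoPerf's actual implementation. The function names are hypothetical, `None` stands for "no recorded code", and treating anything outside 200-599 (including 1XX) as "unknown" is an assumption. The table does not say what happens for a live 3XX with no recorded code, so the sketch reports that case as `UNSPECIFIED` rather than guessing.

```python
from typing import Optional


def code_class(code: Optional[int]) -> str:
    """Map a status code to the class names used in the matrix."""
    if code is None:
        return "none"
    for prefix in (2, 3, 4, 5):
        if prefix * 100 <= code < (prefix + 1) * 100:
            return f"{prefix}XX"
    return "unknown"  # assumption: anything outside 200-599


def classify(live: int, recorded: Optional[int]) -> str:
    """Return "OK", "KO", or "UNSPECIFIED" for a live/recorded pair."""
    live_cls, rec_cls = code_class(live), code_class(recorded)
    # Any live 4XX / 5XX / unknown code is KO regardless of the recording.
    if live_cls in ("4XX", "5XX", "unknown"):
        return "KO"
    # A recording that was already broken makes the sample KO.
    if rec_cls in ("4XX", "5XX", "unknown"):
        return "KO"
    if (live_cls, rec_cls) in (("2XX", "none"), ("2XX", "2XX"), ("3XX", "3XX")):
        return "OK"
    if (live_cls, rec_cls) in (("2XX", "3XX"), ("3XX", "2XX")):
        return "KO"
    return "UNSPECIFIED"  # live 3XX with no recorded code: not in the table


print(classify(200, 302))  # KO
print(classify(302, 302))  # OK
```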
**Caveat — ResponseAssertion overrides the matrix.** A `ResponseAssertion`
attached to the action can mark a sample KO even when the matrix above
says OK (e.g. status 200 against recorded 200, but the body contains
the assertion's pattern — typical for forms that re-render with an
error message on a 200). Check for assertions on the failing action
before assuming an HTTP-level mismatch. The assertion's `type` field
also controls scope:
- `REQUEST_ONLY` — only the parent sample is checked.
- `REQUEST_AND_SUBREQUESTS` — parent **and** embedded resources. Matters when `downloadResources=true`: a 404 on an embedded CSS can fail the parent.
- `SUBREQUESTS_ONLY` — only embedded resources.
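The three scopes can be pictured as a selection over the parent sample and its embedded resources. The sketch below only illustrates the scoping rule described above. `samples_in_scope` and the sample names are hypothetical, and it does not model how OctoPerf evaluates the assertion itself.

```python
def samples_in_scope(scope: str, parent: str, subrequests: list[str]) -> list[str]:
    """Return the samples an assertion with the given scope would check."""
    if scope == "REQUEST_ONLY":
        return [parent]
    if scope == "REQUEST_AND_SUBREQUESTS":
        return [parent, *subrequests]
    if scope == "SUBREQUESTS_ONLY":
        return list(subrequests)
    raise ValueError(f"unexpected assertion scope: {scope}")


# Hypothetical page with two embedded resources.
embedded = ["/static/app.css", "/static/app.js"]
print(samples_in_scope("REQUEST_AND_SUBREQUESTS", "/login", embedded))
# ['/login', '/static/app.css', '/static/app.js']
```

With `REQUEST_AND_SUBREQUESTS`, a failing embedded resource is in scope, which is why the parent can go KO even though its own response looks healthy.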
### 1b. Partial failures (`success < total`)
When the index entry shows `success < total` (e.g.
`{"success": 2, "total": 3, "failedTimestamps": [...]}`), the action
failed on **some iterations** but passed on others. Common patterns:
- **Bad row in a CSVVariable** — one iteration picks a row that
doesn't pass server-side validation (e.g. a wrong-password row,
a malformed account id, a stale token). Cheapest possible fix:
edit the CSV. Investigate this **before** correlation when the VU
uses a `CSVVariable`.
- **Race condition** — uncommon in 1-user validation but possible if
the VU has an internal `LoopContainerAction` or shared variables.
- **CSV exhausted with EOF = StopVU** — iterations past the CSV length
never run; not strictly a failure but `success < total` reflects it.
`failedTimestamps` lists every failed iteration's epoch-ms. Pass the
**exact** timestamp to `get_validation_failure_detail` to pin the
failing iteration instead of getting a random one — critical when
debugging CSV-driven flakiness so you see the bad row, not a good one.
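A rough sketch of how partial failures might be picked out of the index and turned into exact (action, timestamp) pairs is shown below. The helper names and the sample data are hypothetical. The argument names expected by `get_validation_failure_detail` are not given here, so the sketch stops at producing the pairs rather than building a call.

```python
from typing import Any


def partial_failures(index: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Entries that passed on some iterations and failed on others."""
    return [e for e in index if 0 < e["success"] < e["total"]]


def pinned_failures(entry: dict[str, Any]) -> list[tuple[str, int]]:
    """One (actionId, epoch-ms timestamp) pair per failed iteration."""
    return [(entry["actionId"], ts) for ts in entry["failedTimestamps"]]


# Hypothetical index entries; timestamps are epoch-ms as described above.
index = [
    {"actionId": "login", "success": 2, "total": 3,
     "failedTimestamps": [1700000002000]},
    {"actionId": "checkout", "success": 0, "total": 3,
     "failedTimestamps": [1700000000500, 1700000001500, 1700000002500]},
]

for entry in partial_failures(index):
    print(pinned_failures(entry))  # [('login', 1700000002000)]
```

Keeping the timestamp attached to the action makes it harder to end up inspecting a passing iteration by accident.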
### 2. Group by category
Bucket the failing actions into a small number of categories. The
common ones:
| Category | Index signal | Likely fix |
|---------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **Auth / state** | 401, 403; "invalid token", "expired", "CSRF" | Auto-correlation (separate skill) |
| **Variable / data** | 400 with body validation errors; "field required", "invalid format" | Edit / create variables; check CSV upload |
| **HTTP server config** | | |
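The bucketing can be sketched as a small rule-based classifier over the signals listed in the table. Everything here is an assumption made for illustration: the `categorize` and `group_failures` names, the (status, body text) shape of a failure, and the reading that an auth keyword alone is enough while the data category needs a 400 plus a validation keyword. Only the two fully specified categories are encoded, and anything else falls into `Unclassified`.

```python
from collections import defaultdict

AUTH_STATUSES = {401, 403}
AUTH_HINTS = ("invalid token", "expired", "csrf")
DATA_HINTS = ("field required", "invalid format")


def categorize(status: int, body: str) -> str:
    """Map one failure to a triage category using the table's signals."""
    text = body.lower()
    if status in AUTH_STATUSES or any(hint in text for hint in AUTH_HINTS):
        return "Auth / state"
    if status == 400 and any(hint in text for hint in DATA_HINTS):
        return "Variable / data"
    return "Unclassified"


def group_failures(failures: list[tuple[str, int, str]]) -> dict[str, list[str]]:
    """Group action ids by category, preserving input order."""
    groups: dict[str, list[str]] = defaultdict(list)
    for action_id, status, body in failures:
        groups[categorize(status, body)].append(action_id)
    return dict(groups)


# Hypothetical failures: (actionId, live status, body excerpt).
failures = [
    ("a1", 401, "Unauthorized"),
    ("a2", 400, "email: field required"),
    ("a3", 403, "CSRF token missing"),
    ("a4", 500, "Internal Server Error"),
]

groups = group_failures(failures)
print(groups)
# The first action id in each group can serve as its representative.
print({category: ids[0] for category, ids in groups.items()})
```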