octoperf-bench-reports
The octoperf-bench-reports skill teaches how to read OctoPerf benchmark report widgets. It maps each of 20+ widget types to the correct tool for extracting values, explains semantic gotchas (how 304 cache hits distort throughput, why `Hits` and `Hits (CONTAINER)` differ), and clarifies the trend-report architecture, in which a reference run's selectors dynamically pull in matching runs. Use it when interpreting a specific bench report widget's data or working out why a metric behaves unexpectedly.
git clone --depth 1 https://github.com/OctoPerf/octoperf-claude-plugins /tmp/octoperf-bench-reports && cp -r /tmp/octoperf-bench-reports/plugins/octoperf/skills/octoperf-bench-reports ~/.claude/skills/octoperf-bench-reports
# OctoPerf — Reading bench reports
A `BenchReport` is a polymorphic document. Its `items` array carries
20+ widget types (charts, tables, top-N, insights, …), each backed by
its own `get_report_*_values` tool. This skill maps every widget you
can encounter to the right tool, calls out the **semantic gotchas**
that have repeatedly tripped LLMs, and explains the trend-report
architecture.
## The BenchReport shape — one quick anchor
```
BenchReport {
  id, projectId, name, benchResultIds,                     // the runs the report aggregates
  configs: [ApdexReportConfig | TrendReportConfig | ...],  // global settings
  items: [polymorphic BenchReportItem...]                  // what's visible on the page
}
```
- A **regular report** has 1 entry in `benchResultIds` (the run it
was generated for) and items that pull values from that run.
- A **trend report** has 1 entry too (the *reference* anchor) and a
`TrendReportConfig` in `configs` whose `selectors` are
re-evaluated **dynamically at read time** to pull in other matching
runs. See [Trend reports](#trend-reports) below.
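The distinction above can be checked in code before reading any widget. The sketch below is illustrative only: it assumes the report has already been fetched and parsed into a plain dict, and that entries in `configs` carry an `@type` discriminator the same way `items` do (an assumption, not confirmed here). The helper names `is_trend_report` and `reference_run_id` are hypothetical.

```python
from typing import Any, Dict, Optional


def is_trend_report(report: Dict[str, Any]) -> bool:
    """Return True when `configs` contains a TrendReportConfig.

    Assumption: each config entry exposes its type under "@type".
    """
    return any(
        cfg.get("@type") == "TrendReportConfig"
        for cfg in report.get("configs", [])
    )


def reference_run_id(report: Dict[str, Any]) -> Optional[str]:
    """Return the single entry of `benchResultIds`, or None if it is empty.

    For a trend report this is only the reference anchor; the other runs
    are resolved from the selectors at read time and are not listed here.
    """
    ids = report.get("benchResultIds", [])
    return ids[0] if ids else None
```

The practical consequence is that `benchResultIds` must not be treated as the full list of runs when `is_trend_report` returns true.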
To read any widget, always start with:
```
mcp__octoperf__get_bench_report(reportId)
```
then dispatch on each `items[i]["@type"]` per the table below.
## Widget → tool mapping
For every widget type that's reachable from MCP:
| `@type` | Tool | Returns |
|--------------------------------------|-------------------------------------------------------|----------------------------------------------------------------------------|
| `SummaryReportItem` | `get_report_summary_values` | `List<Double>` aligned with `item.metrics[i].id` |
| `BarChartReportItem` | `get_report_summary_values` *(same shape as Summary)* | `List<Double>` aligned with `item.metrics[i].id` |
| `StatisticTableReportItem` | `get_report_table_values` | `List<TableEntry>` (`actionId` → `values`) |
| `StatisticTreeReportItem` | `get_report_tree_values` | `List<TreeEntry>` (`virtualUserId` + `actionId` → `values`) — per-VU split |
| `TopReportItem` | `get_report_top_values` | `TopResult` (top-N actionIds + per-action curve) |
| `PieChartReportItem` | `get_report_pie_values` | `List<Map<String, Long>>` (one map per benchResult, label → count) |
| `LineChartReportItem` | `get_report_line_chart_values` | `List<List<GraphPoint>>` (one series per metric, `(x=epoch-ms, y)`) |
| `PercentilesChartReportItem` | `get_report_line_chart_values` | Same shape — percentile curve |
| `StackedChartReportItem` | `get_report_stacked_chart_values` | `List<MapGraphPoint>` (`x` + per-series map) |
| `AreaRangeChartReportItem` | `get_report_area_range_values` | `AreaRangeResult` (`curve` vs `reference`, `rmse`) |
| `InsightsReportItem` | `get_report_insights` | `Set<Insight>` (severity + value + drill-in widget) |
| `ErrorsReportItem` | `get_report_errors` | `List<BenchError>` (per-sample failures) |
| `ThresholdAlarmReportItem` | `get_report_threshold_alarms` | `List<ThresholdAlarm>` (per-breach) |
| `TextualMonitorReportItem` | `get_report_textual_monitors` | `List<TextualCounterValue>` (string-valued monitor samples) |
| `LoadGeneratorsChartReportItem` | `list_bench_load_generators` | `List<BenchLoadGenerator>` — chart is derived from this |
| `LoadGeneratorsTreeReportItem` | `list_bench_load_generators` | Same source as the chart — tree is just a different rendering |
| `TextReportItem` | *(no tool — descriptive markdown)* | n/a — `item.description` carries the markdown |
| `SynopsisReportItem` | *(no tool — scenario metadata)* | n/a — render the synopsis section in the UI for the user |
| `TrendConfigReportItem` | *(no tool — read `configs`)* | n/a — the selectors live in the report's `TrendReportConfig` |
| `MonitorsTableReportItem` | **❌ no MCP tool** | UI only — list of monitor connections with threshold-alarm counts |
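As a rough illustration of the dispatch step, the table can be encoded as a lookup keyed on `@type`. This is a sketch under stated assumptions: the report is a parsed dict, the tool names are copied from the table above, and the helpers `plan_reads` and `label_summary_values` are hypothetical names that only plan and label data. They do not call the MCP server.

```python
from typing import Any, Dict, List, Optional, Tuple

# Widget type -> tool name, copied from the mapping table.
TOOL_BY_TYPE: Dict[str, str] = {
    "SummaryReportItem": "get_report_summary_values",
    "BarChartReportItem": "get_report_summary_values",
    "StatisticTableReportItem": "get_report_table_values",
    "StatisticTreeReportItem": "get_report_tree_values",
    "TopReportItem": "get_report_top_values",
    "PieChartReportItem": "get_report_pie_values",
    "LineChartReportItem": "get_report_line_chart_values",
    "PercentilesChartReportItem": "get_report_line_chart_values",
    "StackedChartReportItem": "get_report_stacked_chart_values",
    "AreaRangeChartReportItem": "get_report_area_range_values",
    "InsightsReportItem": "get_report_insights",
    "ErrorsReportItem": "get_report_errors",
    "ThresholdAlarmReportItem": "get_report_threshold_alarms",
    "TextualMonitorReportItem": "get_report_textual_monitors",
    "LoadGeneratorsChartReportItem": "list_bench_load_generators",
    "LoadGeneratorsTreeReportItem": "list_bench_load_generators",
}


def plan_reads(report: Dict[str, Any]) -> List[Tuple[Optional[str], Optional[str]]]:
    """Return (widget type, tool name) per item; tool is None when no tool applies."""
    plan = []
    for item in report.get("items", []):
        widget_type = item.get("@type")
        plan.append((widget_type, TOOL_BY_TYPE.get(widget_type)))
    return plan


def label_summary_values(item: Dict[str, Any], values: List[float]) -> Dict[str, float]:
    """Pair a summary/bar-chart value list with `item.metrics[i].id` by position."""
    metric_ids = [metric["id"] for metric in item["metrics"]]
    if len(metric_ids) != len(values):
        raise ValueError("values are not aligned with item.metrics")
    return dict(zip(metric_ids, values))
```

Widget types that map to `None` here (`TextReportItem`, `SynopsisReportItem`, `TrendConfigReportItem`, `MonitorsTableReportItem`) are the rows of the table that have no value tool, so they need the handling described in their "Returns" column instead.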
Two follow-up tools to keep in mind:
- After `get_report_errors`, drill into a specific failed sample with `fetch_bench_error_http(benchResultId, actionId, timestamp)`, which returns the full request and response for that one sample.
- For non-text bench-result artefacts (Playwright `trace.zip`, screenshots, HAR), `download_bench_result_file(benchResultId, filename)` returns a presigned GET URL (single-use, ~5 min) — fetch the bytes directly with your code interpreter. `read_bench_result_file_lines` only handles text.
## Semantic gotchas
A field-collected list of values that *look* like one thing but mean
another. Each cost an LLM debug cycle in the past — surface them to
the user when reading the data:
### `Hits` vs `Hits (CONTAINER)`
- `Hits` (and its rates `Hits/s`, `Hits successful total`) count **HTTP samplers only**.
- `Hits (CONTAINER)` counts