ClaudeWave
Skill · 630 repo stars · updated today

flow-next-qa

flow-next-qa executes live-application QA testing by driving a deployed app through test scenarios derived directly from the specification's acceptance criteria, requirement IDs, and boundaries, then files structured P0/P1/P2 findings with captured evidence and delivers a YES/NO ship verdict. Use it to validate running behavior against intent rather than inspect source code, filling the gap between static reviews and real-user verification.

Install in Claude Code
git clone --depth 1 https://github.com/gmickel/flow-next /tmp/flow-next-qa && cp -r /tmp/flow-next-qa/plugins/flow-next/skills/flow-next-qa ~/.claude/skills/flow-next-qa
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# /flow-next:qa — live-app real-user QA pass

flow-next's review surface today is all static: `impl-review`, `spec-completion-review`, `quality-auditor`, `code-review`. Nothing drives the *running* app like an unforgiving real user. `/flow-next:qa` fills that gap — it drives the deployed app (via **fn-51 flow-next-drive**), files structured P0/P1/P2 findings with evidence, and ends with a YES/NO ship verdict emitted as a proof-of-work receipt.

The differentiator vs spec-less QA tools is that **the spec is the source of intent**: flow-next derives test scenarios directly from the spec (acceptance criteria → scenarios, R-IDs → coverage, boundaries → what NOT to test, decision context → expected behavior). The spec already encodes intent, so the host reads it instead of reconstructing it. The QA discipline (P0/P1/P2 taxonomy, evidence rules, session hygiene) is a lean borrow from Ray Fernando's `running-bug-review-board` skill (Apache-2.0, credited in CHANGELOG); flow-next stays lean (no 18-reference port, ≤500-line skill cap).
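The "R-IDs → coverage" idea can be pictured with a minimal sketch. It assumes requirement IDs appear in the spec as `R` followed by digits (as in `R11`) and that scenarios live in a plain-text file that names the R-IDs they cover. The function names, the `SPEC_FILE` / `SCENARIOS_FILE` variables, and the file layout are hypothetical, not flow-next's actual format.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: list R-IDs a spec declares that no scenario mentions.
# Inputs come from SPEC_FILE and SCENARIOS_FILE (assumed names); no positional
# parameters are used, matching the convention of the skill's own code blocks.

# Read text on stdin, print the unique R-IDs it contains in numeric order.
list_rids() {
  grep -owE 'R[0-9]+' | sort -t R -k2,2n -u
}

# Print every R-ID found in SPEC_FILE that SCENARIOS_FILE never names.
uncovered_rids() {
  local rid
  for rid in $(list_rids < "$SPEC_FILE"); do
    # -w keeps R1 from matching inside R11.
    grep -qw "$rid" "$SCENARIOS_FILE" || echo "$rid"
  done
}
```

Calling `uncovered_rids` with both variables set prints one uncovered R-ID per line; empty output means every declared requirement is named by at least one scenario.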

**Read [workflow.md](workflow.md) for the full phase-by-phase execution** (discover → derive → prepare → execute → file → verdict).

## The hard rule — PASS is forbidden from source inspection

**QA must NEVER mark PASS (SHIP) by reading source code.** A live-app QA pass fills the gap that every other flow-next review, being static, leaves open. The verdict rests on **captured evidence from the running app** (screenshots, console dumps, observed state), never on agent narration, never on "the code looks correct", never on inferring behavior from the diff. If no live app is reachable (no deploy or no driver), the outcome is **BLOCKED** (could not verify), not PASS. This rule is load-bearing: it is what makes the skill a real-user QA pass rather than a second static review.
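The rule can be read as a small gate in front of the verdict. The sketch below is illustrative only: the function name and the `APP_REACHABLE` / `EVIDENCE_COUNT` / `P0_COUNT` variables are hypothetical, and two branches are assumptions rather than documented behavior (a reachable app with zero captured evidence is treated as BLOCKED, and any P0 finding is treated as NO).

```bash
#!/usr/bin/env bash
# Hypothetical verdict gate. Reads three assumed variables instead of
# positional parameters:
#   APP_REACHABLE   1 when a live app and a driver were available
#   EVIDENCE_COUNT  number of captured artifacts (screenshots, console dumps)
#   P0_COUNT        number of P0 findings filed
qa_verdict() {
  # No live app (no deploy or no driver): could not verify, never PASS.
  if [[ "${APP_REACHABLE:-0}" != "1" ]]; then echo "BLOCKED"; return; fi
  # Assumption: nothing captured means there is no basis for a verdict.
  if (( ${EVIDENCE_COUNT:-0} == 0 )); then echo "BLOCKED"; return; fi
  # Assumption: a single P0 finding is enough to withhold the ship verdict.
  if (( ${P0_COUNT:-0} > 0 )); then echo "NO"; return; fi
  echo "YES"
}
```

Nothing in the gate looks at source code or a diff, so reading the code can never yield YES; only captured evidence can.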

## Preamble

**CRITICAL: flowctl is BUNDLED — NOT installed globally.** `which flowctl` will fail (expected). Define once; subsequent blocks (here and in `workflow.md`) use `$FLOWCTL`. Subagents that run in fresh context fall back to the repo-local copy:

```bash
FLOWCTL="${DROID_PLUGIN_ROOT:-${CLAUDE_PLUGIN_ROOT}}/scripts/flowctl"
[ -x "$FLOWCTL" ] || FLOWCTL=".flow/bin/flowctl"
```

## Pre-check: Local setup version

Non-blocking, same pattern as `/flow-next:plan` — one-line nag when the local setup lags the plugin:

```bash
SETUP_VER=$(jq -r '.setup_version // empty' .flow/meta.json 2>/dev/null)
PLUGIN_JSON="${DROID_PLUGIN_ROOT:-${CLAUDE_PLUGIN_ROOT}}/.claude-plugin/plugin.json"
PLUGIN_VER=$(jq -r '.version' "$PLUGIN_JSON" 2>/dev/null || echo "unknown")
if [[ -n "$SETUP_VER" && "$PLUGIN_VER" != "unknown" && "$SETUP_VER" != "$PLUGIN_VER" ]]; then
  echo "Plugin updated to v${PLUGIN_VER}. Run /flow-next:setup to refresh local scripts (current: v${SETUP_VER})." >&2
fi
```

Continue regardless (never blocks; silent when setup was never run or versions match).

**Inline skill (no `context: fork`)** — runs on the host agent, not a forked subagent, because the **prepare** phase must ask the user for undocumented facts (target URL / test account — info-only, never a confirm gate) and a forked subagent cannot ask the user back (Claude Code issues #12890, #34592). The host asks via `AskUserQuestion`. (sync-codex.sh rewrites any `AskUserQuestion` to a plain-text numbered prompt in the Codex mirror.)

## Mode Detection

Parse `$ARGUMENTS`. The first non-flag token is the spec id (required). The value-taking caller overrides that downstream phases honor, `--target <url>` (Phase 3.1), `--receipt <path>` (Phase 6.3), and `--base <ref>` (§1.2 base-branch override), **must consume their operand here** (both `--flag value` and `--flag=value` forms, mirroring make-pr's `--base`); otherwise the operand falls through to the `*)` arm and is mis-assigned as `SPEC_ID` (Phase 1 then rejects the URL/path as "Not a spec"). They populate `QA_TARGET_URL` / `QA_RECEIPT_OVERRIDE` / `QA_BASE_REF`, the exact variables Phases 3.1 / 6.3 / §1.2 read. Other flags (viewport, autonomy) are reserved for later tasks; the skeleton warns about them on stderr and skips them harmlessly.

```bash
RAW_ARGS="$ARGUMENTS"
SPEC_ID=""

# The loop handles both `--flag=value` and space-separated `--flag value`
# forms via a PREV token holder. No bash positional parameters here — the
# host's argument interpolation rewrites positional tokens inside skill code
# blocks (pilot dogfood finding, 1.13.0).
PREV=""
for ARG in $RAW_ARGS; do
  case "$PREV" in
    --target)  QA_TARGET_URL="$ARG"; PREV=""; continue ;;        # Phase 3.1 caller override
    --receipt) QA_RECEIPT_OVERRIDE="$ARG"; PREV=""; continue ;;  # Phase 6.3 receipt path
    --base)    QA_BASE_REF="$ARG"; PREV=""; continue ;;          # §1.2 base-branch override
  esac
  case "$ARG" in
    --target|--receipt|--base) PREV="$ARG" ;;
    --target=*)  QA_TARGET_URL="${ARG#--target=}" ;;        # Phase 3.1 caller override
    --receipt=*) QA_RECEIPT_OVERRIDE="${ARG#--receipt=}" ;; # Phase 6.3 receipt path
    --base=*)    QA_BASE_REF="${ARG#--base=}" ;;            # §1.2 base-branch override
    -*) echo "Unknown flag: $ARG (reserved for a later task)" >&2 ;;
    *)  [[ -z "$SPEC_ID" ]] && SPEC_ID="$ARG" ;;
  esac
done
[[ -n "$PREV" ]] && echo "Flag $PREV given without a value (ignored)" >&2
export QA_TARGET_URL QA_RECEIPT_OVERRIDE QA_BASE_REF   # carry the resolved overrides into workflow.md Phases 3.1 / 6.3 / §1.2
```
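To see what these rules resolve for a concrete invocation, a self-contained trace helps. The sketch below repeats the same case logic inside a hypothetical `parse_qa_args` function fed by `RAW_ARGS` (still no positional parameters) and prints the result. The spec id `fn-12-example` and the URL are made-up values, and the function resets the `QA_*` variables first so the trace is repeatable, which the skeleton above does not do.

```bash
#!/usr/bin/env bash
# Trace of the Mode Detection rules on a sample argument string.
# parse_qa_args, the spec id, and the URL are illustrative assumptions.
parse_qa_args() {
  SPEC_ID="" QA_TARGET_URL="" QA_RECEIPT_OVERRIDE="" QA_BASE_REF=""
  local PREV="" ARG
  for ARG in $RAW_ARGS; do
    # A pending value-taking flag consumes this token as its operand.
    case "$PREV" in
      --target)  QA_TARGET_URL="$ARG"; PREV=""; continue ;;
      --receipt) QA_RECEIPT_OVERRIDE="$ARG"; PREV=""; continue ;;
      --base)    QA_BASE_REF="$ARG"; PREV=""; continue ;;
    esac
    case "$ARG" in
      --target|--receipt|--base) PREV="$ARG" ;;   # operand arrives next
      --target=*)  QA_TARGET_URL="${ARG#--target=}" ;;
      --receipt=*) QA_RECEIPT_OVERRIDE="${ARG#--receipt=}" ;;
      --base=*)    QA_BASE_REF="${ARG#--base=}" ;;
      -*) echo "Unknown flag: $ARG" >&2 ;;
      *)  [[ -z "$SPEC_ID" ]] && SPEC_ID="$ARG" ;; # first bare token wins
    esac
  done
  return 0
}

# Mixes the space-separated and the = forms, with the flag before the spec id.
RAW_ARGS="--target https://staging.example.test fn-12-example --base=main"
parse_qa_args
echo "spec=$SPEC_ID target=$QA_TARGET_URL base=$QA_BASE_REF receipt=${QA_RECEIPT_OVERRIDE:-<none>}"
# prints: spec=fn-12-example target=https://staging.example.test base=main receipt=<none>
```

Because `--target` consumes the URL as its operand, the URL never reaches the `*)` arm, so `SPEC_ID` is taken from the first bare token after it.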

When `SPEC_ID` is empty, the **discover** phase resolves it (branch-match, or by asking the user via `AskUserQuestion` as an info prompt) — never silently default.

Ralph mode (`FLOW_RALPH=1` or `REVIEW_RECEIPT_PATH` set) is detected in workflow.md §AUTONOMY — the skill is **aware but not Ralph-blocked** (R11). The deep autonomy routing (autonomous when target URL + accounts are configured; receipt path resolution) is owned by a downstream task; the skeleton only lays the section anchor.
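The two triggers named above can be expressed as a single predicate. This is a minimal sketch, not the check in workflow.md §AUTONOMY; the function name `is_ralph_mode` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: Ralph mode is on when FLOW_RALPH is exactly 1 or when
# REVIEW_RECEIPT_PATH is set to a non-empty value.
is_ralph_mode() {
  [[ "${FLOW_RALPH:-}" == "1" || -n "${REVIEW_RECEIPT_PATH:-}" ]]
}

if is_ralph_mode; then
  echo "Ralph mode detected: running without user prompts" >&2
fi
```

The `${VAR:-}` defaults keep the check safe under `set -u`, where referencing an unset variable would otherwise abort the script.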

## fn-51 consumption — a read-and-drive contract, NOT a callable API

A skill is not a function. QA does **NOT** "call" flow-next-drive. The host agent *
flow-next-capture (Skill)

Synthesize the current conversation context into a flow-next spec at `.flow/specs/<spec-id>.md` via `flowctl spec create + spec set-plan` — agent-native, source-tagged, with mandatory read-back before write. Triggers on /flow-next:capture, "capture spec", "lock down what we discussed", "make a spec from this conversation", "convert conversation to spec". Optional `mode:autofix` token runs without questions and requires `--yes` to commit. Optional `--rewrite <spec-id>` overwrites an existing spec; `--from-compacted-ok` overrides the compaction-detection refusal; `--override-strategy` proceeds despite a contradiction with an active STRATEGY.md track (and prompts to record the override as a decision).

flow-next-make-pr (Skill)

Render a cognitive-aid PR body from flow-next state and open via gh. Triggers on /flow-next:make-pr with optional spec id and flags (--draft, --ready, --no-mermaid, --base <ref>, --memory, --dry-run). Auto-detects spec from current branch when no id given. NOT Ralph-blocked — autonomous loops can surface a draft PR for human review.

flow-next-audit (Skill)

Audit `.flow/memory/` entries against the current codebase and decide Keep / Update / Consolidate / Replace / Delete per entry. Triggers on /flow-next:audit, "audit memory", "review memory", "refresh learnings", "sweep stale memory", "consolidate overlapping memory entries". Optional `mode:autofix` token in arguments runs without questions and marks ambiguous as stale. Optional scope hint after the mode token (concept, category, module, or path) narrows what gets audited.

flow-next-deps (Skill)

Show spec dependency graph and execution order. Use when asking 'what's blocking what', 'execution order', 'dependency graph', 'what order should specs run', 'critical path', 'which specs can run in parallel'.

flow-next-drive (Skill)

Drive any UI surface like a real user - a web app, a Chromium-backed desktop app (Electron / WebView2, reached over CDP), or a genuinely native app (macOS AppKit/SwiftUI, or a non-CDP webview) reached via Computer Use. Detects the surface, picks the best available driver, degrades gracefully. Use to navigate sites, verify deployed UI, test web or desktop apps, capture baseline screenshots, drive a sign-in flow, scrape data, fill forms, run an e2e check, or inspect current page state. Triggers on "check the page", "verify UI", "test the site", "test this app", "drive the app", "automate this desktop app", "read docs at", "look up API", "visit URL", "browse", "screenshot", "scrape", "e2e test", "login flow", "capture baseline", "see how it looks", "inspect current", "before redesign", "Electron app", "native app".

flow-next-epic-review (Skill)

[deprecated alias] Renamed to flow-next-spec-completion-review in flow-next 1.0 — invoke the new skill. Removed in 2.0.

flow-next-export-context (Skill)

Export RepoPrompt context to a markdown file for review with an external LLM (ChatGPT, Claude web, etc.). Use when you want Carmack-level review but prefer an external model. Triggers on "export context", "export for external review", "export plan for ChatGPT", "export impl review context", "review with an external model", "export review context".