flow-next-pilot
The flow-next:pilot skill executes a single tick of an autonomous build-loop conductor, advancing exactly one ready spec by one pipeline stage (plan, plan-review, work, or make-pr) and terminating with a PILOT_VERDICT line for external drivers to interpret. Use it when you need mechanical, non-interactive spec progression within a session, delegating repeated invocation to host loop primitives like /loop or /goal, and avoiding nesting with external Ralph harness automation.
git clone --depth 1 https://github.com/gmickel/flow-next /tmp/flow-next-pilot && cp -r /tmp/flow-next-pilot/plugins/flow-next/skills/flow-next-pilot ~/.claude/skills/flow-next-pilotSKILL.md
# /flow-next:pilot — single-tick autonomous build-loop conductor
A tick is one invocation of `/flow-next:pilot`: select one ready spec, classify its current stage, dispatch exactly one existing stage skill, verify state advanced, and end with one terminal `PILOT_VERDICT` line. It is intentionally not a runner; `/loop` in Claude Code or `/goal` in Claude Code / Codex owns repeated invocation.
Pilot and Ralph are alternative autonomous drivers. Ralph is an external shell loop with receipt plumbing; pilot is an in-session conductor for host loop primitives. Never nest them, and never reuse Ralph harness state inside pilot.
Human judgment lives before pilot: the spec content, `depends_on_epics`, and the fn-58 `ready` gate are the consent boundary. Pilot executes the mechanical pipeline one stage at a time, with ambiguity reported as `NEEDS_HUMAN`.
## Preamble
**CRITICAL: flowctl is BUNDLED — NOT installed globally.** `which flowctl` will fail (expected). Define once; subsequent blocks (here and in `workflow.md`) use `$FLOWCTL`. Subagents that run in fresh context fall back to the repo-local copy:
```bash
FLOWCTL="${DROID_PLUGIN_ROOT:-${CLAUDE_PLUGIN_ROOT}}/scripts/flowctl"
[ -x "$FLOWCTL" ] || FLOWCTL=".flow/bin/flowctl"
```
## Pre-check: Local setup version
Non-blocking, same pattern as `/flow-next:plan` — one-line nag when the local setup lags the plugin:
```bash
SETUP_VER=$(jq -r '.setup_version // empty' .flow/meta.json 2>/dev/null)
PLUGIN_JSON="${DROID_PLUGIN_ROOT:-${CLAUDE_PLUGIN_ROOT}}/.claude-plugin/plugin.json"
PLUGIN_VER=$(jq -r '.version' "$PLUGIN_JSON" 2>/dev/null || echo "unknown")
if [[ -n "$SETUP_VER" && "$PLUGIN_VER" != "unknown" && "$SETUP_VER" != "$PLUGIN_VER" ]]; then
echo "Plugin updated to v${PLUGIN_VER}. Run /flow-next:setup to refresh local scripts (current: v${SETUP_VER})." >&2
fi
```
Continue regardless (never blocks; silent when setup was never run or versions match).
## Hard guards (before anything else)
Run these guards before selection, ledger writes, branch changes, or skill dispatch.
```bash
if [[ -n "${FLOW_RALPH:-}" || -n "${REVIEW_RECEIPT_PATH:-}" ]]; then
echo "Ralph and pilot are alternative drivers — never nest them" >&2
echo 'PILOT_VERDICT=NEEDS_HUMAN spec=- stage=- reason="nested under Ralph harness (FLOW_RALPH/REVIEW_RECEIPT_PATH set) — refuse to run"'
exit 1
fi
if git status --porcelain | grep -v '^.. \.flow/' >/dev/null; then
echo 'PILOT_VERDICT=NEEDS_HUMAN spec=- stage=- reason="dirty working tree at tick start"'
exit 0
fi
```
Dirty tree means dirty outside `.flow/`; pilot leaves state untouched. No cleanup, no claim reset, no strike.
## Mode Detection
Parse `$ARGUMENTS` for the scope lock, dry-run switch, and passthroughs. Unknown flags warn to stderr and are ignored. Defaults are `research=grep`, `depth=short`, and `review` resolved later via `$FLOWCTL review-backend`.
The loop handles both `--flag=value` and space-separated `--flag value` forms directly via a `PREV` token holder. It deliberately avoids bash positional parameters (`shift`-based parsing) — the host's argument interpolation rewrites positional tokens inside skill code blocks, which corrupts a `case`-on-positionals parse (observed live in the 1.13.0 dogfood).
```bash
RAW_ARGS="$ARGUMENTS"
PILOT_SPEC=""
PILOT_DRY_RUN=0
PILOT_REVIEW=""
PILOT_RESEARCH="grep"
PILOT_DEPTH="short"
PREV=""
for ARG in $RAW_ARGS; do
case "$PREV" in
--spec) PILOT_SPEC="$ARG"; PREV=""; continue ;;
--review) PILOT_REVIEW="$ARG"; PREV=""; continue ;;
--research) PILOT_RESEARCH="$ARG"; PREV=""; continue ;;
--depth) PILOT_DEPTH="$ARG"; PREV=""; continue ;;
esac
case "$ARG" in
--spec|--review|--research|--depth) PREV="$ARG" ;;
--spec=*) PILOT_SPEC="${ARG#--spec=}" ;;
--dry-run) PILOT_DRY_RUN=1 ;;
--review=*) PILOT_REVIEW="${ARG#--review=}" ;;
--research=*) PILOT_RESEARCH="${ARG#--research=}" ;;
--depth=*) PILOT_DEPTH="${ARG#--depth=}" ;;
-*) echo "Unknown flag: $ARG (ignored by /flow-next:pilot)" >&2 ;;
*) echo "Unknown argument: $ARG (ignored by /flow-next:pilot)" >&2 ;;
esac
done
[[ -n "$PREV" ]] && echo "Flag $PREV given without a value (ignored by /flow-next:pilot)" >&2
export PILOT_SPEC PILOT_DRY_RUN PILOT_REVIEW PILOT_RESEARCH PILOT_DEPTH
```
No branch flag exists in v1. Branch resolution is pilot-owned from the selected spec's `branch_name`.
## The verdict contract (read this before the workflow)
The `/goal` validator is transcript-blind: it reads conversation output only and never runs tools. Every tick therefore echoes its verification evidence into the output: flowctl status fields, task counts, task status transitions, and the gh-confirmed PR URL for make-pr.
Every tick ends with exactly one terminal line, the last line of the response, with nothing after it:
```text
PILOT_VERDICT=<ADVANCED|NO_WORK|BLOCKED|NEEDS_HUMAN> spec=<id> stage=<stage> reason="<one line>"
```
Use `spec=-` and `stage=-` when no spec was selected. Stage values are exactly `plan`, `plan-review`, `work`, `make-pr`, or `-`.
Driver condition examples:
```text
/goal keep running /flow-next:pilot until it prints PILOT_VERDICT=NO_WORK, or stop after 20 turns
/goal keep running /flow-next:pilot --review=codex until PILOT_VERDICT=NO_WORK or PILOT_VERDICT=NEEDS_HUMAN
```
## Forbidden
- Asking the user anything in the tick path. Pilot is autonomous; ambiguity maps to `NEEDS_HUMAN`.
- Dispatching any skill outside the stage set `{plan, plan-review, work, make-pr}`. Capture, interview, QA, resolve-pr, merge, and release are never pilot stages.
- Re-implementing sub-skill logic. Pilot owns selection, dispatch, verification, verdicts, and the strikes ledger only.
- Touching gh anywhere except the all-done classification branch's PR probe and the make-pr verification probe.
- Printing anything after the `PILOT_VERDICT` line.
- Running under Ralph (`FLOW_RALPH` / `REVIEW_RECEIPT_PATH`).
## Workflow
Execute [workflow.md](workflow.Synthesize the current conversation context into a flow-next spec at `.flow/specs/<spec-id>.md` via `flowctl spec create + spec set-plan` — agent-native, source-tagged, with mandatory read-back before write. Triggers on /flow-next:capture, "capture spec", "lock down what we discussed", "make a spec from this conversation", "convert conversation to spec". Optional `mode:autofix` token runs without questions and requires `--yes` to commit. Optional `--rewrite <spec-id>` overwrites an existing spec; `--from-compacted-ok` overrides the compaction-detection refusal; `--override-strategy` proceeds despite a contradiction with an active STRATEGY.md track (and prompts to record the override as a decision).
Render a cognitive-aid PR body from flow-next state and open via gh. Triggers on /flow-next:make-pr with optional spec id and flags (--draft, --ready, --no-mermaid, --base <ref>, --memory, --dry-run). Auto-detects spec from current branch when no id given. NOT Ralph-blocked — autonomous loops can surface a draft PR for human review.
Audit `.flow/memory/` entries against the current codebase and decide Keep / Update / Consolidate / Replace / Delete per entry. Triggers on /flow-next:audit, "audit memory", "review memory", "refresh learnings", "sweep stale memory", "consolidate overlapping memory entries". Optional `mode:autofix` token in arguments runs without questions and marks ambiguous as stale. Optional scope hint after the mode token (concept, category, module, or path) narrows what gets audited.
Show spec dependency graph and execution order. Use when asking 'what's blocking what', 'execution order', 'dependency graph', 'what order should specs run', 'critical path', 'which specs can run in parallel'.
Drive any UI surface like a real user - a web app, a Chromium-backed desktop app (Electron / WebView2, reached over CDP), or a genuinely native app (macOS AppKit/SwiftUI, or a non-CDP webview) reached via Computer Use. Detects the surface, picks the best available driver, degrades gracefully. Use to navigate sites, verify deployed UI, test web or desktop apps, capture baseline screenshots, drive a sign-in flow, scrape data, fill forms, run an e2e check, or inspect current page state. Triggers on "check the page", "verify UI", "test the site", "test this app", "drive the app", "automate this desktop app", "read docs at", "look up API", "visit URL", "browse", "screenshot", "scrape", "e2e test", "login flow", "capture baseline", "see how it looks", "inspect current", "before redesign", "Electron app", "native app".
[deprecated alias] Renamed to flow-next-spec-completion-review in flow-next 1.0 — invoke the new skill. Removed in 2.0.
Export RepoPrompt context to a markdown file for review with an external LLM (ChatGPT, Claude web, etc.). Use when you want Carmack-level review but prefer an external model. Triggers on "export context", "export for external review", "export plan for ChatGPT", "export impl review context", "review with an external model", "export review context".