visual-qa
The visual-qa skill provides rigorous verification of web and terminal user interfaces by capturing objective reference evidence through diff scripts, then running two parallel oracle passes to assess design system compliance and functional integrity. Use it after building or modifying any UI to confirm visual correctness before marking work complete, particularly for responsive layouts, CJK text rendering, terminal alignment, and pixel-level fidelity issues.
git clone --depth 1 https://github.com/code-yeongyu/lazycodex /tmp/visual-qa && cp -r /tmp/visual-qa/plugins/omo/skills/visual-qa ~/.claude/skills/visual-qaSKILL.md
## Codex Harness Tool Compatibility
This skill may include examples copied from the OpenCode harness. In Codex, do not call OpenCode-only tools such as `call_omo_agent(...)`, `task(...)`, `background_output(...)`, or `team_*(...)` literally. Translate those examples to Codex native tools:
| OpenCode example | Codex tool to use |
| --- | --- |
| `call_omo_agent(subagent_type="explore", ...)` | `multi_agent_v1.spawn_agent({"message":"TASK: act as an explorer. ...","agent_type":"explorer","fork_context":false})` |
| `call_omo_agent(subagent_type="librarian", ...)` | `multi_agent_v1.spawn_agent({"message":"TASK: act as a librarian. ...","agent_type":"librarian","fork_context":false})` |
| `task(subagent_type="plan", ...)` | `multi_agent_v1.spawn_agent({"message":"TASK: act as a planning agent. ...","agent_type":"plan","fork_context":false})` |
| `task(subagent_type="oracle", ...)` for final verification | `multi_agent_v1.spawn_agent({"message":"TASK: act as a rigorous reviewer. ...","agent_type":"codex-ultrawork-reviewer","fork_context":false})` |
| `task(category="...", ...)` for implementation or QA | `multi_agent_v1.spawn_agent({"message":"TASK: act as an implementation or QA worker. ...","fork_context":false})` |
| `background_output(task_id="...")` | `multi_agent_v1.wait_agent(...)` for mailbox signals |
| `team_*(...)` | Use Codex native subagents via `multi_agent_v1.spawn_agent`, `multi_agent_v1.send_input`, `multi_agent_v1.wait_agent`, and `multi_agent_v1.close_agent` |
Role-specific behavior must be described in a self-contained `message`. Use `fork_context: false` to start the child with only the initial prompt (no parent history); use `fork_context: true` only when full parent history is truly required. Include any required conversation context, files, diffs, constraints, and requested skill names directly in the spawned agent's `message`. OMO installs these selectable agent roles into `~/.codex/agents/`: `explorer`, `librarian`, `plan`, `momus`, `metis`, and `codex-ultrawork-reviewer` — pass the matching name as `agent_type` so the child gets that role's model and instructions. On `multi_agent_v2` sessions the same `agent_type` applies (the OMO installer exposes it) with `fork_turns` instead of `fork_context`. If the spawn tool exposes no `agent_type` parameter, omit it and describe the role inside `message`. If a code block below conflicts with this section, this section wins.
For work likely to exceed one wait cycle, require the child to send `WORKING: <task> - <current phase>` before long passes and `BLOCKED: <reason>` only when progress stops. A `multi_agent_v1.wait_agent` timeout only means no new mailbox update arrived. Treat a running child as alive. Fallback only when the child is completed without the deliverable, ack-only after followup, explicitly `BLOCKED:`, or no longer running.
# Visual QA - Dual-Oracle Web and TUI Verification
Verify a rendered UI against intent using objective script evidence plus two parallel read-only oracle passes, then synthesize one good/bad verdict. The script numbers focus the reviewers. They are not the verdict.
## Purpose and when to use
- Use after you build or change any UI, before calling it done. Covers web/page UIs and TUI/terminal UIs.
- Use when output must match a mock, a baseline, or a stated design intent; when you suspect a regression; when CJK (Korean/Japanese/Chinese) text may clip, misalign, or wrap awkwardly; when a claimed design system might actually be a flat image; when a terminal layout may overflow or its borders may break.
- Skip when there is no rendered surface (pure backend or library logic with no visual or terminal output). For broad post-implementation review use review-work; this skill is the visual specialist.
In the commands below, `$SKILL_DIR` is this skill's own directory (the folder containing this SKILL.md). The bundled script lives at `scripts/cli.ts` inside it.
## Step 1 - Detect the surface
- Web/page UI: renders in a browser (HTML/CSS/JS, components, canvas, SVG). Evidence is screenshots.
- TUI/terminal UI: renders as text in a terminal (box-drawing, panes, status lines, REPL/TUI apps). Evidence is terminal captures.
If the change touches both, run both capture tracks and feed both into the passes.
## Step 2 - Capture objective reference evidence
### Web
1. Capture a REFERENCE image: the user's mock/target, or a known-good baseline. Save as PNG.
2. Capture the ACTUAL rendered screenshot at the same viewport size using the project's browser tooling (the playwright, agent-browser, or dev-browser skill). Save as PNG.
3. Run the diff and keep the JSON:
```
bun "$SKILL_DIR/scripts/cli.ts" image-diff <reference.png> <actual.png>
```
Key fields: `dimensionsMatch`, `diffRatio` (0..1), `similarityScore` (0..100), `alphaChannelIntact`, `hotspots[]` (grid regions ranked by `diffRatio`).
### TUI
1. Capture plain text and an ANSI-preserving copy:
```
tmux capture-pane -p > capture.txt
tmux capture-pane -e -p > capture-ansi.txt
```
2. Run the check with the REAL terminal width and keep the JSON:
```
bun "$SKILL_DIR/scripts/cli.ts" tui-check capture.txt --cols <N>
```
Key fields: `maxWidth`, `overflowLines[]`, `borderMisaligned`, `wideCharColumns[]`, `hasAnsi`.
This JSON (diff ratio, similarity score, hotspots or overflow lines, border alignment, wide-char columns, alpha) is REFERENCE evidence to aim the reviewers. It is not the verdict by itself.
## Step 3 - Dispatch two read-only QA subagents in parallel
Send BOTH task calls in a single message so they run concurrently. Each oracle is read-only: it reviews and reports, it cannot modify files. Each returns PASS, REVISE, or FAIL with concrete, located findings. Pass A proves the surface is a real design-system implementation, not a mock-only or faked-image substitute. Pass B directly opens screenshots and inspects source/content for visual and CJK defects.
Paste evidence directly into each prompt: source code, the plain-text TUI capturesUse when Codex needs to understand or respond to automatic comment-checker feedback emitted after an edit-like PostToolUse hook.
Use when Codex needs language-server diagnostics, definitions, references, symbols, or rename safety checks in the current workspace.
Use when the user asks about Codex Rules behavior, injected project rules, supported rule file locations, matching, or environment configuration.
Codex-native strategic planning consultant. Explores the codebase exhaustively, surfaces only the ambiguities exploration cannot resolve, asks the user, and waits for explicit approval before producing one decision-complete work plan. MUST USE when the work has 5+ steps, scope is ambiguous, multiple modules are involved, or the user asks for a plan. Triggers: ulw-plan, plan this, create a work plan, interview me, start planning, plan mode, break this down.
Goal-like loop that uses ultrawork mode to decompose work into systematic, evidence-bound steps.
MUST USE for any real runtime debugging across ANY language or binary — crashes, silent failures, wrong responses, stuck processes, memory leaks, async misbehavior, unexplained timing, reverse engineering. Runs a hypothesis-driven loop: form ≥3 hypotheses, investigate in parallel, after 2 failed rounds spawn Oracles from orthogonal angles, confirm root cause, lock with a failing test, fix minimally, QA by actually USING the system, scrub artifacts. The actual HOW lives in `references/` — READ THEM. Triggers: 'debug this', 'why is X not working', 'hanging', 'attach a debugger', 'reverse engineer', 'pwndbg', 'gdb', 'lldb', 'node inspect', 'tsx debug', 'pdb', 'dlv', 'delve', 'rust-gdb', 'set a breakpoint', 'context window exploded', 'why is the response empty', 'attach the debugger', 'debug it', 'why is this happening', 'trace this bug', 'reproduce and fix', 'silent failure', 'HTTP 200 but empty', 'why did it stop', 'inspect the binary', 'reverse engineering', 'playwright'.
Designer-turned-developer who crafts stunning UI/UX even without design mockups
MUST USE whenever a task needs a commit or git-history investigation. Covers atomic commits, staging, commit-message style, rebase, squash, fixup/autosquash, blame, bisect, reflog, git log -S/-G, and questions like who wrote this or when was this added. Do not use for ordinary code edits unless the user asks for git work.