tui-validate

tui-validate captures and validates Terminal User Interface output, using freeze for screenshot generation and LLM-as-judge for semantic analysis. Use it for visual regression testing of TUI applications, for verifying component rendering after code changes, or for validating that UI state matches the expected layout and content without relying on brittle string matching.

Repository: ralph-orchestrator

Install in Claude Code

git clone --depth 1 https://github.com/mikeyobrien/ralph-orchestrator /tmp/tui-validate && cp -r /tmp/tui-validate/.claude/skills/tui-validate ~/.claude/skills/tui-validate

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# TUI Validate

## Overview

This skill validates Terminal User Interface (TUI) applications by capturing their output and using LLM-as-judge for semantic validation. It leverages [freeze](https://github.com/charmbracelet/freeze) from Charmbracelet for high-fidelity terminal screenshots and provides structured validation criteria.

**Philosophy**: Rather than brittle string matching, this skill uses semantic understanding to validate that TUI output "looks right" - checking layout, content presence, and visual hierarchy without breaking on minor formatting changes.

## When to Use

- Validating TUI rendering after changes
- Checking that UI components display correctly
- Visual regression testing for terminal applications
- Verifying TUI state after specific interactions
- Creating documentation screenshots with validation

## Prerequisites

**Required:**
- `freeze` CLI tool installed (`brew install charmbracelet/tap/freeze`)

**Optional:**
- `tmux` for interactive capture of live TUI applications (needed only for `tmux:` targets)

**Verification:**
```bash
# Check freeze is installed
freeze --version

# Check tmux is installed (for interactive capture)
tmux -V
```
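
The checks above can be folded into a small preflight step that reports missing tools instead of failing halfway through a capture. This is a minimal sketch; `missing_tools` is a hypothetical helper name, not part of the skill.

```bash
# Hypothetical helper: print each tool that is not on PATH, one per line.
# Returns non-zero if at least one tool is missing.
missing_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      rc=1
    fi
  done
  return "$rc"
}

# Example: freeze is always needed; add tmux only for tmux: targets.
# missing_tools freeze tmux || echo "install the tools listed above" >&2
```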

## Parameters

- **target** (required): What to validate. One of:
  - `file:<path>` - ANSI output file to validate
  - `command:<cmd>` - Command to execute and capture
  - `tmux:<session>` - Live tmux session to capture
  - `buffer:<text>` - Raw text/ANSI to validate

- **criteria** (required): Validation criteria. Can be:
  - A predefined criteria name (see Built-in Criteria)
  - A custom criteria string describing what to check

- **output_format** (optional, default: "svg"): Screenshot format
  - `svg` - Vector format, best for documentation
  - `png` - Raster format, best for visual diff
  - `text` - Text-only extraction, fastest

- **save_screenshot** (optional, default: false): Whether to save the screenshot
  - If true, saves to `{target_name}.{format}` in the current directory

- **judge_mode** (optional, default: "semantic"): Validation approach
  - `semantic` - LLM judges based on meaning and layout
  - `strict` - Also checks exact content presence
  - `visual` - Requires PNG, checks visual appearance
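
One way to split a `target` value into its type prefix and payload is shell parameter expansion. The sketch below is illustrative only: `parse_target`, `TARGET_KIND`, and `TARGET_VALUE` are hypothetical names, and it assumes everything after the first colon belongs to the payload.

```bash
# Hypothetical helper: split "kind:value" into TARGET_KIND and TARGET_VALUE.
# Only the four target types listed above are accepted.
parse_target() {
  case "$1" in
    file:*|command:*|tmux:*|buffer:*)
      TARGET_KIND=${1%%:*}    # text before the first colon
      TARGET_VALUE=${1#*:}    # text after the first colon (later colons kept)
      ;;
    *)
      printf 'unsupported target: %s\n' "$1" >&2
      return 1
      ;;
  esac
}
```

Because only the first colon is treated as the separator, a target such as `command:ls /tmp:/var` keeps its payload intact.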

## Built-in Criteria

### `ralph-header`
Validates Ralph TUI header component:
- Iteration counter in `[iter N]` or `[iter N/M]` format
- Elapsed time in `MM:SS` format
- Hat indicator with emoji and name
- Mode indicator (`▶ auto` or `⏸ paused`)
- Optional scroll mode indicator `[SCROLL]`
- Optional idle countdown `idle: Ns`
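
For `strict` mode, a cheap text pre-check can run before the LLM judge is consulted. The sketch below covers only the iteration counter and elapsed time. The regular expressions are assumptions derived from the format descriptions above, `check_header_text` is a hypothetical name, and the emoji-based indicators are left to the judge.

```bash
# Hypothetical strict-mode pre-check: reads captured header text on stdin.
# Prints "ok" or the first missing element; returns non-zero on a miss.
check_header_text() {
  text=$(cat)
  # Matches [iter N] or [iter N/M]
  printf '%s\n' "$text" | grep -Eq '\[iter [0-9]+(/[0-9]+)?\]' \
    || { echo "missing iteration counter"; return 1; }
  # Matches a standalone MM:SS value
  printf '%s\n' "$text" | grep -Eq '(^|[^0-9])[0-9]{2}:[0-9]{2}([^0-9]|$)' \
    || { echo "missing MM:SS elapsed time"; return 1; }
  echo "ok"
}
```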

### `ralph-footer`
Validates Ralph TUI footer component:
- Activity indicator (`◉ active`, `◯ idle`, or `■ done`)
- Last event topic display
- Search mode display when active

### `ralph-full`
Validates complete Ralph TUI layout:
- Header section at top (3 lines)
- Terminal content area (variable height)
- Footer section at bottom (3 lines)
- Proper visual hierarchy and borders

### `tui-basic`
Generic TUI validation:
- Has visible content (not blank)
- No rendering artifacts or broken characters
- Proper terminal dimensions

## Execution Flow

### 1. Capture Phase

Capture TUI output based on target type:

**For file targets:**
```bash
freeze {file_path} -o /tmp/tui-capture.{format}
```

**For command targets:**
```bash
freeze --execute "{command}" -o /tmp/tui-capture.{format}
```

**For tmux targets:**
```bash
tmux capture-pane -pet {session} | freeze -o /tmp/tui-capture.{format}
```

**For buffer targets:**
```bash
# printf passes the buffer through unchanged; echo's backslash handling varies by shell
printf '%s\n' "{buffer}" | freeze -o /tmp/tui-capture.{format}
```

**Constraints:**
- You MUST verify freeze is installed before attempting capture
- You MUST handle capture failures gracefully and report the error
- You MUST use appropriate freeze flags for the output format
- You SHOULD use `--theme base16` for consistent rendering
- You SHOULD set reasonable dimensions with `--width` and `--height`
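
The four capture commands and the failure-handling constraint can be combined into one dispatcher. This is a sketch under stated assumptions: `capture_target` and `FREEZE_BIN` are hypothetical names, the `--theme base16` flag is taken from the constraint above rather than verified against a specific freeze release, and `--width`/`--height` are omitted for brevity.

```bash
# Hypothetical override point so the freeze binary can be swapped (e.g. in tests).
FREEZE_BIN=${FREEZE_BIN:-freeze}

# Hypothetical dispatcher: capture_target KIND VALUE OUTPUT_PATH
# Returns 2 for an unknown kind and 1 when the capture command fails.
capture_target() {
  kind=$1
  value=$2
  out=$3
  case "$kind" in
    file)    "$FREEZE_BIN" "$value" --theme base16 -o "$out" ;;
    command) "$FREEZE_BIN" --execute "$value" --theme base16 -o "$out" ;;
    # In the two pipelines below only the exit status of freeze is checked.
    tmux)    tmux capture-pane -pet "$value" | "$FREEZE_BIN" --theme base16 -o "$out" ;;
    buffer)  printf '%s\n' "$value" | "$FREEZE_BIN" --theme base16 -o "$out" ;;
    *)       printf 'unknown target kind: %s\n' "$kind" >&2; return 2 ;;
  esac || { printf 'capture failed for %s target\n' "$kind" >&2; return 1; }
}
```

A call such as `capture_target file demo.ansi /tmp/tui-capture.svg` then either writes the screenshot or reports the failure on stderr and returns a non-zero status the caller can act on.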

### 2. Extraction Phase

Extract content for LLM analysis:

**For text/semantic validation:**
- If format is `text`, use the captured text directly
- If format is `svg` or `png`, also capture a text version for content analysis

**For visual validation:**
- Requires PNG format
- Will analyze the image directly using vision capabilities

**Constraints:**
- You MUST extract both visual and text representations when judge_mode is `visual`
- You MUST preserve ANSI escape sequences for color validation when relevant
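
When the capture is raw ANSI, one possible way to obtain the text version is to strip escape sequences into a separate copy, keeping the original for color validation. The sketch below handles only CSI sequences (ESC `[` ... final letter), which is an assumption about what the TUI emits. `strip_ansi` is a hypothetical helper name.

```bash
# Hypothetical helper: remove CSI escape sequences (colors, cursor moves) from stdin.
# Other escape types (e.g. OSC titles) are not handled in this sketch.
strip_ansi() {
  esc=$(printf '\033')
  sed "s/${esc}\[[0-9;?]*[A-Za-z]//g"
}

# Example: keep the raw capture and derive the plain-text copy from it.
# strip_ansi < /tmp/tui-capture.ansi > /tmp/tui-capture.txt
```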

### 3. Validation Phase

Apply LLM-as-judge with the appropriate criteria:

**Semantic Validation Prompt Template:**
```
Analyze this terminal UI output and determine if it meets the following criteria:

CRITERIA:
{criteria_description}

TERMINAL OUTPUT:
{captured_text}

Evaluate each criterion and provide:
1. PASS or FAIL for each requirement
2. Brief explanation for any failures
3. Overall verdict: PASS or FAIL

Be lenient on exact formatting but strict on:
- Required content presence
- Logical layout and hierarchy
- No rendering errors or artifacts
```

**Visual Validation Prompt Template (with image):**
```
Examine this terminal screenshot and validate:

CRITERIA:
{criteria_description}

Check for:
1. Visual hierarchy and layout
2. Color coding correctness
3. No rendering artifacts or broken characters
4. Proper alignment and spacing

Verdict: PASS or FAIL with explanation
```

**Constraints:**
- You MUST return a clear PASS or FAIL verdict
- You MUST provide specific feedback on failures
- You MUST be lenient on whitespace/formatting differences
- You MUST be strict on content presence and semantic correctness
- You SHOULD note any warnings even on PASS results
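
If the judge's reply needs to drive a script or CI step, the overall verdict has to be pulled out of free text. The sketch below assumes the reply contains a line with the word "verdict", as the templates above request. `extract_verdict` is a hypothetical name, and treating a missing or ambiguous verdict as FAIL is a fail-safe choice made for this example.

```bash
# Hypothetical helper: read the judge's reply on stdin and print PASS or FAIL.
# Uses the last line mentioning "verdict"; anything unclear counts as FAIL.
extract_verdict() {
  line=$(grep -i 'verdict' | tail -n 1)
  case "$line" in
    *PASS*FAIL*|*FAIL*PASS*) echo FAIL ;;  # both words present: ambiguous
    *PASS*) echo PASS ;;
    *) echo FAIL ;;
  esac
}
```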

### 4. Reporting Phase

Report validation results:

**On PASS:**
```
✅ TUI Validation PASSED

Criteria: {criteria_name}
Target: {target}
Mode: {judge_mode}

All requirements satisfied.
{optional_notes}
```

**On FAIL:**
```
❌ TUI Validation FAILED

Criteria: {criteria_name}
Target: {target}
Mode: {judge_mode}

Issues found:
- {issue_1}
- {issue_2}

Screenshot saved: {path_if_saved}
```
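
The two report templates can be rendered by a single function whose exit status mirrors the verdict, which is convenient in CI. This is a sketch only: `print_report` and its argument order are hypothetical, and the optional notes and screenshot path lines are left out.

```bash
# Hypothetical renderer: print_report VERDICT CRITERIA TARGET MODE [ISSUE...]
# Returns 0 on PASS and 1 on any other verdict.
print_report() {
  verdict=$1
  criteria=$2
  target=$3
  mode=$4
  shift 4
  if [ "$verdict" = PASS ]; then
    printf '✅ TUI Validation PASSED\n\n'
  else
    printf '❌ TUI Validation FAILED\n\n'
  fi
  printf 'Criteria: %s\nTarget: %s\nMode: %s\n\n' "$criteria" "$target" "$mode"
  if [ "$verdict" = PASS ]; then
    printf 'All requirements satisfied.\n'
  else
    printf 'Issues found:\n'
    for issue in "$@"; do
      printf '%s\n' "- $issue"
    done
    return 1
  fi
}
```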

**Constraints:**
- You MUST always provide a clear verdict
- You MUST list
