Skip to main content
ClaudeWave
Skill10.5k estrellas del repoactualizado 14d ago

test-reporting

The test-reporting skill executes the Level 2 dummy agent integration test suite from the aden-hive/hive repository and generates a detailed HTML report. It collects test results from JUnit XML output and stdout, then produces a timestamped report with per-test input-to-outcome analysis, including component names, test statuses, durations, and descriptions of what each test verifies and how it validates functionality.

Instalar en Claude Code
Copiar
git clone --depth 1 https://github.com/aden-hive/hive /tmp/test-reporting && cp -r /tmp/test-reporting/.claude/skills/test-reporting ~/.claude/skills/test-reporting
Después abre una sesión nueva de Claude Code; el skill carga automáticamente.

SKILL.md

# Integration Test Reporting Skill

Run the Level 2 dummy agent integration test suite and produce a detailed HTML report with per-test input → outcome analysis.

## Trigger

User wants to run integration tests and see results:
- `/test-reporting`
- `/test-reporting test_component_queen_live.py`
- `/test-reporting --all`

## SOP: Running Tests

### Step 1: Select Scope

If the user provides a specific test file or pattern, use it. Otherwise run the full suite.

```bash
# Full suite
cd core && echo "1" | uv run python tests/dummy_agents/run_all.py --interactive 2>&1

# Specific file (requires manual provider setup)
cd core && uv run python -c "
import sys
sys.path.insert(0, '.')
from tests.dummy_agents.run_all import detect_available
from tests.dummy_agents.conftest import set_llm_selection

avail = detect_available()
claude = [p for p in avail if 'Claude Code' in p['name']]
if not claude:
    avail_names = [p['name'] for p in avail]
    raise RuntimeError(f'No Claude Code subscription. Available: {avail_names}')
provider = claude[0]
set_llm_selection(
    model=provider['model'],
    api_key=provider['api_key'],
    extra_headers=provider.get('extra_headers'),
    api_base=provider.get('api_base'),
)

import pytest
sys.exit(pytest.main([
    'tests/dummy_agents/TEST_FILE_HERE',
    '-v', '--override-ini=asyncio_mode=auto', '--no-header', '--tb=long',
    '--log-cli-level=WARNING', '--junitxml=/tmp/hive_test_results.xml',
]))
"
```

### Step 2: Collect Results

After the test run completes, collect:
1. **JUnit XML** from `--junitxml` output (if available)
2. **stdout/stderr** from the run
3. **Summary table** from `run_all.py` output (the Unicode table)

### Step 3: Generate HTML Report

Write the report to `/tmp/hive_integration_test_report.html`.

The report MUST include these sections:

#### Header
- Run timestamp (ISO 8601)
- Provider used (model name, source)
- Total tests / passed / failed / skipped
- Total wall-clock time
- Overall verdict: PASS (all green) or FAIL (with count)

#### Per-Test Table

For EVERY test (not just failures), include a row with:

| Column | Description |
|--------|-------------|
| Component | Test file grouping (e.g., `component_queen_live`) |
| Test Name | Function name (e.g., `test_queen_starts_in_planning_without_worker`) |
| Status | PASS / FAIL / SKIP / ERROR with color badge |
| Duration | Wall-clock seconds |
| What | One-line description of what the test verifies |
| How | How it works (setup → action → assertion) |
| Why | Why this test matters (what bug/behavior it catches) |
| Input | The input data or configuration (graph spec, initial prompt, phase, etc.) |
| Expected Outcome | What the test asserts |
| Actual Outcome | What actually happened (PASS: matches expected / FAIL: actual vs expected) |
| Failure Detail | For failures only: full traceback + diagnosis |

#### What / How / Why Descriptions

These MUST be derived from the test function's docstring and code. Read each test file to extract:
- **What**: From the docstring first line
- **How**: From the test body (what fixtures, what graph, what assertions)
- **Why**: From the docstring body or "Why this matters" section in the test module

Use these mappings for the component test files:

```
test_component_llm.py          → "LLM Provider" — streaming, tool calling, tokens
test_component_tools.py        → "Tool Registry + MCP" — connection, execution
test_component_event_loop.py   → "EventLoopNode" — iteration, output, stall
test_component_edges.py        → "Edge Evaluation" — conditional, priority
test_component_conversation.py → "Conversation Persistence" — storage, cursor
test_component_escalation.py   → "Escalation Flow" — worker→queen signaling
test_component_continuous.py   → "Continuous Mode" — conversation threading
test_component_queen.py        → "Queen Phase (Unit)" — phase state, tools, events
test_component_queen_live.py   → "Queen Phase (Live)" — real queen, real LLM
test_component_queen_state_machine.py → "Queen State Machine" — edge cases, races
test_component_worker_comms.py → "Worker Communication" — events, data flow
test_component_strict_outcomes.py → "Strict Outcomes" — exact path, output, quality
```

#### HTML Template

Use this structure:

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Hive Integration Test Report — {timestamp}</title>
<style>
  :root { --pass: #22c55e; --fail: #ef4444; --skip: #f59e0b; --bg: #0f172a; --surface: #1e293b; --text: #e2e8f0; --muted: #94a3b8; --border: #334155; }
  * { box-sizing: border-box; margin: 0; padding: 0; }
  body { font-family: 'SF Mono', 'Fira Code', monospace; background: var(--bg); color: var(--text); padding: 2rem; line-height: 1.6; }
  h1, h2, h3 { font-weight: 600; }
  h1 { font-size: 1.5rem; margin-bottom: 1rem; }
  h2 { font-size: 1.2rem; margin: 2rem 0 1rem; border-bottom: 1px solid var(--border); padding-bottom: 0.5rem; }
  .summary { display: grid; grid-template-columns: repeat(auto-fit, minmax(150px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
  .card { background: var(--surface); padding: 1rem; border-radius: 8px; border: 1px solid var(--border); }
  .card .label { color: var(--muted); font-size: 0.75rem; text-transform: uppercase; }
  .card .value { font-size: 1.5rem; font-weight: 700; margin-top: 0.25rem; }
  .card .value.pass { color: var(--pass); }
  .card .value.fail { color: var(--fail); }
  table { width: 100%; border-collapse: collapse; font-size: 0.8rem; }
  th { background: var(--surface); position: sticky; top: 0; text-align: left; padding: 0.5rem; border-bottom: 2px solid var(--border); color: var(--muted); text-transform: uppercase; font-size: 0.7rem; }
  td { padding: 0.5rem; border-bottom: 1px solid var(--border); vertical-align: top; }
  tr:hover { background: rgba(255,255,255,0.03); }
  .badge { display: inline-block; padding: 2px 8px; border-radius: 4px; font-size: 0.7rem; font-weight: 700; }
  .badge.pass { background: rgba(34,197,94,0.2); color: var(--pass); }
  .bad