running-tests
Running Tests skill executes test discovery and execution for the opencode-swarm project while preventing session timeouts. Use this skill when you need to find and run tests related to a single changed source file across the codebase. The skill enforces a strict one-file limit and caps test discovery at fifty files to avoid cascading scope expansion that kills sessions. For multiple files or targeted single-file validation, use bun shell commands instead.
git clone --depth 1 https://github.com/zaxbysauce/opencode-swarm /tmp/running-tests && cp -r /tmp/running-tests/.opencode/skills/running-tests ~/.claude/skills/running-tests
# Running Tests for opencode-swarm
This skill is about **executing** tests safely. For **writing** tests, see `writing-tests`.
---
## ⛔ The One Rule That Prevents Session Kills
**Never use `test_runner` with more than one source file for any discovery scope.**
`graph` and `impact` each fan out per file through the import tree; `convention` maps
each source file to a test file by name convention. The union quickly exceeds
`MAX_SAFE_TEST_FILES = 50`, triggering `scope_exceeded`, which causes LLMs to
cascade to `scope: 'all'` and kill the session. All three scopes now reject with
`scope_exceeded` before fan-out when `sourceFiles.length > MAX_SAFE_SOURCE_FILES = 1`.
---
## Three-Layer Defense Against Session Blocking
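`test_runner` has three guards that prevent unbounded fan-out from blocking the session. The first two fire before graph resolution; the third bounds the traversal itself and re-checks the result afterwards.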
### Layer 1 — Source-file count guard (synchronous, fires before any I/O)
`sourceFiles.length > MAX_SAFE_SOURCE_FILES (1)` → returns `scope_exceeded` immediately. Catches the common case of multi-file calls before any filesystem access.
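A minimal sketch of what a Layer 1 style check could look like. Only `MAX_SAFE_SOURCE_FILES` and the `scope_exceeded` outcome come from this document; the function name `checkSourceFileCount` and the `GuardResult` type are hypothetical and are not the project's actual code.

```typescript
// Hypothetical sketch of a synchronous source-file count guard.
// Only the constant and the 'scope_exceeded' outcome are taken from the skill text.
const MAX_SAFE_SOURCE_FILES = 1;

type GuardResult =
  | { outcome: 'ok' }
  | { outcome: 'scope_exceeded'; reason: string };

function checkSourceFileCount(sourceFiles: string[]): GuardResult {
  // Pure length check: no filesystem access, no subprocesses.
  if (sourceFiles.length > MAX_SAFE_SOURCE_FILES) {
    return {
      outcome: 'scope_exceeded',
      reason: `got ${sourceFiles.length} source files, limit is ${MAX_SAFE_SOURCE_FILES}`,
    };
  }
  return { outcome: 'ok' };
}
```

Because the check looks only at the array length, it can reject a multi-file call before any I/O happens.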
### Layer 2 — Pre-resolution fan-out estimate (fast, ~100ms)
`estimateFanOut(sourceFiles, workingDir)` reads the cached impact map and counts unique test files without spawning subprocesses. If the estimate exceeds `MAX_SAFE_TEST_FILES = 50`, the call returns `scope_exceeded` immediately — before any graph traversal begins. Only fires when `sourceFiles.length === 1` (Layer 1 has already passed).
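The estimate can be pictured as a set-union count over a precomputed map. The sketch below assumes the cached impact map has already been loaded into memory as a `Map` from source file to test files; that shape, and the names `estimateFanOutFromMap` and `preResolutionCheck`, are assumptions for illustration. The real `estimateFanOut` takes a `workingDir` and reads the cache itself.

```typescript
// Hypothetical sketch: count unique test files from an in-memory impact map.
// The map shape (source file -> test files) is an assumption, not the real cache format.
const MAX_SAFE_TEST_FILES = 50;

type ImpactMap = Map<string, string[]>;

function estimateFanOutFromMap(sourceFiles: string[], impactMap: ImpactMap): number {
  const uniqueTests = new Set<string>();
  for (const source of sourceFiles) {
    for (const testFile of impactMap.get(source) ?? []) {
      uniqueTests.add(testFile); // Set membership de-duplicates shared tests
    }
  }
  return uniqueTests.size;
}

function preResolutionCheck(
  sourceFiles: string[],
  impactMap: ImpactMap,
): 'ok' | 'scope_exceeded' {
  // Reject before any graph traversal when the estimate is over the cap.
  return estimateFanOutFromMap(sourceFiles, impactMap) > MAX_SAFE_TEST_FILES
    ? 'scope_exceeded'
    : 'ok';
}
```

Counting through a `Set` keeps the estimate cheap and avoids double-counting a test file that is reachable more than once.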
### Layer 3 — Budget-limited traversal + post-resolution length check
`analyzeImpact` accepts a `budget` parameter (`MAX_SAFE_TEST_FILES = 50`). The traversal stops as soon as it has visited 50 test files and sets `budgetExceeded: true`. The call site checks this flag and returns `scope_exceeded` before processing results.
After graph resolution, the final `testFiles.length` is additionally compared to `MAX_SAFE_TEST_FILES`. If exceeded, `scope_exceeded` is returned.
**Result:** When fan-out exceeds the safe threshold, the session gets `outcome: 'scope_exceeded'` instead of hanging.
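The budget-limited traversal in Layer 3 can be sketched as a breadth-first walk that gives up once the budget is used. Everything here except `budgetExceeded` and `MAX_SAFE_TEST_FILES` is hypothetical: the importer-graph shape, the `.test.ts` suffix check, and the function name `collectTestsWithBudget` are assumptions, and the exact cutoff point in the real `analyzeImpact` may differ.

```typescript
// Hypothetical sketch of a budget-limited traversal; not the real analyzeImpact.
const MAX_SAFE_TEST_FILES = 50;

// Assumed shape: file -> files that import it.
type ImporterGraph = Map<string, string[]>;

interface TraversalResult {
  testFiles: string[];
  budgetExceeded: boolean;
}

function collectTestsWithBudget(
  start: string,
  importers: ImporterGraph,
  budget: number = MAX_SAFE_TEST_FILES,
): TraversalResult {
  const visited = new Set<string>([start]);
  const queue: string[] = [start];
  const testFiles: string[] = [];

  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const next of importers.get(current) ?? []) {
      if (visited.has(next)) continue;
      visited.add(next);
      if (next.endsWith('.test.ts')) {
        // Stop early instead of walking the rest of the graph.
        if (testFiles.length >= budget) {
          return { testFiles, budgetExceeded: true };
        }
        testFiles.push(next);
      } else {
        queue.push(next); // keep following non-test importers
      }
    }
  }
  return { testFiles, budgetExceeded: false };
}
```

A caller would check `budgetExceeded` first and return `scope_exceeded` without processing `testFiles`, which mirrors the call-site behavior described above.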
---
## Decision Tree: test_runner tool vs bun shell command
```
Do you need to run tests?
│
├─ Single test file, targeted validation
│   └─ Either works. Prefer shell: bun --smol test <file> --timeout 30000
│
├─ Multiple files in the same directory (e.g. all agents tests)
│   └─ Shell only: per-file loop. Never test_runner with multiple files.
│
├─ Find tests related to ONE changed source file
│   └─ test_runner is fine: { scope: 'graph', files: ['src/agents/coder.ts'] }
│      (single file → bounded fan-out)
│
├─ Find tests related to MULTIPLE changed source files
│   └─ Shell only: per-file loop over the changed files, or run the whole directory.
│      test_runner with any discovery scope + multiple source files = scope_exceeded
│      (guard fires before fan-out for convention, graph, and impact scopes).
│
└─ Validate the entire repo (pre-push)
    └─ Shell only: 5-tier suite from commit-pr skill. Never test_runner scope:'all'.
```
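The decision tree above can also be written as a small function, which may help when wiring the rule into tooling. This is an illustrative sketch only: the `Task` and `Plan` types and the `planTestRun` name are hypothetical, and the `test_runner` arguments simply mirror the example in the tree.

```typescript
// Hypothetical encoding of the decision tree; types and names are illustrative.
type Task =
  | { kind: 'single-test-file'; file: string }
  | { kind: 'multiple-test-files'; files: string[] }
  | { kind: 'related-to-sources'; sourceFiles: string[] }
  | { kind: 'whole-repo' };

type Plan =
  | { via: 'shell'; note: string }
  | { via: 'test_runner'; args: { scope: 'graph'; files: string[] } };

function planTestRun(task: Task): Plan {
  switch (task.kind) {
    case 'single-test-file':
      // Either tool works; the tree prefers the shell command.
      return { via: 'shell', note: `bun --smol test ${task.file} --timeout 30000` };
    case 'multiple-test-files':
      return { via: 'shell', note: 'per-file loop' };
    case 'related-to-sources':
      // Exactly one source file keeps the fan-out bounded.
      if (task.sourceFiles.length === 1) {
        return { via: 'test_runner', args: { scope: 'graph', files: task.sourceFiles } };
      }
      return { via: 'shell', note: 'per-file loop over the changed files' };
    case 'whole-repo':
      return { via: 'shell', note: '5-tier suite from commit-pr skill' };
  }
}
```

Only one branch ever returns `test_runner`, which makes the one-file rule easy to audit.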
---
## Scope Safety Reference
| Scope | With `files: [one]` | With `files: [many]` | Notes |
|-------|--------------------|--------------------|-------|
| `'convention'` | ✅ Safe | ❌ Rejected (`scope_exceeded`) | Guard fires before fan-out; direct test file paths exempt |
| `'graph'` | ✅ Safe (capped at 50 via budget) | ❌ Rejected (`scope_exceeded`) | Two-layer guard: source-file count + fan-out estimate |
| `'impact'` | ✅ Safe (capped at 50 via budget) | ❌ Rejected (`scope_exceeded`) | Two-layer guard: source-file count + fan-out estimate |
| `'all'` | ❌ Never | ❌ Never | Requires `allow_full_suite: true`; CI mirror only |
**Rule of thumb:** Pass exactly one source file to `test_runner`. For multiple files, use a shell loop.
---
## Per-File Isolation Loops
CI runs agents/tools/services in per-file isolation (one `bun --smol` process per file).
Reproduce this locally with the following loops.
### bash (Linux / macOS)
```bash
# Single directory: per-file isolation
for f in tests/unit/agents/*.test.ts; do
  bun --smol test "$f" --timeout 30000
done

# Multiple directories
for dir in tests/unit/tools tests/unit/services tests/unit/agents; do
  for f in "$dir"/*.test.ts; do
    bun --smol test "$f" --timeout 30000
  done
done

# Stop on first failure (useful for debugging)
for f in tests/unit/agents/*.test.ts; do
  bun --smol test "$f" --timeout 30000 || { echo "FAILED: $f"; break; }
done
```
### PowerShell (Windows)
```powershell
# Single directory: per-file isolation
Get-ChildItem tests/unit/agents/*.test.ts | ForEach-Object {
    bun --smol test $_.FullName --timeout 30000
}

# Multiple directories
@('tests/unit/tools', 'tests/unit/services', 'tests/unit/agents') | ForEach-Object {
    Get-ChildItem "$_/*.test.ts" | ForEach-Object {
        bun --smol test $_.FullName --timeout 30000
    }
}

# Capture output (avoids truncation on large output)
Get-ChildItem tests/unit/agents/*.test.ts | ForEach-Object {
    bun --smol test $_.FullName --timeout 30000
} | Out-File "$env:TEMP\test_out.txt"
Get-Content "$env:TEMP\test_out.txt" | Select-Object -Last 50
```
**Common PowerShell pitfalls:**
- `for f in ...; do` — invalid, use `Get-ChildItem | ForEach-Object`
- `Select-String -Last N` — invalid parameter, use `Select-Object -Last N`
- `2>&1 2>&1` — duplicate redirection, causes parse error; use `2>&1` once
- `&&` — not supported in PowerShell 5.1; use `; if ($?) { cmd2 }` instead
- After `bun install --frozen-lockfile --force`, non-elevated Windows shells can hit `EPERM` while reading refreshed `node_modules` entries. Treat that as a host permission/access issue: rerun the same focused Bun command with approved/elevated access before diagnosing it as a code or test failure.
---
## Batch vs Per-File: Which Directories Need Isolation?
| Directory | Mode | Reason |
|-----------|------|--------|
| `tests/unit/tools/` | Per-file loop | Heavy `mock.modu>