swarm-local-e2e
This skill provides a structured guide for running end-to-end tests of the agent swarm system locally using a real API server and Docker containers. Use it when you need to verify features through complete workflow testing, or proactively after implementing changes to the API, task lifecycle, session logging, Docker configuration, or UI components. The skill walks through prerequisites, port configuration, database cleanup, API server startup, Docker image building, container orchestration, task creation, log verification, dashboard access, and cleanup procedures.
git clone --depth 1 https://github.com/desplega-ai/agent-swarm /tmp/swarm-local-e2e && cp -r /tmp/swarm-local-e2e/.claude/skills/swarm-local-e2e ~/.claude/skills/swarm-local-e2e
# Local E2E Testing Guide
Run full end-to-end tests of the agent swarm locally with a real API server and Docker containers.
## When to Use This Skill
This skill should be invoked in two modes:
1. **User-requested QA**: The user asks you to run E2E tests, verify a feature, or QA a specific flow. Follow the steps below targeting what they asked for.
2. **Automated change verification**: After implementing changes that touch the API, runner, polling, task lifecycle, session logs, Docker entrypoint, or worker/lead behavior — use this skill proactively to verify the changes work end-to-end. Determine what's testable based on the diff:
- **Task lifecycle changes** (poll, runner, store-progress): Create assigned + pool tasks, verify they complete and have correct logs
- **Session log changes**: Run two sequential tasks on the same agent, verify log isolation (unique sessionIds, no cross-contamination)
- **Docker / entrypoint changes**: Build image, start containers, verify boot logs and registration
- **UI changes**: Start the dashboard, use agent-browser/qa-use to verify rendering
- **API endpoint changes**: Call the endpoint directly and verify the response
You do not need to run every step — pick the subset relevant to the changes being tested.
## Prerequisites
- OrbStack or Docker Desktop running (`open -a OrbStack` if needed)
- `.env` with `API_KEY` and `PORT` configured
- `.env.docker-lead` with lead config (`AGENT_ID`, `CLAUDE_CODE_OAUTH_TOKEN`, `MCP_BASE_URL`)
- `.env.docker` with worker config (`AGENT_ID`, `CLAUDE_CODE_OAUTH_TOKEN` or `OPENROUTER_API_KEY`, `MCP_BASE_URL`)
## Step 1: Determine Your Port
Check `.env` for the configured port — do **not** assume 3013:
```bash
grep ^PORT= .env
```
Use this value as `$PORT` throughout, exporting it in your shell (`export PORT=<value>`) so the commands below can expand it. Each worktree may have a different port, so always take the value from that worktree's `.env`.
Also verify the Docker env files match:
```bash
grep MCP_BASE_URL .env.docker-lead .env.docker
# Both should point to http://host.docker.internal:$PORT
```
If they don't match, update them before starting containers.
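The same consistency check can be scripted. The following is a minimal sketch, assuming the env files contain plain `KEY=value` lines. `parseEnv` and `checkMcpBaseUrl` are hypothetical helper names, not part of the repository.

```javascript
// Hypothetical helpers (not part of agent-swarm).

// Parse simple KEY=value env text into an object.
function parseEnv(text) {
  const vars = {};
  for (const rawLine of text.split('\n')) {
    const line = rawLine.trim();
    // Skip blank lines and comments
    if (!line || line.startsWith('#')) continue;
    const eq = line.indexOf('=');
    if (eq === -1) continue;
    const key = line.slice(0, eq).trim();
    // Strip optional matching quotes around the value
    const value = line.slice(eq + 1).trim().replace(/^(['"])(.*)\1$/, '$2');
    vars[key] = value;
  }
  return vars;
}

// Check that MCP_BASE_URL in a Docker env file targets
// host.docker.internal on the PORT configured in .env.
function checkMcpBaseUrl(apiEnvText, dockerEnvText) {
  const port = parseEnv(apiEnvText).PORT;
  const baseUrl = parseEnv(dockerEnvText).MCP_BASE_URL;
  let ok = false;
  try {
    const url = new URL(baseUrl);
    ok = url.hostname === 'host.docker.internal' && url.port === String(port);
  } catch {
    ok = false; // missing or malformed MCP_BASE_URL
  }
  return { port, baseUrl, ok };
}
```

To use it, read each file with `fs.readFileSync(path, 'utf8')` and call `checkMcpBaseUrl` once for `.env.docker-lead` and once for `.env.docker`.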
## Step 2: Clean DB + Start API Server
```bash
# Kill any existing API process on your port
lsof -ti :$PORT | xargs kill 2>/dev/null
# Clean DB for fresh state
rm -f agent-swarm-db.sqlite agent-swarm-db.sqlite-wal agent-swarm-db.sqlite-shm
# Start API server
bun run start:http &
# Wait ~3s for startup, confirm "MCP HTTP server running on http://localhost:$PORT/mcp"
```
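A fixed ~3s wait can be flaky on a slow machine. As an alternative, here is a sketch of a retry loop that treats connection errors as "not ready yet". `waitUntilReady` is a hypothetical helper, and the attempt count and delay are arbitrary assumptions.

```javascript
// Hypothetical helper: retry an async probe until it returns a truthy value
// or the attempts run out. Returns the number of attempts it took.
async function waitUntilReady(probe, { attempts = 10, delayMs = 500 } = {}) {
  for (let i = 1; i <= attempts; i++) {
    try {
      if (await probe()) return i;
    } catch {
      // Probe threw (e.g. connection refused while the server boots); retry
    }
    if (i < attempts) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`Not ready after ${attempts} attempts`);
}

// Example probe (endpoint taken from Step 6; substitute your port and headers):
//   () => fetch('http://localhost:<port>/api/agents', { headers }).then((r) => r.ok)
```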
## Step 3: Build Docker Image
```bash
bun run docker:build:worker
```
This builds `agent-swarm-worker:latest` from the current code. **Rebuild after every code change.**
## Step 4: Start Lead Container
Use a **unique container name** to avoid conflicts with other worktrees (e.g. include branch name or feature):
```bash
docker run --rm -d \
  --name e2e-lead-$(git branch --show-current | tr '/' '-') \
  --env-file .env.docker-lead \
  -e AGENT_ROLE=lead \
  -e MAX_CONCURRENT_TASKS=1 \
  -p 3201:3000 \
  agent-swarm-worker:latest
```
Wait ~15s, then verify:
```bash
docker logs e2e-lead-$(git branch --show-current | tr '/' '-') 2>&1 | tail -5
# Should see: "[lead] Polling for triggers (0/1 active)..."
```
If port 3201 is taken by another worktree, pick a different host port (e.g. `-p 3211:3000`).
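The `tr '/' '-'` substitution only handles slashes. If you script container startup, a stricter name derivation may help, since Docker container names accept only letters, digits, `_`, `.` and `-`. The sketch below uses a hypothetical `containerName` helper, and the `detached` fallback is an assumption for the case where `git branch --show-current` prints nothing.

```javascript
// Hypothetical helper mirroring the `tr '/' '-'` step, extended to replace
// every character Docker does not accept in a container name.
function containerName(role, branch) {
  const safe = branch.replace(/[^a-zA-Z0-9_.-]/g, '-');
  // Empty branch name (e.g. detached HEAD): fall back to a fixed suffix
  return `e2e-${role}-${safe || 'detached'}`;
}
```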
## Step 5: Start Worker Container
```bash
docker run --rm -d \
  --name e2e-worker-$(git branch --show-current | tr '/' '-') \
  --env-file .env.docker \
  -e MAX_CONCURRENT_TASKS=1 \
  -p 3203:3000 \
  agent-swarm-worker:latest
```
Wait ~15s, then verify:
```bash
docker logs e2e-worker-$(git branch --show-current | tr '/' '-') 2>&1 | tail -5
# Should see: "[worker] Polling for triggers (0/1 active)..."
```
## Step 6: Verify Registration
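Use `context-mode execute` (not curl directly due to hook restrictions). In the JavaScript snippets from here on, `$PORT` and `$API_KEY` are placeholders: JavaScript does not expand them, so substitute the actual values from `.env` before running: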
```javascript
const headers = { 'Authorization': 'Bearer $API_KEY', 'Content-Type': 'application/json' };
const agents = await (await fetch('http://localhost:$PORT/api/agents', { headers })).json();
for (const a of agents.agents) {
  console.log(`${a.name} | isLead: ${a.isLead} | status: ${a.status} | id: ${a.id}`);
}
```
Should show both lead and worker registered as `idle`. Save the agent IDs for task creation.
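To turn that visual check into a pass/fail result and capture the IDs in one step, something like the following may work. It assumes the response shape used above (`agents` entries with `isLead`, `status`, and `id`) and exactly one lead and one worker. `checkRegistration` is a hypothetical helper.

```javascript
// Hypothetical helper: confirm one lead and one worker are registered and idle,
// and return their IDs for use in task creation.
function checkRegistration(agents) {
  const lead = agents.find((a) => a.isLead);
  const worker = agents.find((a) => !a.isLead);
  const problems = [];

  if (!lead) problems.push('no lead registered');
  else if (lead.status !== 'idle') problems.push(`lead is ${lead.status}, expected idle`);

  if (!worker) problems.push('no worker registered');
  else if (worker.status !== 'idle') problems.push(`worker is ${worker.status}, expected idle`);

  return { leadId: lead?.id, workerId: worker?.id, problems };
}
```

Call it as `checkRegistration(agents.agents)`. An empty `problems` array means both containers registered as expected.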
## Step 7: Create Tasks
### Assigned task (picked up by lead)
```javascript
const t = await (await fetch('http://localhost:$PORT/api/tasks', {
  method: 'POST', headers,
  body: JSON.stringify({ task: 'Say hello. Call store-progress with status completed.', agentId: LEAD_ID })
})).json();
console.log('Task:', t.id, '| status:', t.status);
```
**Important**: Use `agentId` (not `assignedTo`) to assign tasks. Wrong param silently creates an unassigned task.
### Pool task (auto-claimed by worker)
```javascript
const t = await (await fetch('http://localhost:$PORT/api/tasks', {
  method: 'POST', headers,
  body: JSON.stringify({ task: 'Say hello. Call store-progress with status completed.' })
})).json();
console.log('Pool task:', t.id, '| status:', t.status);
```
Workers auto-claim unassigned tasks at poll time. Leads do **not** auto-claim pool tasks.
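Because a misspelled assignment key fails silently, it can help to build both request bodies through one function so the key name lives in a single place. This is a sketch only: `buildTaskBody` is a hypothetical helper, and it relies solely on the `task` and `agentId` fields shown above.

```javascript
// Hypothetical helper: build the POST /api/tasks body.
// Pass agentId for an assigned task; omit it for a pool task.
function buildTaskBody(task, agentId) {
  if (typeof task !== 'string' || task.trim() === '') {
    throw new TypeError('task must be a non-empty string');
  }
  const body = { task };
  // Only include agentId when assigning; pool tasks carry no assignment key
  if (agentId !== undefined) body.agentId = agentId;
  return JSON.stringify(body);
}
```

Pass the result as the `body` option of the `fetch` calls above.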
## Step 8: Monitor Progress
```bash
# Watch lead logs (use your container name); --tail 20 starts from the last 20 lines
docker logs -f --tail 20 e2e-lead-$(git branch --show-current | tr '/' '-') 2>&1
# Watch worker logs
docker logs -f --tail 20 e2e-worker-$(git branch --show-current | tr '/' '-') 2>&1
```
Poll task status:
```javascript
const t = await (await fetch('http://localhost:$PORT/api/tasks/<task-id>', { headers })).json();
console.log(t.status); // pending → in_progress → completed/failed
```
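Rather than re-running the status check by hand, a small loop can wait for a terminal status and record the transitions it observed. This sketch assumes the statuses listed above (`pending`, `in_progress`, `completed`, `failed`). `waitForTask` is a hypothetical helper, and the timeout and interval defaults are arbitrary.

```javascript
// Statuses after which the task will not change again
const TERMINAL_STATUSES = new Set(['completed', 'failed']);

// Hypothetical helper: call getStatus() repeatedly until the task reaches a
// terminal status. Returns the distinct statuses seen, in order.
async function waitForTask(getStatus, { timeoutMs = 120000, intervalMs = 2000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  const seen = [];
  while (true) {
    const status = await getStatus();
    // Record only transitions, not repeated polls of the same status
    if (seen[seen.length - 1] !== status) seen.push(status);
    if (TERMINAL_STATUSES.has(status)) return seen;
    if (Date.now() >= deadline) {
      throw new Error(`Timed out; last status: ${status}`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Here `getStatus` would wrap the fetch shown above and return `t.status`. Note that polling can miss a short-lived status that begins and ends between two polls.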
## Step 9: Verify Session Logs
```javascript
const logs = await (await fetch('http://localhost:$PORT/api/tasks/<task-id>/session-logs', { headers })).json();
console.log('Log count:', logs.logs.length);
// Should be > 0 for completed tasks
```
For **log isolation** verification (multiple sequential tasks from same agent):
```javascript
const [l1, l2] = await Promise.all([
fetch('http://localhost:$PORT/api/tasks/<t
```
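Once both session-log responses are fetched, the isolation criteria from "When to Use This Skill" (unique sessionIds, no cross-contamination) could be checked along these lines. This is a sketch under a loud assumption: that each entry in `logs.logs` carries a `sessionId` field. That field name is inferred from the criteria, not confirmed by the API, and `checkLogIsolation` is a hypothetical helper.

```javascript
// Hypothetical helper: compare the session logs of two sequential tasks.
// ASSUMPTION: every log entry has a `sessionId` field; adjust to the real schema.
function checkLogIsolation(logsA, logsB) {
  const idsA = new Set(logsA.map((entry) => entry.sessionId));
  const idsB = new Set(logsB.map((entry) => entry.sessionId));
  // Session IDs that appear in both tasks indicate cross-contamination
  const shared = [...idsA].filter((id) => idsB.has(id));
  const bothNonEmpty = logsA.length > 0 && logsB.length > 0;
  return { bothNonEmpty, shared, isolated: bothNonEmpty && shared.length === 0 };
}
```

Call it with the `logs` arrays of the two responses. `isolated` is true only when both tasks have logs and share no session ID.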