Skip to main content
ClaudeWave
Skill389 repo starsupdated today

agy-headless-evidence

# agy-headless-evidence This Claude Code skill enables non-interactive Antigravity (AGY) runs, either one-shot commands via `agy -p` or persistent sessions through the agentapi sidecar, while capturing a JSONL evidence artifact containing the event stream, final message, exit code, and exact command for later validation. Use it when headless AGY execution is part of an automated workflow and downstream validators must verify the run's authenticity and outcome.

Install in Claude Code
Copy
git clone --depth 1 https://github.com/boshu2/agentops /tmp/agy-headless-evidence && cp -r /tmp/agy-headless-evidence/images/gemini/skills/agy-headless-evidence ~/.claude/skills/agy-headless-evidence
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# agy-headless-evidence

Run **Antigravity (AGY)** headlessly — a one-shot `agy -p` or a scheduled tick driven through the
**agentapi sidecar** — and leave a **JSONL proof surface** a validator can read back after the
session ends: the event stream, the final message, the exit code, and the exact command. Scope in,
evidence out.

## Overview / When to Use

The factory dispatches AGY workers non-interactively. A worker that only prints prose to a terminal
leaves nothing a validator can verify later (per the cross-agent rule: you read a worker's
*published compression*, never its live session). This skill makes every headless AGY run produce a
durable, inspectable artifact.

AGY's headless surface (distinct from gemini-cli — there is no `gemini -p`, no `gemini mcp`):

- **One-shot:** `agy -p "<prompt>"` / `agy --print` runs the prompt, prints, and exits
  (`--print-timeout` default 5m). `-c`/`--continue` resumes the latest conversation;
  `--conversation <id>` resumes by id.
- **Sidecar:** the **agentapi sidecar** is AGY's long-lived headless server. A scheduled tick (or an
  external timer) drives runs through the sidecar instead of cold-starting `agy` each time — useful
  for recurring loops where you want a persistent brain and warm conversation state.
- **Scope:** `--add-dir <dir>` (repeatable) bounds which repos a run can touch; pair with a scoped
  git worktree so concurrent roles never share a working tree.
- **Brain:** durable memory + userFacing artifacts under `~/.gemini/antigravity-cli/{brain,knowledge}/`
  — the canonical place to mirror a run's verdict so a different context can consume it.

Use it whenever a headless AGY run is part of an automated loop and someone (or something) downstream
must trust the result.

## ⚠️ Critical Constraints

- **Rule 1 — Capture the exit code immediately.** `echo "$?" > exit-code` on the line right after
  `agy -p` returns. **Why:** a plausible final answer with a non-zero exit is still a failed run;
  validators must key off process reality, not self-report (evidence over comfort).
- **Rule 2 — One run, one timestamped directory.** Never append unrelated runs into the same
  evidence files. **Why:** a verdict must bind to exactly one event stream and one command; runs
  otherwise clobber each other.
- **Rule 3 — Pick the role's posture before launching.** A read-mostly validator gets no
  `--dangerously-skip-permissions`; a scoped author gets `--add-dir` to exactly its worktree.
  **Why:** the permission + scope flags are the runtime boundary, not a convenience.
- **Rule 4 — `dcg` guard stays on.** `~/.gemini/settings.json` wires a `BeforeTool` hook on
  `run_shell_command` to `dcg`; keep it even under `--dangerously-skip-permissions`. **Why:** the
  auto-approve flag would otherwise let a destructive command through — `dcg` is the floor.
- **Rule 5 — JSONL is the source of truth over the pretty stream.** **Why:** the human-readable
  output is for eyeballing; the validator parses the captured stream — what isn't captured didn't
  happen as far as the proof surface is concerned.
- **Rule 6 — Record the exact command.** Store argv, cwd, model, scopes (`--add-dir`), and whether
  it ran one-shot or via the sidecar. **Why:** a run that cannot be reproduced is weak evidence.
- **Rule 7 — This is the AGY lane only (LAW 0).** Never reach for `claude -p` / `claude --print` to
  "do the same for Claude." **Why:** `claude -p` bills the API per-token, not the Max sub, and is
  banned for worker dispatch; Claude workers go through NTM panes / spawned subagents. AGY runtime is
  `agy` / `agy -p`.

## Workflow / Methodology

### Phase 1: Declare the role and posture
Decide whether the run is an author, validator, researcher, or tie-breaker, and pick the scope +
permission posture from that role before launching.

| Role | Scope | Permission posture |
|---|---|---|
| Author (edits) | `--add-dir` to one worktree | `--dangerously-skip-permissions` (dcg still on) |
| Validator (read-mostly) | `--add-dir` to the repo, no writes | no skip-permissions; edits forbidden |
| Researcher | `--add-dir` read context | no skip-permissions |
| Externally-sandboxed batch worker | scoped worktree | skip-permissions only by explicit policy |

**Checkpoint:** confirm the role, its `--add-dir` scope, and that you are NOT granting an author's
posture to a validator before launching.

### Phase 2: Build the evidence directory + command record
```bash
RUN_DIR="$(pwd)/.agy-evidence/$(date -u +%Y%m%dT%H%M%SZ)-${ROLE:-run}"
mkdir -p "$RUN_DIR"
{
  printf 'cwd=%s\n' "$PWD"
  printf 'mode=%s\n' "${AGY_MODE:-oneshot}"   # oneshot | sidecar
  printf 'scopes=%s\n' "$REPO"
  printf 'cmd=%s\n' 'agy -p <prompt> --add-dir <repo> --print-timeout 600'
} > "$RUN_DIR/command.txt"
```
Add `model.txt` if a non-default `--model` is used; add `scope.txt` for edit runs.

**Checkpoint:** `command.txt` exists and records cwd, mode (oneshot vs sidecar), and scope.

### Phase 3: Run AGY headless, capture everything

One-shot author run (scoped worktree, dcg on):
```bash
agy -p "Claim one ready bead via br. Implement only it in this worktree. \
  Commit scoped. Write evidence to brain as userFacing. Do NOT close it — a judge will." \
  --add-dir "$REPO" --dangerously-skip-permissions --print-timeout 600 \
  > "$RUN_DIR/events.jsonl" 2> "$RUN_DIR/stderr.log"
echo "$?" > "$RUN_DIR/exit-code"
```

Sidecar / scheduled-tick run (persistent server; resume warm state by id):
```bash
agy -p "Validate bead <id> against its evidence artifact ONLY. You did not author it. \
  Emit VERDICT: PASS|WARN|FAIL to brain as a userFacing verdict. Do not edit code." \
  --conversation "$CONV_ID" --add-dir "$REPO" --print-timeout 600 \
  > "$RUN_DIR/events.jsonl" 2> "$RUN_DIR/stderr.log"
echo "$?" > "$RUN_DIR/exit-code"
```
Do not broaden `--add-dir` without recording why in `scope.txt`.

**Checkpoint:** `exit-code` is written and `events.jsonl` is non-empty before declaring the run done.

### Phase 4: Mirror the verdict to