
cursor-sdk-e2e-dev

The cursor-sdk-e2e-dev skill sets up a live local Omnigent server and runs end-to-end tests of the Cursor SDK harness, so developers can build cursor agents, execute real turns, and debug the cursor executor, authentication, model selection, and tool-bridge integration against real code. Use this skill when developing, testing, or debugging the cursor harness components or their interactions with the Cursor Python SDK.

Install in Claude Code

```bash
git clone --depth 1 https://github.com/omnigent-ai/omnigent /tmp/cursor-sdk-e2e-dev && mkdir -p ~/.claude/skills && cp -r /tmp/cursor-sdk-e2e-dev/.claude/skills/cursor-sdk-e2e-dev ~/.claude/skills/cursor-sdk-e2e-dev
```

Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# Cursor SDK harness: end-to-end dev & testing

The `cursor` harness drives the **Cursor Python SDK** (`cursor_sdk`, an
`AsyncAgent` over a local bridge) and bridges Omnigent's `sys_*` tools into
Cursor as SDK `custom_tools`. This skill is the proven recipe for running it
**for real** against a live local server — not just the unit tests.

> The harness runs as a **local runner** from your current checkout, so
> `omni run <bundle> --server <url>` exercises exactly the code you're on.

## Prerequisites (check these first)

1. **You're on the branch you want to test.** The cursor harness was merged into
   `main` (#203/#204). Test on `main` unless you are validating a specific branch.
2. **A Cursor API key is configured.** The SDK *requires* an API key
   (`crsr_…`); there is no `cursor-agent login` path. Verify (booleans only —
   never print the key):
   ```bash
   .venv/bin/python -c "from omnigent.onboarding.cursor_auth import cursor_api_key_configured; import os; print('config:', cursor_api_key_configured(), 'env:', bool(os.environ.get('CURSOR_API_KEY')))"
   ```
   If both are `False`, run `omni setup` and register a Cursor key, or
   `export CURSOR_API_KEY=crsr_…`.
3. **`cursor-sdk` is installed** (a baseline dependency):
   `.venv/bin/python -c "import cursor_sdk; print(cursor_sdk.__file__)"`.
4. **Network egress to Cursor's backend.** The bridge subprocess talks to
   Cursor's own API; a turn that hangs or fails to connect on a locked-down
   host is usually an egress problem, not a harness bug.
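The environment-side checks above can be bundled into one script. A minimal sketch, assuming you run it with the checkout's `.venv/bin/python`: it inspects only the `CURSOR_API_KEY` environment variable and whether `cursor_sdk` is importable, reports booleans, and never prints the key. The helper name `check_prereqs` is hypothetical, and it does not consult the stored `omni setup` config (use `cursor_api_key_configured()` as shown above for that).

```python
"""Hypothetical pre-flight check for the cursor harness (booleans only)."""
from __future__ import annotations

import importlib.util
import os
from typing import Mapping


def check_prereqs(env: Mapping[str, str] | None = None) -> dict[str, bool]:
    """Return booleans for the env-var key and cursor_sdk importability.

    The key value itself is never returned or printed; only whether it is
    set and whether it has the documented ``crsr_`` prefix.
    """
    env = os.environ if env is None else env
    key = env.get("CURSOR_API_KEY", "")
    return {
        "env_key_set": bool(key),
        "env_key_has_crsr_prefix": key.startswith("crsr_"),
        # find_spec returns None when the top-level module cannot be found.
        "cursor_sdk_importable": importlib.util.find_spec("cursor_sdk") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prereqs().items():
        print(f"{name}: {ok}")
```

Network egress (item 4) is deliberately left out: the smoke-test turn in Step 3 is the real test of it.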

## Step 1 — start a local server

```bash
cd /path/to/omnigent
.venv/bin/omni server start          # spawns a detached server on a free loopback port
.venv/bin/omni server status         # prints the URL, e.g. http://127.0.0.1:6767
```

Use the **printed URL** as `$SERVER` in the steps below. (You can also run a foreground
server on a fixed port with `omnigent server --port 7777 --no-open`.)
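If you script these steps, you can capture `$SERVER` from the status output instead of copying it by hand. A minimal sketch, assuming only that `omni server status` prints a loopback URL like the example above somewhere in its output (the exact output format is not documented here); `extract_server_url` and `current_server_url` are hypothetical helpers.

```python
"""Hypothetical helper: pull the loopback server URL out of `omni server status`."""
from __future__ import annotations

import re
import subprocess

# Matches e.g. http://127.0.0.1:6767 or http://localhost:7777 (assumed format).
_URL_RE = re.compile(r"https?://(?:127\.0\.0\.1|localhost):\d+")


def extract_server_url(status_output: str) -> str | None:
    """Return the first loopback URL found in the text, or None."""
    match = _URL_RE.search(status_output)
    return match.group(0) if match else None


def current_server_url(omni: str = ".venv/bin/omni") -> str | None:
    """Run `omni server status` and parse its combined stdout/stderr."""
    proc = subprocess.run(
        [omni, "server", "status"], capture_output=True, text=True, check=False
    )
    return extract_server_url(proc.stdout + proc.stderr)
```

Restricting the pattern to loopback hosts is intentional: it avoids ever picking up a remote deploy URL by accident.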

## Step 2 — build a cursor agent bundle

A spec with `spec_version` **must be a directory containing `config.yaml`** —
not a single `.yaml` file. Minimal cursor agent:

```bash
mkdir -p /tmp/cursor-dev
cat > /tmp/cursor-dev/config.yaml <<'YAML'
spec_version: 1
name: cursor-dev
description: Cursor SDK dev/test agent.
executor:
  type: omnigent
  config:
    harness: cursor
    # model: composer-2.5     # optional; omit for cursor "auto"
prompt: |
  You are a terse test agent. Answer in as few words as possible.
YAML
```

For sub-agents, tools, guardrails/policies, copy the field shapes from
`examples/polly/config.yaml` and `examples/debby/config.yaml`.
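The same bundle can be generated from Python when you need several variants (for example, one per model under test). A minimal sketch using only the standard library: it writes the YAML as text rather than depending on a YAML library, and always produces a directory containing `config.yaml`, never a bare `.yaml` file. `write_cursor_bundle` and its `model` parameter are hypothetical; pass a model id from your account's catalog, or omit it to stay on cursor "auto".

```python
"""Hypothetical generator for a minimal cursor agent bundle directory."""
from __future__ import annotations

from pathlib import Path

_TEMPLATE = """\
spec_version: 1
name: {name}
description: Cursor SDK dev/test agent.
executor:
  type: omnigent
  config:
    harness: cursor
{model_line}prompt: |
  You are a terse test agent. Answer in as few words as possible.
"""


def write_cursor_bundle(root: Path, name: str = "cursor-dev",
                        model: str | None = None) -> Path:
    """Create ``root/`` with a config.yaml for the cursor harness; return the dir."""
    root.mkdir(parents=True, exist_ok=True)
    # Only emit `model:` (indented under executor.config) when one is requested.
    model_line = f"    model: {model}\n" if model else ""
    (root / "config.yaml").write_text(_TEMPLATE.format(name=name, model_line=model_line))
    return root  # pass this directory (not the file) to `omni run`
```

For example, `write_cursor_bundle(Path("/tmp/cursor-dev-composer"), model="composer-2.5")` produces a bundle you can run with `omni run /tmp/cursor-dev-composer --server "$SERVER" ...`.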

## Step 3 — run a turn (and smoke-test)

```bash
SERVER=http://127.0.0.1:6767   # the URL from `omni server status`
timeout 280 .venv/bin/omni run /tmp/cursor-dev \
  -p "Reply with exactly the single word: PONG" \
  --server "$SERVER" 2>&1
```

A healthy run prints connection lines then the assistant reply (`PONG`). If
that works, the full stack is good: key, egress, bridge, harness.

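- **Shell / file tools:** add `--tools coding`.
- **Specific model:** add `--model <id>` with an id from your account's catalog
  (e.g. `default`, `composer-2.5`, `claude-opus-4-8`, `gpt-5.5`); omit it for
  cursor `auto`. Bare `gpt-5` is rejected, and `databricks-*` ids silently fall
  back to `auto` (see Gotchas).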
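For repeatable smoke tests, the command above can be driven from Python. A minimal sketch: it builds the same `omni run` argv using only the flags shown in this section (`-p`, `--server`, `--tools`, `--model`), enforces the 280 s budget with `subprocess.run(timeout=...)` in place of coreutils `timeout`, and classifies the combined output. The helper names and the verdict strings, including the match on the stale-allowlist error described in Gotcha 1, are assumptions rather than part of the harness.

```python
"""Hypothetical smoke-test driver for `omni run` against a local server."""
from __future__ import annotations

import subprocess

TURN_TIMEOUT_S = 280  # turns take 30-90 s; same budget as `timeout 280`


def build_run_cmd(bundle: str, prompt: str, server: str, *,
                  tools: str | None = None, model: str | None = None,
                  omni: str = ".venv/bin/omni") -> list[str]:
    """Assemble the argv for one `omni run` turn (always pins --server)."""
    cmd = [omni, "run", bundle, "-p", prompt, "--server", server]
    if tools:
        cmd += ["--tools", tools]
    if model:
        cmd += ["--model", model]
    return cmd


def classify(output: str, expect: str = "PONG") -> str:
    """Crude verdict from a turn's stdout+stderr (substring checks only)."""
    if "got 'cursor'" in output:
        return "stale-server"  # harness allowlist rejected cursor (see Gotchas)
    if expect in output:
        return "ok"  # may be fooled if the CLI ever echoes the prompt
    return "unknown"


def smoke(bundle: str, server: str) -> str:
    cmd = build_run_cmd(bundle, "Reply with exactly the single word: PONG", server)
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=TURN_TIMEOUT_S, check=False)
    except subprocess.TimeoutExpired:
        return "timeout"  # often an egress problem, not a harness bug
    return classify(proc.stdout + proc.stderr)
```

Keeping `--server` mandatory in `build_run_cmd` mirrors the rule in Gotcha 1: a local test should never fall through to the bundle's remote default.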

## Targeted scenarios

| Goal | How |
|------|-----|
| Native tools (shell/edit/read) | `--tools coding`, prompt to create→read→edit a file and run a shell command; confirm it actually touches disk |
| Bridged `sys_*` / sub-agent dispatch | declare a sub-agent (`tools.agents`/`spawn`), prompt the cursor agent to delegate — exercises the `custom_tools` daemon-thread bridge (`run_coroutine_threadsafe`) |
| Model routing | run the same bundle with several `--model` values; note which actually runs |
| Policy / guardrail | add a guardrail that denies a keyword; confirm `PHASE_LLM_REQUEST`/`PHASE_LLM_RESPONSE` blocks it |
| Concurrency / leaks | fire several `omni run … &` at once; then `pgrep -af "cursor-sdk-bridge|cursor_sdk"` to check for orphaned bridge subprocesses |
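For the concurrency row, the leak check can be automated. A minimal sketch: it runs the same `pgrep -af` pattern as the table and returns `(pid, command line)` pairs. The parsing assumes procps-style `pgrep -a` output (`<pid> <full command line>`), and `find_bridge_processes` / `parse_pgrep` are hypothetical names.

```python
"""Hypothetical leak check for cursor bridge subprocesses after concurrent runs."""
from __future__ import annotations

import subprocess

PATTERN = "cursor-sdk-bridge|cursor_sdk"  # same extended regex as the table above


def parse_pgrep(output: str) -> list[tuple[int, str]]:
    """Parse `pgrep -af` lines of the form '<pid> <command line>'."""
    procs = []
    for line in output.splitlines():
        pid, _, cmdline = line.strip().partition(" ")
        if pid.isdigit():
            procs.append((int(pid), cmdline))
    return procs


def find_bridge_processes() -> list[tuple[int, str]]:
    # pgrep exits 1 when nothing matches; that is the healthy case here.
    proc = subprocess.run(["pgrep", "-af", PATTERN],
                          capture_output=True, text=True, check=False)
    return parse_pgrep(proc.stdout)
```

Call it only after every backgrounded `omni run` has exited (for example, after `wait` in the shell); any entries still listed at that point are candidate orphaned bridges.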

## Gotchas (these cost real time)

1. **`config.yaml`'s `server:` defaults to a *remote* server** (e.g. a
   Databricks Apps URL). Omitting `--server` sends your turn to that remote
   deploy — which may be **stale** and reject the cursor harness with
   `executor.config.harness: must be one of […], got 'cursor'`. **Always pass
   `--server http://127.0.0.1:<port>`** for local testing. (That allowlist is
   `omnigent/spec/_omnigent_compat.py`; if a *local* server rejects `cursor`,
   it's running stale code — restart it from your checkout.)
2. **A spec with `spec_version` must be a directory + `config.yaml`**, never a
   single `.yaml` file.
3. **Cursor needs a `crsr_` API key** (no CLI login). Resolution precedence:
   spec `executor.auth` (api_key) > stored `cursor:` config block (`omni
   setup`) > ambient `CURSOR_API_KEY`.
4. **No Databricks gateway.** Cursor talks only to Cursor's backend, so a
   `databricks-*` model is silently resolved to cursor `auto` — it will *not*
   route through the AI Gateway like claude-sdk/codex/pi.
5. **Use a model id from the account's catalog.** Bare `gpt-5` is **not** valid;
   the SDK rejects unknown ids. Valid examples seen live: `default`,
   `composer-2.5`, `claude-opus-4-8`, `gpt-5.5`. Run with `--model` and read the
   SDK's `Available models:` list to discover the live set.
6. **Turns take 30–90s**; always wrap them in `timeout 280`.
7. **Local-runner topology:** `omni run <bundle> --server <url>` runs the
   harness from your **current checkout**; the server only holds state. The
   managed `omni server start` server runs from whatever venv launched it.
8. **Never print/echo the Cursor key** in logs or commands.
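Gotcha 3's precedence order can be read as a first-non-empty lookup. The sketch below only illustrates that documented order; it is not the code in `omnigent/onboarding/cursor_auth.py`, and the argument shapes (a spec auth mapping with `api_key`, the stored `cursor:` block as a mapping with `api_key`) and the returned source labels are assumptions.

```python
"""Illustration of the documented Cursor key precedence (hypothetical shapes)."""
from __future__ import annotations

from typing import Mapping


def resolve_cursor_key(spec_auth: Mapping[str, str] | None,
                       stored_cursor_cfg: Mapping[str, str] | None,
                       env: Mapping[str, str]) -> tuple[str, str] | None:
    """Return (source, key) for the first configured key, or None.

    Order: spec executor.auth > stored `cursor:` config > CURSOR_API_KEY.
    Callers should log only `source`, never the key.
    """
    candidates = [
        ("spec executor.auth", (spec_auth or {}).get("api_key")),
        ("stored cursor config", (stored_cursor_cfg or {}).get("api_key")),
        ("env CURSOR_API_KEY", env.get("CURSOR_API_KEY")),
    ]
    for source, key in candidates:
        if key:
            return source, key
    return None
```

The practical consequence: an exported `CURSOR_API_KEY` is ignored whenever the spec or the stored config already provides a key, which is worth remembering when a key rotation seems to have no effect.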

## Code & tests

- **Executor (SDK bridge):** `omnigent/inner/cursor_executor.py`
- **Wrap (HARNESS_CURSOR_* env → executor):** `omnigent/inner/cursor_harness.py`
- **Auth / key resolution:** `omnigent/onboarding/cursor_auth.py`
- **Spawn env:** `_build_cursor_spawn_env` in `omnigent/runtime/workflow.py`
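Before a debugging session, it can help to confirm that your checkout actually contains the files listed above; a missing file likely means the checkout predates the cursor harness merge. A minimal sketch; the path list is copied from this section, and `missing_harness_files` / `has_spawn_env_builder` are hypothetical helpers.

```python
"""Hypothetical sanity check that a checkout contains the cursor harness files."""
from __future__ import annotations

from pathlib import Path

HARNESS_FILES = (
    "omnigent/inner/cursor_executor.py",
    "omnigent/inner/cursor_harness.py",
    "omnigent/onboarding/cursor_auth.py",
    "omnigent/runtime/workflow.py",  # should contain _build_cursor_spawn_env
)


def missing_harness_files(checkout: Path) -> list[str]:
    """Return the listed paths that do not exist under `checkout`."""
    return [rel for rel in HARNESS_FILES if not (checkout / rel).is_file()]


def has_spawn_env_builder(checkout: Path) -> bool:
    """True if workflow.py mentions _build_cursor_spawn_env (text search only)."""
    path = checkout / "omnigent/runtime/workflow.py"
    return path.is_file() and "_build_cursor_spawn_env" in path.read_text()
```

Usage: `missing_harness_files(Path("/path/to/omnigent"))` should return an empty list on `main`.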

```bash
# Unit tests (use --frozen; the cwsandbox extra is unsatisfiable on public PyPI here)
uv run --frozen --extra dev python
```