
gui

Cross-driver GUI actuation for CAE solvers running under sim-cli. Use it to click buttons, fill fields, dismiss dialogs, and capture window screenshots in the windows of GUI-capable drivers through `sim exec`.

Repository: sim-cli

Install in Claude Code


git clone --depth 1 https://github.com/svd-ai-lab/sim-cli /tmp/gui && cp -r /tmp/gui/src/sim/_skills/sim-cli/gui ~/.claude/skills/gui

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# `gui` — cross-driver GUI actuation

Whenever the active driver runs with `ui_mode=gui` (or `desktop`),
`sim serve` injects a **`gui`** object into your `sim exec` namespace
alongside `session` / `solver` / `meshing` / `model`. The object is the
same shape across solvers — only the process filter differs — so one
skill serves every GUI-capable driver.

`/connect` advertises it:

```json
{
  "ok": true,
  "data": {
    "...": "...",
    "tools": ["gui"],
    "tool_refs": {"gui": "sim-cli/_skills/sim-cli/gui/SKILL.md"}
  }
}
```

If `tools` doesn't contain `"gui"`, the driver launched headless and
the object is absent — don't call it.
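
The check above can be sketched as a small guard. This is a minimal illustration that assumes the `/connect` response has already been parsed into a dict shaped like the JSON shown; the helper name `gui_advertised` is hypothetical and not part of sim-cli.

```python
def gui_advertised(connect_response: dict) -> bool:
    """Return True only if the /connect payload lists "gui" under data.tools.

    Hypothetical helper: it only inspects the fields shown in the
    example payload (ok, data.tools) and treats anything missing as "no gui".
    """
    if not connect_response.get("ok"):
        return False
    data = connect_response.get("data") or {}
    return "gui" in (data.get("tools") or [])


# Example payload, trimmed to the fields the guard actually reads.
response = {"ok": True, "data": {"tools": ["gui"]}}
print(gui_advertised(response))
```

Treating a missing `tools` key the same as an empty list keeps the guard safe against headless drivers that omit the field entirely.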

## When to reach for `gui`

Three scenarios dominate:

1. **A blocking dialog is wedging the workflow.** A login prompt,
   overwrite confirmation, or script-error dialog can pause agent work
   until someone clicks a button — that someone is you, via `gui`.
2. **You need to drive the GUI where the SDK can't.** Some workflows
   expose a UI-only surface that the driver API does not cover.
3. **You need a per-window screenshot.** `sim screenshot` captures the
   whole desktop. `SimWindow.screenshot()` captures just the window you
   care about — cheaper to read, less visual clutter for the LLM.

If the SDK has a programmatic path (`session.tui.*`, `model.solve()`,
`ModelUtil.loadCopy()`), prefer that. `gui` is for the UI-only surface
that the SDK doesn't cover.

## Remote equivalence

`gui` lives in the session namespace on the `sim serve` side, so you
reach it through the existing `/exec` HTTP channel:

```bash
# local Windows box
sim exec "dlg = gui.find('Login'); dlg.click('OK')"

# Windows box on the LAN / Tailscale
sim --host 10.0.x.y exec "dlg = gui.find('Login'); dlg.click('OK')"
```

No new endpoint, no new protocol — the same API shape from anywhere
the agent runs.
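
If the agent launches `sim` from Python rather than from a shell, the local and remote forms differ only in the `--host` flag. The sketch below builds the argument list for both; `sim_exec_argv` is a hypothetical helper, and the only CLI syntax it relies on is the `sim [--host HOST] exec CODE` form shown above.

```python
from typing import List, Optional


def sim_exec_argv(code: str, host: Optional[str] = None) -> List[str]:
    """Build the argv for `sim exec`, adding --host only for a remote server.

    Hypothetical helper; passing a list to subprocess avoids shell quoting
    problems with the quotes inside the Python snippet.
    """
    argv = ["sim"]
    if host:
        argv += ["--host", host]
    argv += ["exec", code]
    return argv


# Guarded snippet: never call methods on the result of find() unchecked.
snippet = (
    "dlg = gui.find('Login'); "
    "_result = dlg.click('OK') if dlg else {'ok': False, 'error': 'not found'}"
)
local_argv = sim_exec_argv(snippet)
remote_argv = sim_exec_argv(snippet, host="10.0.x.y")  # placeholder address from the example above
# To run it for real: subprocess.run(remote_argv, check=False)
```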

Requirement on the server host: `sim serve` must run in a **real
interactive desktop session** (normal login or RDP). A Windows service
or SSH login runs in session 0, which has no desktop, so pywinauto
cannot enumerate any windows even though the solver processes are
running. GUI-capable drivers document the same constraint.

## API

### Discover what the controller is looking at

```python
gui.available          # True iff pywinauto can run — check before driving anything
gui.process_filter     # tuple of process-name substrings this gui will target
gui.list_windows()     # {ok, windows: [{hwnd, pid, proc, title, rect}, ...]}
```
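
A read-only pre-flight check can combine the three members above before any click is attempted. This is a sketch under the assumption that `list_windows()` returns the dict shape noted in the comment; `describe_targets` is a hypothetical name.

```python
def describe_targets(gui) -> dict:
    """Summarize what the controller can see, without acting on anything.

    Hypothetical helper: `gui` is the injected object (or any stand-in with
    the same `available`, `process_filter`, and `list_windows()` members).
    """
    if not gui.available:
        return {"ok": False, "error": "gui.available is False"}
    listing = gui.list_windows()
    if not listing.get("ok"):
        return {"ok": False, "error": listing.get("error", "list_windows failed")}
    titles = [w["title"] for w in listing.get("windows", [])]
    return {"ok": True, "process_filter": list(gui.process_filter), "titles": titles}


# Inside `sim exec`, where `gui` is already in the namespace:
#     _result = describe_targets(gui)
```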

### Find a window (polled until timeout)

```python
dlg = gui.find(title_contains="Login", timeout_s=5)
# returns a SimWindow, or None on timeout
```

`title_contains` is a plain substring match (case-sensitive, any language).
Returns `None` if nothing matched — **always check** before calling
methods on it:

```python
dlg = gui.find("连接到")          # "Connect to" (title on a Chinese-locale install)
if dlg is None:
    _result = {"ok": False, "error": "login dialog not visible"}
else:
    dlg.click("确定")             # "OK"
```

### Act on a window

Every action returns `{ok: bool, ...}`. Actions do not raise unless you
pass arguments of the wrong Python type, so check `ok` on each result
and surface the `error` string to the agent when it is false.

```python
dlg.click("OK", timeout_s=5)             # click a button by accessible name
dlg.send_text("alice", into="Username")  # type into a named Edit field
dlg.send_text("/tmp/out.cas.h5")         # without `into` → first editable field
dlg.close()                              # WM_CLOSE (Alt+F4 equivalent)
dlg.activate()                           # bring to foreground
dlg.screenshot(label="after_login")      # window-only PNG under workdir
```

Each action method tries the most natural pywinauto strategy first
(`button_by_title`) and falls back to a broader match
(`any_control_by_title`) before giving up — the response tells you
which path worked via the `strategy` field.
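
To keep the agent's view of an action compact, the result dict can be reduced to the fields discussed here. The sketch below assumes only the `ok`, `error`, and `strategy` keys mentioned in this section; `summarize_action` is a hypothetical helper.

```python
def summarize_action(step: str, result: dict) -> dict:
    """Reduce an action result to step label, ok flag, and strategy or error.

    Hypothetical helper: any other keys the real result carries are ignored.
    """
    summary = {"step": step, "ok": bool(result.get("ok"))}
    if summary["ok"]:
        # Which matching path succeeded, e.g. the broader fallback match.
        summary["strategy"] = result.get("strategy")
    else:
        summary["error"] = result.get("error", "unknown error")
    return summary


# Inside `sim exec`:
#     _result = summarize_action("click OK", dlg.click("OK"))
```

A `strategy` of `any_control_by_title` is a hint that the accessible name matched something other than a button, which may be worth confirming with a screenshot.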

### Full UIA dump

Expensive but sometimes necessary for reasoning about an unfamiliar GUI:

```python
state = gui.snapshot(max_depth=3)
# {ok, windows: [{hwnd, pid, proc, title,
#                  controls: [{name, control_type, handle, children?}, ...]}]}
```

Use this when `find(title)` misses and you need to see what the GUI
actually exposes.
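
Once a snapshot is in hand, searching it for a control by name is ordinary dict traversal. The sketch below assumes the snapshot shape shown in the comment above (`windows` → `controls` → optional `children`); `find_controls` is a hypothetical helper, and the sample tree is made up for illustration.

```python
def find_controls(state: dict, needle: str) -> list:
    """Return every control whose name contains `needle`, with its path.

    Hypothetical helper operating on the dict returned by gui.snapshot().
    Unnamed ancestors appear in the path as "<control_type>".
    """
    hits = []

    def walk(controls, path, title):
        for c in controls:
            name = c.get("name") or ""
            here = path + [name or "<%s>" % c.get("control_type", "?")]
            if needle in name:
                hits.append({"window": title,
                             "control_type": c.get("control_type"),
                             "path": here})
            walk(c.get("children") or [], here, title)

    for w in state.get("windows", []):
        walk(w.get("controls") or [], [], w.get("title"))
    return hits


# Made-up snapshot for illustration only.
sample = {"ok": True, "windows": [{"title": "Main", "controls": [
    {"name": "Toolbar", "control_type": "ToolBar", "children": [
        {"name": "Save As", "control_type": "Button"}]},
    {"name": "", "control_type": "Pane"}]}]}
print(find_controls(sample, "Save"))
```

Like `find`, the match is a case-sensitive substring test, so the same localized strings work here.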

### Handle metadata

`SimWindow` fields you can read without another round-trip:

```python
dlg.hwnd     # int
dlg.pid      # int
dlg.proc     # str, process name
dlg.title    # current window title
dlg.as_dict() # {hwnd, pid, proc, title, rect}
```

## Typical patterns

### Pattern 1 — dismiss a blocking login dialog

```python
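dlg = gui.find(title_contains="Login", timeout_s=5)
clicked = None
if dlg:
    dlg.send_text("alice", into="Username")
    dlg.send_text("secret", into="Password")
    clicked = dlg.click("OK")
# report success only if the dialog was found and the OK click returned ok=True
_result = {"dismissed": bool(clicked and clicked.get("ok"))}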
```

### Pattern 2 — confirm a "file exists, overwrite?" dialog

```python
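dlg = gui.find(title_contains="Question", timeout_s=3)
if dlg is None:
    dlg = gui.find(title_contains="overwrite", timeout_s=3)  # fall back to a second title substring
# click only if a dialog was found, and report whether the click succeeded
clicked = dlg.click("OK") if dlg else None
_result = {"confirmed": bool(clicked and clicked.get("ok")), "title": dlg.title if dlg else None}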
```
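
The two-title fallback in Pattern 2 generalizes to any number of candidate substrings. A minimal sketch, assuming `gui.find` accepts the `title_contains` and `timeout_s` keywords used above; `find_first` is a hypothetical helper.

```python
def find_first(gui, title_candidates, timeout_s=3):
    """Try each title substring in order; return (window, matched_title).

    Hypothetical helper. Returns (None, None) if no candidate matched.
    Worst-case wait is len(title_candidates) * timeout_s, because each
    find() polls until its own timeout before the next one starts.
    """
    for title in title_candidates:
        dlg = gui.find(title_contains=title, timeout_s=timeout_s)
        if dlg is not None:
            return dlg, title
    return None, None


# Inside `sim exec`:
#     dlg, matched = find_first(gui, ["Question", "overwrite"])
#     _result = dlg.click("OK") if dlg else {"ok": False, "error": "no dialog"}
```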

### Pattern 3 — walk the solver UI tree to find an unexpected control

```python
state = gui.snapshot(max_depth=4)
names = []
def walk(items):
    for c in items:
        if c.get("name"):
            names.append((c["control_type"], c["name"]))
        walk(c.get("children") or [])
for w in state["windows"]:
    walk(w.get("controls") or [])
_result = {"control_names": names[:50]}
```

### Pattern 4 — capture only the solver window for the agent to read

```python
dlg = gui.find(title_contains="Main", timeout_s=3)
if dlg:
    shot = dlg.screenshot(label="after_solve")
    _result = shot  # contains {ok, path, width, height}
else:
    _result = {"ok": False, "error": "main window not found"}
```

## Error handling

Every call returns a dict; failures look like
`{"ok": False, "error": "connect(handle=...) failed: ..."}`. The UIA
machinery runs in an isolated subprocess so a COM glitch in one call
never poisons the next.
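
Because failures arrive as dicts rather than exceptions, a multi-step interaction needs an explicit stop at the first `ok=False`. One way to sketch that, assuming each step is a zero-argument callable that returns an action result; `run_steps` is a hypothetical helper.

```python
def run_steps(steps):
    """Run (label, callable) pairs in order and stop at the first failure.

    Hypothetical helper: each callable must return a dict with an `ok` key,
    as the action methods in this document do.
    """
    completed = []
    for label, action in steps:
        result = action()
        if not result.get("ok"):
            return {"ok": False,
                    "failed_step": label,
                    "error": result.get("error"),
                    "completed": completed}
        completed.append(label)
    return {"ok": True, "completed": completed}


# Inside `sim exec`, with `dlg` already found and checked for None:
#     _result = run_steps([
#         ("username", lambda: dlg.send_text("alice", into="Username")),
#         ("submit",   lambda: dlg.click("OK")),
#     ])
```

Wrapping each call in a lambda defers it, so later steps never run against a dialog that an earlier step failed to fill.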

Things that commonly make `ok` false:

| Symptom | Likely cause | What to do |
|---|---|---|
| `find` r