nano-banana-pro
nano-banana-pro generates a single PNG image from a text prompt through OpenRouter, using Google's Gemini 3.1 Flash image model by default, with optional image-to-image editing from an input image. Use this skill when a user requests AI-generated illustrations, concept art, product renders, or modifications to existing images, specifying output filename, aspect ratio, and resolution as needed.
git clone --depth 1 https://github.com/opensquilla/opensquilla /tmp/nano-banana-pro && cp -r /tmp/nano-banana-pro/src/opensquilla/skills/bundled/nano-banana-pro ~/.claude/skills/nano-banana-pro
SKILL.md
# nano-banana-pro: single-image generator via OpenRouter

Generates one PNG from a text prompt (optionally seeded with an input image for editing). Used by `meta-short-drama` for per-shot first-frame generation, but also works standalone for any single-image request.

## Inputs (`with:`)

| key | required | default | notes |
|---|---|---|---|
| `prompt` | yes | (none) | Plain English prompt. Append `--ar 9:16` etc. as text. |
| `filename` | yes | (none) | Output path. Relative resolves against process cwd. |
| `aspect_ratio` | no | `1:1` | One of `1:1`, `3:2`, `2:3`, `4:3`, `3:4`, `16:9`, `9:16`. |
| `image_size` | no | `1K` | `1K`, `2K`, `4K`. Higher = slower + costlier. |
| `model` | no | `google/gemini-3.1-flash-image-preview` | Any OpenRouter image-capable model. |
| `max_retries` | no | `0` | Extra retries on the primary `model` before moving on to `fallback_model`. |
| `fallback_model` | no | `""` | Tried ONCE after the primary exhausts retries. Empty disables it. Common pick: `google/gemini-3-pro-image-preview`. |
| `placeholder_on_fail` | no | `no` | `yes` / `no`. When every model refuses, write a 720x1280 solid-colour PNG with a "Scene placeholder" label so a downstream merge step still has a file in this slot. |

To pass an input image for edit mode, invoke the script directly with `--input-image PATH`. The meta-skill engine does not route input images through `with:` by convention; for edit workflows call the script.

## Auth

API-key resolution order (first hit wins):

1. `--api-key` CLI argument (rarely used; meta-skills don't pass it)
2. `OPENROUTER_API_KEY` environment variable (gateway injects from `.env`)
3. `OPENSQUILLA_LLM_API_KEY` environment variable, only when the effective OpenSquilla LLM provider resolves to `openrouter`.
4. `llm.api_key` or `llm.api_key_env` from the selected OpenSquilla TOML config file.

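The resolution order above can be sketched as a small function. This is only an illustration, not the skill's actual code: `resolve_api_key` and its parameters are hypothetical names, the effective provider is passed in rather than computed, and checking `api_key` before `api_key_env` in step 4 is an assumption. The provider check in step 4 follows the rule that config-file credentials count only when `llm.provider` is `openrouter` or omitted.

```python
from typing import Mapping, Optional


def resolve_api_key(
    cli_key: Optional[str],
    env: Mapping[str, str],
    effective_provider: Optional[str],
    config_llm: Mapping[str, str],
) -> Optional[str]:
    """Return the first API key found, or None (hypothetical helper)."""
    # 1. Explicit --api-key argument wins.
    if cli_key:
        return cli_key
    # 2. OPENROUTER_API_KEY from the environment.
    if env.get("OPENROUTER_API_KEY"):
        return env["OPENROUTER_API_KEY"]
    # 3. OPENSQUILLA_LLM_API_KEY, only when the provider is openrouter.
    if effective_provider == "openrouter" and env.get("OPENSQUILLA_LLM_API_KEY"):
        return env["OPENSQUILLA_LLM_API_KEY"]
    # 4. Config-file credentials, only when llm.provider is openrouter or omitted.
    if config_llm.get("provider") in (None, "openrouter"):
        if config_llm.get("api_key"):  # assumption: literal key checked first
            return config_llm["api_key"]
        env_name = config_llm.get("api_key_env")
        if env_name and env.get(env_name):
            return env[env_name]
    return None
```

A `None` result corresponds to the `no OpenRouter API key found` failure.
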
Config discovery matches `GatewayConfig.load`: explicit `OPENSQUILLA_GATEWAY_CONFIG_PATH` first; otherwise `./opensquilla.toml`, then `default_opensquilla_home()/config.toml`. `OPENSQUILLA_STATE_DIR` changes `default_opensquilla_home()`, so a state-dir profile does not fall through to `~/.opensquilla`. Config-file credentials are consumed only when the selected config's `llm.provider` is `openrouter` or omitted.

No Google Gemini key needed: OpenRouter routes the request to the Gemini image model on the user's behalf.

## Output

Prints the absolute path of the saved PNG on stdout. Non-zero exit on any error; stderr carries the diagnostic.

## Cost / latency

- 1K ~ 4-8s
- 2K ~ 8-15s
- 4K ~ 20-40s
- Use 1K for draft, 4K only when the prompt is locked.

## Common failures

- `no OpenRouter API key found` → set `OPENROUTER_API_KEY`, pass `--api-key`, or configure `[llm] provider = "openrouter"` with `api_key` / `api_key_env` in the selected OpenSquilla config.
- `OpenRouter returned no image` → the model rejected the prompt (content moderation or unsupported request). Rewrite prompt; check IP-safety rules in `ai-video-script`.
- `OpenRouter HTTP 402 / 429` → out of credits / rate-limited.
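
When debugging a `no OpenRouter API key found` error, it can help to see which config file would be selected. The config discovery order described under Auth might look roughly like the sketch below. `candidate_config_paths` is a hypothetical helper, not part of OpenSquilla. It assumes that `OPENSQUILLA_STATE_DIR`, when set, is used directly as the OpenSquilla home, and that `~/.opensquilla` is the home otherwise. The real `GatewayConfig.load` may differ in details such as whether a missing file is skipped.

```python
import os
from pathlib import Path
from typing import List, Mapping


def candidate_config_paths(
    env: Mapping[str, str], cwd: Path, user_home: Path
) -> List[Path]:
    """List config locations in the order they would be tried (hypothetical)."""
    # An explicit path overrides all other discovery.
    explicit = env.get("OPENSQUILLA_GATEWAY_CONFIG_PATH")
    if explicit:
        return [Path(explicit)]
    # Assumption: the state dir replaces ~/.opensquilla as the home directory,
    # so a state-dir profile never falls through to ~/.opensquilla.
    state_dir = env.get("OPENSQUILLA_STATE_DIR")
    home = Path(state_dir) if state_dir else user_home / ".opensquilla"
    return [cwd / "opensquilla.toml", home / "config.toml"]


if __name__ == "__main__":
    # Print the candidates for the current process environment.
    for path in candidate_config_paths(os.environ, Path.cwd(), Path.home()):
        print(path)
```
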
Submit audio or video for multilingual dubbing, poll status, and download dubbed audio. Use when the user asks for dubbing, 多语言配音, 视频翻译配音, 译制片, or wants a source clip dubbed into another language.
Generate a structured short-video shooting script from a topic. Emits a strict, machine-parseable shot list (3 shots by default) with image prompt + video prompt + voiceover + on-screen text per shot. Trigger when the user asks for a video script, 分镜, 短视频文案, AI视频, 短剧脚本, or wants visual prompts ready for image/video generation.
Use when the user asks to schedule recurring tasks, one-off reminders, timers, or cron-style jobs through the OpenSquilla cron tool.
Multi-round research with explicit methodology, evidence tracking, and citation-tagged synthesis. Trigger on 'deep dive', 'research report', 'literature review', 'investigate X across sources', 'multi-round investigation'. Distinct from the `summarize` skill, which is a single-pass condensation; this skill maintains a state file across iterations, tracks coverage, and produces a long-form report with per-claim citations. Three execution stages: plan (scope into sub-questions), iterate (record evidence per round), compile (synthesize report). The skill itself does not fetch the web — it tells the host agent which fetches to perform via OpenSquilla's existing web tools, and records what comes back.
Read, edit, or create Microsoft Word `.docx` files. Trigger this skill whenever the user mentions a Word document, .docx file, contract, report, brief, memo, or asks to extract text, modify an existing doc, generate one from a brief, or audit tracked changes. Three execution paths: text-and-structure extraction, in-place edit-by-run (preserves styles), and create-from-scratch with python-docx. Falls back to OOXML unzip-and-patch for layout work python-docx cannot reach.
Capture the current git diff (staged, working-tree, or staged file list) as text. Direct shell call for workflows that need repository diffs without an LLM agent loop.
GitHub operations via `gh` CLI: issues, PRs, CI runs, code review, API queries. Use when: (1) checking PR status or CI, (2) creating/commenting on issues, (3) listing/filtering PRs or issues, (4) viewing run logs. NOT for: complex web UI interactions requiring manual browser flows (use browser tooling when available), bulk operations across many repos (script with gh api), or when gh auth is not configured.
Query the per-turn DecisionEntry log for skill co-occurrence patterns, meta-skill usage stats, and the router fixture corpus. Returns a JSON summary suitable for downstream LLM consumption. Used by meta-skill-creator's harvest step but also useful standalone for 'which skills did I use most this week?'