
nano-banana-pro

nano-banana-pro generates a single PNG image from a text prompt. By default it uses Google's Gemini 3.1 Flash image model (`google/gemini-3.1-flash-image-preview`), accessed through OpenRouter. It can also edit an existing input image (image-to-image). Use this skill when a user asks for AI-generated illustrations, concept art, product renders, or changes to an existing image. The output filename, aspect ratio, and resolution can be set per request.

Repository: opensquilla

Install in Claude Code


git clone --depth 1 https://github.com/opensquilla/opensquilla /tmp/nano-banana-pro && cp -r /tmp/nano-banana-pro/src/opensquilla/skills/bundled/nano-banana-pro ~/.claude/skills/nano-banana-pro

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# nano-banana-pro — single-image generator via OpenRouter

Generates one PNG from a text prompt, optionally seeded with an input
image for editing. `meta-short-drama` uses it to generate the first frame
of each shot, but it also works standalone for any single-image request.

## Inputs (`with:`)

| key | required | default | notes |
|---|---|---|---|
| `prompt` | yes | — | Plain English prompt. Append `--ar 9:16` etc. as text. |
| `filename` | yes | — | Output path. Relative resolves against process cwd. |
| `aspect_ratio` | no | `1:1` | One of `1:1`, `3:2`, `2:3`, `4:3`, `3:4`, `16:9`, `9:16`. |
| `image_size` | no | `1K` | `1K`, `2K`, `4K`. Higher = slower + costlier. |
| `model` | no | `google/gemini-3.1-flash-image-preview` | Any OpenRouter image-capable model. |
| `max_retries` | no | `0` | Extra retries on the primary `model` before moving on to `fallback_model`. |
| `fallback_model` | no | `""` | Tried ONCE after the primary exhausts retries. Empty disables it. Common pick: `google/gemini-3-pro-image-preview`. |
| `placeholder_on_fail` | no | `no` | `yes` / `no`. When every model refuses, write a 720x1280 solid-colour PNG with a "Scene placeholder" label so a downstream merge step still has a file in this slot. |
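The `max_retries`, `fallback_model` and `placeholder_on_fail` rows together define a fixed attempt order. The sketch below shows that order under stated assumptions; it is not the skill's actual code:

- `attempt_plan`, `generate_with_fallback` and the `call` callback are hypothetical names.
- A model refusal is modeled as a `RuntimeError`.
- The placeholder branch returns a sentinel string instead of writing the 720x1280 PNG.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Attempt:
    model: str
    try_number: int  # 1-based count within this model


def attempt_plan(model: str, max_retries: int = 0, fallback_model: str = "") -> list[Attempt]:
    """Primary model: 1 + max_retries tries; fallback (if non-empty): exactly one try."""
    plan = [Attempt(model, i + 1) for i in range(1 + max_retries)]
    if fallback_model:  # empty string disables the fallback
        plan.append(Attempt(fallback_model, 1))
    return plan


def generate_with_fallback(
    call: Callable[[str], str],  # hypothetical: takes a model id, returns a saved path
    model: str,
    max_retries: int = 0,
    fallback_model: str = "",
    placeholder_on_fail: str = "no",
) -> str:
    errors = []
    for attempt in attempt_plan(model, max_retries, fallback_model):
        try:
            return call(attempt.model)
        except RuntimeError as exc:  # assumption: refusals surface as RuntimeError
            errors.append(f"{attempt.model}#{attempt.try_number}: {exc}")
    if placeholder_on_fail == "yes":
        # Sentinel only; the real skill writes a labelled 720x1280 solid-colour PNG.
        return "placeholder"
    raise RuntimeError("; ".join(errors))
```

For example, `max_retries: 2` with a fallback model yields three primary attempts followed by one fallback attempt.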

To pass an input image for edit mode, invoke the script directly with
`--input-image PATH`. By convention, the meta-skill engine does not route
input images through `with:`, so edit workflows must call the script directly.

## Auth

API-key resolution order (first hit wins):
1. `--api-key` CLI argument (rarely used; meta-skills don't pass it)
2. `OPENROUTER_API_KEY` environment variable (gateway injects from `.env`)
3. `OPENSQUILLA_LLM_API_KEY` environment variable, only when the
   effective OpenSquilla LLM provider resolves to `openrouter`.
4. `llm.api_key` or `llm.api_key_env` from the selected OpenSquilla TOML
   config file. Config discovery matches `GatewayConfig.load`: explicit
   `OPENSQUILLA_GATEWAY_CONFIG_PATH` first; otherwise
   `./opensquilla.toml`, then `default_opensquilla_home()/config.toml`.
   `OPENSQUILLA_STATE_DIR` changes `default_opensquilla_home()`, so a
   state-dir profile does not fall through to `~/.opensquilla`.
   Config-file credentials are consumed only when the selected config's
   `llm.provider` is `openrouter` or omitted.

No Google Gemini key needed — OpenRouter routes the request to the
Gemini image model on the user's behalf.

## Output

Prints the absolute path of the saved PNG on stdout. Non-zero exit on
any error; stderr carries the diagnostic.
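A caller can rely on this contract without parsing anything else. The sketch below is a minimal consumer. `run_and_read_path` is a hypothetical helper, and the command to run is supplied by the caller because the script's exact invocation is not documented here. It also assumes the saved path is the last line on stdout.

```python
import subprocess
from pathlib import Path


def run_and_read_path(argv: list[str]) -> Path:
    """Run a command that prints the absolute path of a saved PNG on stdout."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode != 0:
        # Non-zero exit: surface stderr, which carries the diagnostic.
        raise RuntimeError(proc.stderr.strip() or f"exit code {proc.returncode}")
    lines = proc.stdout.strip().splitlines()
    if not lines:
        raise RuntimeError("command succeeded but printed no path")
    path = Path(lines[-1])  # assumption: the path is the last stdout line
    if not path.is_absolute():
        raise ValueError(f"expected an absolute path, got {path}")
    return path
```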

## Cost / latency

- 1K: ~4-8 s
- 2K: ~8-15 s
- 4K: ~20-40 s
- Use 1K for drafts; switch to 4K only once the prompt is locked.
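When choosing a caller-side timeout, the upper bounds above combine with the retry settings from the inputs table. This is a rough arithmetic sketch. `worst_case_seconds` is a hypothetical helper, and it assumes every attempt, including the fallback model, takes as long as the upper bound for the chosen size.

```python
# Upper/lower bounds quoted above, in seconds.
LATENCY_S = {"1K": (4, 8), "2K": (8, 15), "4K": (20, 40)}


def worst_case_seconds(image_size: str, max_retries: int = 0, fallback: bool = False) -> int:
    """Upper-bound wall time: every attempt is assumed to hit the slow end."""
    _, upper = LATENCY_S[image_size]
    attempts = 1 + max_retries + (1 if fallback else 0)
    return upper * attempts
```

For example, `4K` with `max_retries: 1` and a fallback model gives 40 × 3 = 120 s.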

## Common failures

- `no OpenRouter API key found` → set `OPENROUTER_API_KEY`, pass
  `--api-key`, or configure `[llm] provider = "openrouter"` with
  `api_key` / `api_key_env` in the selected OpenSquilla config.
- `OpenRouter returned no image` → the model rejected the prompt
  (content moderation or an unsupported request). Rewrite the prompt and
  check the IP-safety rules in `ai-video-script`.
- `OpenRouter HTTP 402 / 429` → out of credits (402) or rate-limited (429).
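A wrapper can turn stderr into one of the remedies above. This is a minimal sketch. `diagnose` and `HINTS` are hypothetical names, and it assumes the messages listed above appear verbatim as substrings of stderr.

```python
# (substring expected in stderr, remedy from the list above)
HINTS = [
    ("no OpenRouter API key found",
     'set OPENROUTER_API_KEY, pass --api-key, or configure [llm] provider = "openrouter"'),
    ("OpenRouter returned no image",
     "prompt rejected; rewrite it and check the IP-safety rules in ai-video-script"),
    ("HTTP 402", "out of OpenRouter credits"),
    ("HTTP 429", "rate-limited"),
]


def diagnose(stderr: str) -> str:
    """Return the first matching remedy, or a generic hint."""
    for needle, hint in HINTS:
        if needle in stderr:
            return hint
    return "unrecognized error; read stderr"
```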

More from this repository

advanced-dubbing-studio

Submit audio or video for multilingual dubbing, poll status, and download dubbed audio. Use when the user asks for dubbing, 多语言配音 (multilingual dubbing), 视频翻译配音 (video translation and dubbing), 译制片 (dubbed film), or wants a source clip dubbed into another language.

ai-video-script

Generate a structured short-video shooting script from a topic. Emits a strict, machine-parseable shot list (3 shots by default) with an image prompt, video prompt, voiceover, and on-screen text per shot. Trigger when the user asks for a video script, 分镜 (storyboard), 短视频文案 (short-video copy), AI视频 (AI video), 短剧脚本 (short-drama script), or wants visual prompts ready for image/video generation.

cron

Use when the user asks to schedule recurring tasks, one-off reminders, timers, or cron-style jobs through the OpenSquilla cron tool.

deep-research

Multi-round research with explicit methodology, evidence tracking, and citation-tagged synthesis. Trigger on 'deep dive', 'research report', 'literature review', 'investigate X across sources', 'multi-round investigation'. Distinct from the `summarize` skill, which is a single-pass condensation; this skill maintains a state file across iterations, tracks coverage, and produces a long-form report with per-claim citations. Three execution stages: plan (scope into sub-questions), iterate (record evidence per round), compile (synthesize report). The skill itself does not fetch the web — it tells the host agent which fetches to perform via OpenSquilla's existing web tools, and records what comes back.

docx

Read, edit, or create Microsoft Word `.docx` files. Trigger this skill whenever the user mentions a Word document, .docx file, contract, report, brief, memo, or asks to extract text, modify an existing doc, generate one from a brief, or audit tracked changes. Three execution paths: text-and-structure extraction, in-place edit-by-run (preserves styles), and create-from-scratch with python-docx. Falls back to OOXML unzip-and-patch for layout work python-docx cannot reach.

git-diff

Capture the current git diff (staged, working-tree, or staged file list) as text. Direct shell call for workflows that need repository diffs without an LLM agent loop.

github

GitHub operations via `gh` CLI: issues, PRs, CI runs, code review, API queries. Use when: (1) checking PR status or CI, (2) creating/commenting on issues, (3) listing/filtering PRs or issues, (4) viewing run logs. NOT for: complex web UI interactions requiring manual browser flows (use browser tooling when available), bulk operations across many repos (script with gh api), or when gh auth is not configured.

history-explorer

Query the per-turn DecisionEntry log for skill co-occurrence patterns, meta-skill usage stats, and the router fixture corpus. Returns a JSON summary suitable for downstream LLM consumption. Used by meta-skill-creator's harvest step but also useful standalone for 'which skills did I use most this week?'