muapi-nano-banana
This Claude Code skill transforms raw user requests into structured creative briefs for image generation via muapi.ai, applying reasoning-driven prompting based on Google's Gemini 3 architecture. It's designed for AI agents needing to generate high-fidelity images through logic-based prompts that emphasize physical consistency, spatial relationships, and precise specifications rather than keyword-heavy descriptions. Use this when image generation requires detailed control over composition, lighting, text integration, and photorealistic output quality.
git clone --depth 1 https://github.com/SamurAIGPT/Generative-Media-Skills /tmp/muapi-nano-banana && cp -r /tmp/muapi-nano-banana/library/visual/nano-banana ~/.claude/skills/muapi-nano-banana
SKILL.md
# 🍌 Nano-Banana Expert Skill (Gemini 3 Style)

**A specialized skill for AI Agents to leverage "Reasoning-Driven" image generation.**

Based on the advanced prompting architecture of Google's Gemini 3 (Nano Banana Pro), this skill moves beyond keyword stuffing to structured, logic-based creative briefs.

## Core Competencies

1. **Reasoning-Driven Prompting**: Using natural language logic to define physics, lighting, and spatial relationships.
2. **Structured Creative Briefs**: Implementing the "Perfect Prompt" formula: `Subject + Action + Context + Composition + Lighting + Style`.
3. **Text Rendering Precision**: Explicitly defining typography and signifiers for legible text integration.
4. **Contextual Grounding**: Using "Search Grounding" logic (simulated) to anchor generations in real-world accuracy.

---

## 🏗️ Technical Specification

### 1. The "Perfect Prompt" Formula

| Component | Description | Example |
| :--- | :--- | :--- |
| **Subject** | Detailed entity description | "A stoic robot barista with exposed copper wiring" |
| **Action** | Dynamic interaction | "Pouring a latte art leaf with mechanical precision" |
| **Context** | Environment & Atmosphere | "Inside a neon-lit cyberpunk cafe at midnight" |
| **Composition** | Camera & Lens choice | "Close-up, 85mm lens, f/1.8 aperture" |
| **Lighting** | Mood & Direction | "Volumetric blue rim light, warm cafe glow" |
| **Style** | Aesthetic anchor | "Cinematic, photorealistic, 4K production value" |

### 2. Advanced Features

- **Negative Constraint Logic**: State the desired outcome instead of the unwanted one. Instead of "no blurry," use "Ensure sharp focus on the subject's eyes."
- **Identity Consistency**: (Simulated) "Maintain consistent facial structure across variations."
- **Text Integration**: Use double quotes for specific text: `The sign reads "OPEN 24/7"`.

---

## 🧠 Prompt Optimization Protocol (Agent Instruction)

**Before calling the script, the Agent MUST rewrite the user's prompt into a logic-driven Reasoning Brief:**

1. **NO KEYWORD SOUP**: Remove "8k, masterpiece, ultra-detailed." Use full, descriptive sentences.
2. **PHYSICAL CONSISTENCY**: Describe how elements interact (e.g., "The light from the crystal shards casts caustic patterns across the obsidian floor").
3. **TEXT PRECISION**: If the user wants text, define it precisely: `featuring a sign that says "STORE NAME" in a weathered serif font`.
4. **OPTICAL DIRECTIVES**: Specify lens behavior: *Shallow Depth of Field (f/1.8)*, *Macro Lens*, *Anamorphic Flare*.

---

## 🚀 Protocol: Using Nano-Banana

### Step 1: Define the Creative Logic

Provide the agent with a subject and a specific scenario.

### Step 2: Invoke the Script

The `generate-nano-art.sh` script translates the logic into a structured Gemini 3-style prompt.

```bash
# Generating a reasoning-driven image
bash scripts/generate-nano-art.sh \
  --subject "a glass chess piece" \
  --action "shattering into liquid shards" \
  --context "on an obsidian table" \
  --style "macro photography"
```

---

## ⚠️ Constraints & Guardrails

- **No Keyword Soup**: **MANDATORY** - Do not use "trending on artstation, masterpiece, 8k". Use natural language descriptions.
- **Physics Logic**: Ensure the prompt describes *physically possible* lighting and reflection interactions.
- **Full Sentences**: The model parses relationships; use "light reflecting off the water" instead of "water, reflection".

---

## ⚙️ Implementation Details

This skill applies a "Logic Wrapper" around the `core/media/generate-image.sh` primitive, converting fragmented inputs into a coherent, reasoning-ready narrative prompt.
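
The "Logic Wrapper" idea can be illustrated with a minimal sketch. This is not the actual `generate-nano-art.sh`, whose source is not shown here. The function names `has_keyword_soup` and `build_brief`, the sentence templates, and the banned-tag list (taken from the guardrails above) are all assumptions made for illustration. The sketch only assembles and validates the prompt text; it does not call `core/media/generate-image.sh` or any API.

```bash
#!/usr/bin/env bash
# Illustrative sketch only: hypothetical helpers, not the skill's real script.

# Return 0 if the text contains a banned quality tag ("keyword soup"),
# 1 otherwise. Matching is case-insensitive and substring-based.
has_keyword_soup() {
  local lowered
  lowered="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$lowered" in
    *"trending on artstation"*|*masterpiece*|*8k*|*ultra-detailed*) return 0 ;;
    *) return 1 ;;
  esac
}

# Join the six brief components into full sentences.
# Arguments: subject action context composition lighting style
# Only the subject is required; empty components are skipped.
build_brief() {
  local subject="$1" action="$2" context="$3"
  local composition="$4" lighting="$5" style="$6"

  if [ -z "$subject" ]; then
    echo "build_brief: subject is required" >&2
    return 1
  fi

  local brief="$subject"
  [ -n "$action" ] && brief="$brief, $action"
  [ -n "$context" ] && brief="$brief, $context"
  brief="$brief."
  [ -n "$composition" ] && brief="$brief The composition is $composition."
  [ -n "$lighting" ] && brief="$brief The lighting is $lighting."
  [ -n "$style" ] && brief="$brief The style is $style."

  # Enforce the "No Keyword Soup" guardrail before emitting the brief.
  if has_keyword_soup "$brief"; then
    echo "build_brief: remove quality tags such as '8k' or 'masterpiece'" >&2
    return 1
  fi

  printf '%s\n' "$brief"
}

# Example call mirroring the flags used in Step 2.
build_brief "a glass chess piece" "shattering into liquid shards" \
  "on an obsidian table" "" "" "macro photography"
```

Rejecting keyword soup instead of silently stripping it is a design choice in this sketch: it forces the agent to rewrite the brief in full sentences, as the Prompt Optimization Protocol requires.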
Edit and enhance images and videos with AI via muapi.ai — prompt-based editing, upscaling, background removal, face swap, lipsync, video effects, and more
Generate AI images, videos, music, and audio from the terminal via muapi.ai — supports 100+ models including Flux, Midjourney v7, Kling 3.0, Veo3, and Suno V5
Setup and utility scripts for muapi.ai — configure API keys, test connectivity, and poll for async generation results
Turn a long video into N viral-ready short clips with a single managed API call. Wraps muapi.ai's `/ai-clipping` endpoint, which handles transcription, highlight ranking through a virality framework (hook / emotional peak / opinion bomb / revelation / conflict / quotable / story peak / practical value), overlap dedupe, and vertical face-tracking auto-crop server-side. No local Whisper, no local LLM, no GPU.
Transform a 2D logo into a premium 3D version and animate it with professional cinematic effects.
Generate a high-cut-density action / fight scene by first composing a 16-cell storyboard image, then driving Seedance 2.0 image-to-video off that storyboard. Stacks GPT-Image-2 (character sheet + storyboard), Nano-Banana-2 (environment concept), and Seedance 2.0 i2v.
Create a hilarious and ultra-realistic video of an anthropomorphic animal acting like a human vlogger in a real-world setting.
Generate a 15-second cinematic awards-ceremony video — a host announces a winner from the stage, a spotlight finds them in the crowd, they walk up to the podium, receive the award, and the LED display reveals their name and "THE BEST ACTOR".