Skill · 943 repo stars · updated 26d ago

ai-image-generator

Unsupported image formats get **coerced to WebP** before sending to OpenAI. WebP is lossless conversion from PNG/JPG; GIF becomes the first frame; SVG gets rasterized at 256px. Supported input formats are documented below.

### 2. Batch variations

One call, 10 variants (only GPT Image 2 does this):

```
style: pastel, minimalist, vintage, dark mode, neon, film noir, watercolor, sketch, 3D render, photorealistic
```

All share composition and color palette. When you can't decide on tone, this is faster than the multi-turn workflow.

### 3. Multi-reference compositing

Pass multiple reference images (product pack shot + lifestyle, concept + wireframe, mood board) and the model lights/scales/composes them together. Useful for brand consistency and marketing asset suites.

---

## 5-Part Prompting Framework

Every successful image prompt has five elements:

1. **Object** (primary subject)
2. **Context** (where, environment)
3. **Lighting** (how)
4. **

Repository: claude-skills

Install in Claude Code

git clone --depth 1 https://github.com/jezweb/claude-skills /tmp/ai-image-generator && cp -r /tmp/ai-image-generator/plugins/design-assets/skills/ai-image-generator ~/.claude/skills/ai-image-generator

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# AI Image Generator

Generate images using AI APIs (Google Gemini and OpenAI GPT). This skill teaches the prompting patterns and API mechanics for producing professional images directly from Claude Code.

> **Managed alternative**: If you don't want to manage API keys, [ImageBot](https://imagebot.au) provides a managed image generation service with album templates and brand kit support.

## Model Selection

Choose the right model for the job:

| Need | Model | Why |
|------|-------|-----|
| **Photorealistic scenes / stock photos** | Gemini 3.1 Flash Image | Best depth, complexity, environmental context |
| **Final client scenes (higher detail)** | Gemini 3 Pro Image | Higher detail, better style consistency |
| **Text on images** (posters, OG with copy, infographics) | GPT Image 2 | Text rendering actually works — including multi-script |
| **10-variation style exploration** | GPT Image 2 | Native batch — one prompt, 10 variants sharing composition + palette |
| **Multi-reference compositing** (product + lifestyle) | GPT Image 2 | Handles lighting, scale, perspective across references |
| **Transparent icons / logos** | GPT Image 1.5 | Native RGBA alpha — **GPT Image 2 cannot do transparency** |
| **Quick drafts / iteration** | Gemini 2.5 Flash Image | Free tier (~500/day) |

**Rule of thumb**: any image with readable text → GPT Image 2 (unless you need transparency, then GPT Image 1.5). Otherwise → Gemini.

### Model IDs

| Model | API ID | Provider |
|-------|--------|----------|
| Gemini 3.1 Flash Image | `gemini-3.1-flash-image-preview` | Google AI |
| Gemini 3 Pro Image | `gemini-3-pro-image-preview` | Google AI |
| Gemini 2.5 Flash Image | `gemini-2.5-flash-image` | Google AI |
| GPT Image 2 (default) | `gpt-image-2` | OpenAI |
| GPT Image 2 (ChatGPT-parity output) | `chatgpt-image-latest` | OpenAI |
| GPT Image 1.5 (transparency-only) | `gpt-image-1.5` | OpenAI |

**Verify model IDs before use** — they change frequently:
```bash
curl -s "https://generativelanguage.googleapis.com/v1beta/models?key=$GEMINI_API_KEY" | python3 -c "import sys,json; [print(m['name']) for m in json.load(sys.stdin)['models'] if 'image' in m['name'].lower()]"
```
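The rule of thumb and the ID table above can be expressed as a small lookup. This is a minimal sketch: `pick_model` is a hypothetical helper, and defaulting to Gemini 3.1 Flash Image for the "otherwise" case is an assumption, since the rule only says "Gemini".

```python
# API IDs copied from the Model IDs table; verify them before use.
MODEL_IDS = {
    "gemini": "gemini-3.1-flash-image-preview",  # assumed default Gemini model
    "gpt-image-2": "gpt-image-2",
    "gpt-image-1.5": "gpt-image-1.5",
}


def pick_model(has_readable_text: bool, needs_transparency: bool) -> str:
    """Return an API ID following the rule of thumb (hypothetical helper)."""
    if needs_transparency:
        # GPT Image 2 cannot produce transparent backgrounds.
        return MODEL_IDS["gpt-image-1.5"]
    if has_readable_text:
        return MODEL_IDS["gpt-image-2"]
    return MODEL_IDS["gemini"]


if __name__ == "__main__":
    print(pick_model(has_readable_text=True, needs_transparency=False))
```

Transparency is checked first because it is the one requirement GPT Image 2 cannot meet, whether or not the image also carries text.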

## GPT Image 2 Specifics

Released 2026-04-22. Three capabilities that change when you'd reach for it.

### 1. Text rendering actually works

Posters, OG images with headlines, infographics with labels, UI mockups, pricing cards. Text is rendered reliably, including non-Latin scripts (Japanese, Korean, Hindi, Bengali). Primary reason to switch from Gemini — Gemini doesn't render readable text at all.

### 2. Multi-variation batching

One prompt, up to 10 images in a single call. Variants share composition and palette but differ in detail. Good for style exploration before committing, A/B options for a client, rapid ideation.
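A minimal sketch of a batch request body, assuming the model accepts the OpenAI Images API's `n` parameter for the number of images. `build_batch_request` is a hypothetical helper, and nothing here calls the API.

```python
MAX_VARIANTS = 10  # upper bound stated in this section


def build_batch_request(prompt: str, variants: int = MAX_VARIANTS) -> dict:
    """Build a JSON-serialisable request body for one multi-variant call."""
    if not 1 <= variants <= MAX_VARIANTS:
        raise ValueError(f"variants must be between 1 and {MAX_VARIANTS}")
    return {
        "model": "gpt-image-2",  # ID from the Model IDs table
        "prompt": prompt,
        "n": variants,           # assumed to map to the number of variants
    }


if __name__ == "__main__":
    print(build_batch_request("A flat vector icon of a teapot", variants=4))
```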

### 3. Multi-reference compositing

Feed reference images alongside your prompt — product shots, lifestyle scenes, logos. The model places the product into the scene with correct lighting, scale, perspective. Enables "product in context" workflows without multi-turn editing.

### Modes

- **Instant** (default, all plans) — generates without a planning pass. Fast, good enough for most cases.
- **Thinking** (Plus/Pro/Business plans) — plans layout before drawing. Use when element counts matter ("3 icons in a row", "5 feature bullets") or text must land in specific regions. Fewer re-rolls on complex compositions.

### Aspect ratios

3:1 ultra-wide through 1:3 ultra-tall, plus 1:1, 3:2, 2:3, 16:9, 9:16. Wider range than other models — useful for website banners (ultra-wide hero) or mobile story formats (ultra-tall).

### Resolution

Standard output is up to 2K on the long edge. 4K is in beta.
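To see what an aspect ratio means in pixels, the arithmetic can be sketched as follows. Two assumptions: "2K" is taken to mean 2048 px, and the API may only accept specific sizes, so treat the result as a target to check against the provider's documentation. `dimensions` is a hypothetical helper.

```python
def dimensions(ratio_w: int, ratio_h: int, long_edge: int = 2048) -> tuple[int, int]:
    """Return (width, height) with the longer side equal to long_edge."""
    if ratio_w >= ratio_h:
        return long_edge, round(long_edge * ratio_h / ratio_w)
    return round(long_edge * ratio_w / ratio_h), long_edge


if __name__ == "__main__":
    print(dimensions(16, 9))  # (2048, 1152)
    print(dimensions(3, 1))   # (2048, 683)
    print(dimensions(1, 3))   # (683, 2048)
```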

### Generation time

**Up to 2 minutes on complex prompts.** Build async UX — don't block on the response. Show progress or spin off and poll.
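One way to avoid blocking is to run the call in a worker thread and report progress while waiting. The sketch below uses only the standard library. `generate_image` is a stand-in that sleeps instead of calling a real endpoint, and the 150-second timeout is an assumed safety margin over the 2-minute figure.

```python
import asyncio
import time


def generate_image(prompt: str) -> bytes:
    """Stand-in for a blocking API call (hypothetical); no network access."""
    time.sleep(0.3)
    return b"fake-image-bytes for: " + prompt.encode()


async def generate_with_progress(
    prompt: str, timeout_s: float = 150.0, tick_s: float = 0.1
) -> bytes:
    """Run the blocking call in a thread and print progress until it finishes."""
    task = asyncio.create_task(asyncio.to_thread(generate_image, prompt))
    started = time.monotonic()
    while True:
        # Wait up to one tick; returns early if the task completes.
        done, _ = await asyncio.wait({task}, timeout=tick_s)
        if done:
            return task.result()
        elapsed = time.monotonic() - started
        if elapsed > timeout_s:
            # Cancelling stops waiting; the worker thread itself runs to completion.
            task.cancel()
            raise TimeoutError(f"no image after {elapsed:.0f}s")
        print(f"generating... {elapsed:.1f}s elapsed")


if __name__ == "__main__":
    image = asyncio.run(generate_with_progress("a red bicycle"))
    print(len(image), "bytes")
```

In a web app the same shape applies with a job ID: start the task, return immediately, and let the client poll for the result.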

### Constraints

- **No transparent backgrounds.** Fall back to `gpt-image-1.5` when you need PNG transparency.
- **API Org Verification may be required** before the endpoint fires — enable in your OpenAI account settings if you hit auth errors on first call.

### Pricing (per 1024×1024 image)

| Quality | Cost |
|---------|------|
| Low | $0.006 |
| Medium | $0.053 |
| High | $0.211 |

Token pricing: $5/M text in, $10/M text out, $8/M image in, $30/M image out.
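A quick cost estimate from the figures above, as a sketch. It assumes 1024×1024 output, since the per-image table is quoted for that size, and it keeps per-image and token pricing separate because the source does not say how they combine. Both function names are hypothetical.

```python
PRICE_PER_IMAGE = {"low": 0.006, "medium": 0.053, "high": 0.211}  # USD, 1024x1024
PRICE_PER_M_TOKENS = {"text_in": 5.0, "text_out": 10.0, "image_in": 8.0, "image_out": 30.0}


def batch_cost(quality: str, count: int) -> float:
    """Per-image cost for `count` images at the given quality, in USD."""
    return round(PRICE_PER_IMAGE[quality] * count, 3)


def token_cost(**tokens: int) -> float:
    """Token-based cost in USD; keys must match PRICE_PER_M_TOKENS."""
    return round(sum(PRICE_PER_M_TOKENS[k] * n / 1_000_000 for k, n in tokens.items()), 6)


if __name__ == "__main__":
    print(batch_cost("medium", 10))      # 0.53
    print(batch_cost("high", 10))        # 2.11
    print(token_cost(text_in=2_000, image_out=10_000))
```

A 10-variant exploration at medium quality costs about the same as two or three high-quality finals, which is an argument for exploring at low or medium and re-rendering only the winner at high.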

## The 5-Part Prompting Framework

Build prompts in this order for consistent results:

### 1. Image Type
Set the genre: "A photorealistic photograph", "An isometric illustration", "A flat vector icon"

### 2. Subject
Who or what, with specific details: "of a warm, approachable Australian woman in her early 30s, smiling naturally"

### 3. Environment
Setting and spatial relationships: "in a bright modern home with terracotta decor on wooden shelves behind her"

### 4. Technical Specs
Camera and lighting: "Shot at 85mm f/2.0, natural window light, head and shoulders framing"

### 5. Constraints
What to exclude: "Photorealistic, no text, no watermarks, no logos"

### Example (Good vs Bad)

```
BAD — keyword soup:
"professional woman, spa, warm lighting, high quality, 4K"

GOOD — narrative direction:
"A professional skin treatment scene in a warm clinical setting.
A practitioner wearing blue medical gloves uses a microneedling pen
on the client's forehead. The client lies on a white treatment bed,
eyes closed, relaxed. Warm golden-hour light from a window to the
left. Terracotta-toned wall visible in the background. Shot at
85mm f/2.0, shallow depth of field. No text, no watermarks."
```
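The five parts can be kept as separate fields and joined in order, which makes it harder to skip one. This is a sketch: `ImagePrompt` is a hypothetical structure, and the sample values are the fragments quoted in the framework above.

```python
from dataclasses import dataclass


@dataclass
class ImagePrompt:
    image_type: str    # 1. genre
    subject: str       # 2. who or what
    environment: str   # 3. setting and spatial relationships
    technical: str     # 4. camera and lighting
    constraints: str   # 5. what to exclude

    def render(self) -> str:
        """Join the parts in framework order as three sentences."""
        scene = f"{self.image_type} {self.subject} {self.environment}"
        parts = [scene, self.technical, self.constraints]
        # Normalise trailing periods so each part ends with exactly one.
        return " ".join(p.strip().rstrip(".") + "." for p in parts)


if __name__ == "__main__":
    prompt = ImagePrompt(
        image_type="A photorealistic photograph",
        subject="of a warm, approachable Australian woman in her early 30s, smiling naturally",
        environment="in a bright modern home with terracotta decor on wooden shelves behind her",
        technical="Shot at 85mm f/2.0, natural window light, head and shoulders framing",
        constraints="Photorealistic, no text, no watermarks, no logos",
    )
    print(prompt.render())
```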

## Workflow

### 1. Determine Image Need

| Purpose | Aspect Ratio | Model |
|---------|-------------|-------|
| Hero banner (no text) | 16:9 or 21:9 | Gemini |
| Hero banner with headline copy | 16:9 or 3:1 ultra-wide | GPT Image 2 |
| Service card | 4:3 or 3:4 | Gemini |
| Profile / avatar | 1:1 | Gemini |
| Icon / badge (transparent) | 1:1 | GPT Image 1.5 |
| OG / social share (no text) | 1.91:1 | Gemini |
| OG / social share with copy | 1.91:1 | GPT Image 2 |
| Poster /
