
higgsfield-gpt-image-2

Use when the user mentions GPT Image 2.0, gpt-image-2, GPT-Image-2 prompts, or wants to generate an image with GPT Image 2.0. Covers the three-format prompt taxonomy (Format A structured JSON for UI mockups and layout-dense images; Format B dense cinematic prose for single-subject scenes; Format C auto-derive meta-prompt for theme-only concepts), per-format craft patterns, output conventions, the 6-item pre-delivery checklist, and cross-surface workflow context (companion static-ads-workflow.md for ad recreation; higgsfield-marketing-studio cross-surface-workflow.md §3 for ms_image / DTC Ads Higgsfield-native alternative).

Install in Claude Code
git clone --depth 1 https://github.com/OSideMedia/higgsfield-ai-prompt-skill /tmp/higgsfield-gpt-image-2 && cp -r /tmp/higgsfield-gpt-image-2/skills/higgsfield-gpt-image-2 ~/.claude/skills/higgsfield-gpt-image-2
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# Higgsfield GPT Image 2.0

A prompt director for GPT Image 2.0. Converts plain-text concepts into production-ready prompts that route by output type: structured JSON for layout-dense images (UI mockups, infographics, character sheets, multi-panel posters), dense cinematic prose for single-subject scenes (portraits, photographs, landscapes), or auto-derive meta-prompts for theme-only concepts where the model self-generates the composition.

Translated from Adil Aliyev's `gpt-image-2-director` source corpus per the v3.7.13 / v3.7.15 translation precedent. Two companion satellites extend this sub-skill: `static-ads-workflow.md` covers the ad-recreation workflow that uses GPT Image 2.0 as its generation engine, and `reference-sheet-workflow.md` covers the Automatic Product Reference Sheet + Automatic Prompt Creator workflow (one product image → a multi-view identity-locked reference sheet for high-consistency generation).

---

## 1. What GPT Image 2.0 is

GPT Image 2.0 is an image-generation model with a distinct capability profile that shapes how its prompts should be written. Four properties drive the choice among the three prompt formats described in §§ 2–5 below:

**Granular layout precision.** GPT Image 2.0 honors granular layout instructions — top-left panel shows X, mid-right shows Y, N icons in a row labeled A/B/C — in a way other models don't reliably match. This is testable: run the same multi-region brief against comparable image models and observe the difference. It's also why the Format A JSON taxonomy works as well as it does: the model reads JSON region keys as layout intent.

**Text rendering.** Multi-line paragraphs, mixed scripts (CJK + Latin), small UI labels, numeric data in tables: all render sharp and legible. This is one of the model's distinctive strengths over comparable image generators, and it can be tested the same way: run prompts with mixed scripts and small UI labels against comparable models and compare the output. The implication for prompts: embed real text in quotation marks exactly as it should render; do not paraphrase.
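A minimal sketch of the verbatim-text rule for a prose prompt. The helper name `text_clause` and the `<surface> reading "<text>"` phrasing are hypothetical illustrations, not part of the skill; the point is only that the user's string passes through unchanged between quotation marks, CJK included.

```python
def text_clause(surface: str, text: str) -> str:
    """Return a prose clause that embeds `text` verbatim between double quotes.

    `surface` says where the text appears (e.g. "a neon sign").
    The text is never paraphrased, translated, or re-cased.
    """
    if '"' in text:
        # Assumed policy: a nested double quote would make the quoted span ambiguous.
        raise ValueError("text contains a double quote; rephrase or use other quote marks")
    return f'{surface} reading "{text}"'


print(text_clause("a neon sign above the door", "OPEN 営業中"))
# prints: a neon sign above the door reading "OPEN 営業中"
```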

**Design and UI as the sweet spot.** Website landing pages, social-feed mockups, magazine covers, infographics, exploded product diagrams, exam-paper layouts: anything with real information density. Lean prompts into these strengths.

**Cinematic photorealism is the weakness.** Human faces often go plasticky on realism-flagged prompts. Lean into stylized, illustrated, or editorial aesthetics rather than hyperreal skin. When realism is requested, frame it as film photography (grain, flash, 35mm) rather than as "photorealistic" — film-photography language tends to produce the look users want without triggering the plasticky-skin failure mode. Cross-reference: [vocab.md](../../vocab.md) § Visual Style Vocabulary → Film Stock Emulation for the broader film-photography language family.
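One way to apply the film-photography reframing mechanically is sketched below. The `FILM_FRAMING` wording and the `reframe_realism` helper are assumptions for illustration; the skill itself only prescribes the language family (grain, flash, 35mm), not this exact phrase or any automation.

```python
import re

# Hypothetical film-photography framing; adjust grain/flash/stock wording per brief.
FILM_FRAMING = "shot on 35mm film, visible grain, direct on-camera flash"

# Matches "photorealistic", "photo-realistic", "hyperrealistic", "hyper realistic", any case.
REALISM_WORDS = re.compile(r"\b(?:photo|hyper)[- ]?realistic\b", re.IGNORECASE)


def reframe_realism(prompt: str) -> str:
    """Swap realism-flag wording for film-photography language; leave other prompts alone."""
    cleaned, hits = REALISM_WORDS.subn("", prompt)
    if hits == 0:
        return prompt
    # Collapse the double spaces left behind and trim stray commas/spaces at the ends.
    cleaned = re.sub(r"\s{2,}", " ", cleaned).strip(" ,")
    return f"{cleaned}, {FILM_FRAMING}"


print(reframe_realism("photorealistic portrait of a fisherman at dawn"))
# prints: portrait of a fisherman at dawn, shot on 35mm film, visible grain, direct on-camera flash
```

Prompts without a realism flag pass through untouched, so the helper is safe to run on stylized or illustrated briefs.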

---

## 2. Three prompt formats

Pick one based on the user's concept. If the concept fits more than one, pick the one best suited to the subject; don't hedge.

| Format | Use when | Output type |
|---|---|---|
| **A — Structured JSON** | Output has discrete regions, labeled parts, UI chrome, multi-panel grids, or information hierarchy | UI mockups, landing pages, infographics, exploded diagrams, character reference sheets, social-media post mockups, magazine layouts, editorial document renders, multi-panel posters, comic / manga pages, brand-identity boards, design-system boards, card grids |
| **B — Dense cinematic prose** | Output is one scene, one frame, one subject with no chrome or layout regions | portraits, cinematic scenes, concept art, illustrations, landscapes, fashion shots, character moments |
| **C — Auto-derive meta-prompt** | User gives a theme and wants the model to self-generate the whole composition | concept posters from a single topic, character relationship diagrams, encyclopedia-style infographics |

Each format has its own craft patterns in §§ 3–5 below. The routing decision is consolidated in § 6.

### Tie-break

When in genuine doubt between A and B (e.g., "a character with some labels around them") — default to A. Layout precision is GPT Image 2.0's primary differentiator and prompts should reach for it.
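The routing table and tie-break above can be sketched as a small decision function. The `Brief` flags are hypothetical features a director would read off the user's concept, and checking the theme-only case first is an assumption about precedence that the table does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class Brief:
    # Hypothetical features a director would read off the user's concept.
    has_layout: bool = False      # discrete regions, labeled parts, UI chrome, panels, hierarchy
    single_subject: bool = False  # one scene / frame / subject, no chrome
    theme_only: bool = False      # only a topic; the model should compose everything


def route(brief: Brief) -> str:
    """Return "A", "B", or "C" following the format table and the tie-break."""
    if brief.theme_only:
        return "C"  # assumption: a theme-only brief wins before layout features are judged
    if brief.has_layout:
        return "A"  # layout present, even with a single subject ("a character with labels")
    if brief.single_subject:
        return "B"
    return "A"  # genuine doubt: default to A, the model's primary differentiator


print(route(Brief(has_layout=True, single_subject=True)))  # prints: A
```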

---

## 3. Format A — Structured JSON

Write a single JSON object describing every visible region. GPT Image 2.0 reads this as a layout spec.

### Core fields to reach for

- `type` — one-line description of what this image is ("infographic poster", "landing page mockup", "exploded view diagram", "anime character reference sheet", "social media app interface mockup")
- `style` — the visual style ("cute flat vector illustration, cozy, warm, soft shading", "clean high-tech 3D render, studio lighting, glowing accents", "GTA V cover art style, cel-shaded, thick black panel borders")
- `subject` or `character` — the main entity, with specific visual attributes
- `layout` — nested objects for regions: `header`, `centerpiece`, `sections`, `footer`, `left_side`, `right_side`, `grid_panels`, `top_header`, `bottom_bar`, etc. This is where precision matters most.
- `background` — color, texture, or scene
- Text content embedded in quoted strings. Keep real text if the user provided it — don't paraphrase. CJK and other non-Latin scripts stay in their original form.
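A minimal sketch of assembling the core fields above into a single Format A object, assuming Python's standard `json` module. The helper name and all example values are hypothetical. The one real detail worth copying is `ensure_ascii=False`: without it, `json.dumps` escapes CJK text to `\uXXXX` sequences instead of keeping it in its original form.

```python
import json


def format_a_prompt(kind: str, style: str, subject: str, layout: dict, background: str) -> str:
    """Assemble the core Format A fields into one JSON object the model reads as a layout spec."""
    spec = {
        "type": kind,
        "style": style,
        "subject": subject,  # use "character" instead for character sheets
        "layout": layout,
        "background": background,
    }
    # ensure_ascii=False keeps CJK and other non-Latin text as-is instead of \uXXXX escapes.
    return json.dumps(spec, ensure_ascii=False, indent=2)


print(format_a_prompt(
    kind="landing page mockup",
    style="clean high-tech 3D render, studio lighting, glowing accents",
    subject="a smart thermostat with a round glowing display",
    layout={
        "top_header": {"text": "Comfort, automated."},
        "centerpiece": "the thermostat, three-quarter view",
        "bottom_bar": {"text": "今すぐ予約"},
    },
    background="soft gradient from deep navy to teal",
))
```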

### Key patterns that make JSON prompts work

**Count-and-label pattern.** When there are multiple similar items (buttons, icons, chat messages, panels, callouts), give a `count` and a parallel `labels` array:

```json
{
  "messages": {
    "count": 7,
    "labels": ["user1: hello", "user2: hi there", "..."]
  }
}
```
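When building count-and-label objects programmatically, deriving `count` from the labels keeps the two fields from drifting apart. This is a sketch with a hypothetical helper name and made-up chat lines, not a required step in the skill.

```python
def count_and_labels(labels: list[str]) -> dict:
    """Build a count-and-label object whose count is derived from, so always matches, the labels."""
    if not labels:
        raise ValueError("a count-and-label object needs at least one label")
    return {"count": len(labels), "labels": list(labels)}


messages = count_and_labels(["user1: hello", "user2: hi there", "user1: free at 6?"])
print(messages)
# prints: {'count': 3, 'labels': ['user1: hello', 'user2: hi there', 'user1: free at 6?']}
```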

**Position-scoped regions.** Explicitly name positions: `top-left`, `top-center`, `mid-right`, `bottom-center-right`. GPT Image 2.0 respects these.

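**Section objects with title, position, count, labels.** For infographics with multiple zones (the example keeps its Japanese text as-is; the title reads "Costume & Equipment Details" and the labels read "breastplate", "shoulder guard", "vambrace"):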

```json
{
  "title": "衣装・装備詳細",
  "position": "bottom-left",
  "count": 9,
  "labels": ["胸当て", "肩当て", "腕甲", "..."]
}
```
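A sketch of assembling several section objects into one layout, assuming that two zones claiming the same position would give the model conflicting instructions (that rejection rule is an assumption, not stated by the skill). The helper names and the English example sections are hypothetical.

```python
import json


def section(title: str, position: str, labels: list[str]) -> dict:
    """One infographic zone: title, position, and a count that matches its labels."""
    return {"title": title, "position": position, "count": len(labels), "labels": list(labels)}


def sections_layout(sections: list[dict]) -> dict:
    """Wrap sections in a layout object, rejecting two zones that claim the same position."""
    seen = set()
    for s in sections:
        if s["position"] in seen:
            raise ValueError(f"position {s['position']!r} is used twice")
        seen.add(s["position"])
    return {"layout": {"sections": sections}}


spec = sections_layout([
    section("Costume & Equipment", "bottom-left", ["breastplate", "shoulder guard", "vambrace"]),
    section("Weapons", "bottom-right", ["longsword", "scabbard"]),
])
print(json.dumps(spec, ensure_ascii=False, indent=2))
```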

**Templat