higgsfield-gpt-image-2
Use when the user mentions GPT Image 2.0, gpt-image-2, GPT-Image-2 prompts, or wants to generate an image with GPT Image 2.0. Covers the three-format prompt taxonomy (Format A structured JSON for UI mockups and layout-dense images; Format B dense cinematic prose for single-subject scenes; Format C auto-derive meta-prompt for theme-only concepts), per-format craft patterns, output conventions, the 6-item pre-delivery checklist, and cross-surface workflow context (companion static-ads-workflow.md for ad recreation; higgsfield-marketing-studio cross-surface-workflow.md §3 for ms_image / DTC Ads Higgsfield-native alternative).
git clone --depth 1 https://github.com/OSideMedia/higgsfield-ai-prompt-skill /tmp/higgsfield-gpt-image-2 && cp -r /tmp/higgsfield-gpt-image-2/skills/higgsfield-gpt-image-2 ~/.claude/skills/higgsfield-gpt-image-2
# Higgsfield GPT Image 2.0
A prompt director for GPT Image 2.0. Converts plain-text concepts into production-ready prompts that route by output type: structured JSON for layout-dense images (UI mockups, infographics, character sheets, multi-panel posters), dense cinematic prose for single-subject scenes (portraits, photographs, landscapes), or auto-derive meta-prompts for theme-only concepts where the model self-generates the composition.
Translated from Adil Aliyev's `gpt-image-2-director` source corpus per the v3.7.13 / v3.7.15 translation precedent. Two companion satellites extend this sub-skill: `static-ads-workflow.md` covers the ad-recreation workflow that uses GPT Image 2.0 as its generation engine, and `reference-sheet-workflow.md` covers the Automatic Product Reference Sheet + Automatic Prompt Creator workflow (one product image → a multi-view identity-locked reference sheet for high-consistency generation).
---
## 1. What GPT Image 2.0 is
GPT Image 2.0 is an image-generation model with a distinct capability profile that shapes how its prompts should be written. Four properties drive the choice among the three prompt formats in §§ 2–5 below:
**Granular layout precision.** GPT Image 2.0 honors granular layout instructions — top-left panel shows X, mid-right shows Y, N icons in a row labeled A/B/C — in a way other models don't reliably match. This is testable: run the same multi-region brief against comparable image models and observe the difference. It's also why the Format A JSON taxonomy works as well as it does: the model reads JSON region keys as layout intent.
**Text rendering.** Multi-line paragraphs, mixed scripts (CJK + Latin), small UI labels, and numeric data in tables all render sharp and legible. This is one of the model's distinctive strengths over comparable image generators, and it is testable the same way: run prompts with mixed scripts and small UI labels against comparable models and compare the results. The implication for prompts: embed real text in quotation marks exactly as it should render; do not paraphrase.
**Design and UI as sweet spot.** Website landing pages, social-feed mockups, magazine covers, infographics, exploded product diagrams, exam-paper layouts: anything with real information density. Lean prompts into these strengths.
**Cinematic photorealism is the weakness.** Human faces often go plasticky on realism-flagged prompts. Lean into stylized, illustrated, or editorial aesthetics rather than hyperreal skin. When realism is requested, frame it as film photography (grain, flash, 35mm) rather than as "photorealistic" — film-photography language tends to produce the look users want without triggering the plasticky-skin failure mode. Cross-reference: [vocab.md](../../vocab.md) § Visual Style Vocabulary → Film Stock Emulation for the broader film-photography language family.
---
## 2. Three prompt formats
Pick one based on the user's concept. If the concept fits multiple, pick the one best suited to the subject — don't hedge.
| Format | Use when | Output type |
|---|---|---|
| **A — Structured JSON** | Output has discrete regions, labeled parts, UI chrome, multi-panel grids, or information hierarchy | UI mockups, landing pages, infographics, exploded diagrams, character reference sheets, social-media post mockups, magazine layouts, editorial document renders, multi-panel posters, comic / manga pages, brand-identity boards, design-system boards, card grids |
| **B — Dense cinematic prose** | Output is one scene, one frame, one subject with no chrome or layout regions | portraits, cinematic scenes, concept art, illustrations, landscapes, fashion shots, character moments |
| **C — Auto-derive meta-prompt** | User gives a theme and wants the model to self-generate the whole composition | concept posters from a single topic, character relationship diagrams, encyclopedia-style infographics |
Each format has its own craft patterns in §§ 3–5 below. The routing decision is consolidated in § 6.
### Tie-break
When in genuine doubt between A and B (e.g., "a character with some labels around them") — default to A. Layout precision is GPT Image 2.0's primary differentiator and prompts should reach for it.
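The routing in the table and the tie-break can be sketched as a small function. This is a minimal illustration, not the skill's actual routing logic (that is consolidated in § 6): the function name `choose_format` and its three boolean inputs are hypothetical, and checking for a theme-only request before the A/B question is an assumption.

```python
def choose_format(has_layout_regions: bool, theme_only: bool, unsure: bool = False) -> str:
    """Return "A", "B", or "C" for a concept (hypothetical helper)."""
    if theme_only:
        # Assumption: a bare theme is routed to Format C before the A/B question arises.
        return "C"
    if has_layout_regions or unsure:
        # Discrete regions, labels, or UI chrome select A; genuine A-vs-B doubt also defaults to A.
        return "A"
    # One scene, one frame, one subject, no chrome.
    return "B"


# "A character with some labels around them": genuine doubt, so the tie-break applies.
print(choose_format(has_layout_regions=False, theme_only=False, unsure=True))
```

The `unsure` flag exists only to make the tie-break explicit; in practice the judgment about whether a concept has layout regions is made by reading the brief.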
---
## 3. Format A — Structured JSON
Write a single JSON object describing every visible region. GPT Image 2.0 reads this as a layout spec.
### Core fields to reach for
- `type` — one-line description of what this image is ("infographic poster", "landing page mockup", "exploded view diagram", "anime character reference sheet", "social media app interface mockup")
- `style` — the visual style ("cute flat vector illustration, cozy, warm, soft shading", "clean high-tech 3D render, studio lighting, glowing accents", "GTA V cover art style, cel-shaded, thick black panel borders")
- `subject` or `character` — the main entity, with specific visual attributes
- `layout` — nested objects for regions: `header`, `centerpiece`, `sections`, `footer`, `left_side`, `right_side`, `grid_panels`, `top_header`, `bottom_bar`, etc. This is where precision matters most.
- `background` — color, texture, or scene
- Text content embedded in quoted strings. Keep real text if the user provided it — don't paraphrase. CJK and other non-Latin scripts stay in their original form.
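As a rough sketch of how the core fields combine, the snippet below assembles one Format A object in Python and serializes it. The `type` and `style` values come from the examples above; the subject, the inner `position` / `text` / `content` keys, and all text strings are illustrative assumptions, not a schema the model requires. `ensure_ascii=False` is used so that non-Latin text stays in its original form instead of being escaped.

```python
import json

# Illustrative Format A prompt; inner key names and text values are assumptions.
prompt = {
    "type": "infographic poster",
    "style": "cute flat vector illustration, cozy, warm, soft shading",
    "subject": "a ceramic teapot with a bamboo handle",
    "layout": {
        "header": {"position": "top-center", "text": "How to Brew Tea"},
        "centerpiece": {"position": "center", "content": "the teapot, steam rising"},
        "footer": {"position": "bottom-center", "text": "お茶の時間"},
    },
    "background": "warm cream paper texture",
}

# ensure_ascii=False keeps CJK characters literal rather than as \uXXXX escapes.
prompt_text = json.dumps(prompt, ensure_ascii=False, indent=2)
print(prompt_text)
```

The printed string is what would be pasted in as the prompt: a single JSON object with every visible region named.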
### Key patterns that make JSON prompts work
**Count-and-label pattern.** When there are multiple similar items (buttons, icons, chat messages, panels, callouts), give a `count` and a parallel array of the item text (named `labels` or `items`):
```json
"messages": {
  "count": 7,
  "items": ["user1: hello", "user2: hi there", "..."]
}
```
**Position-scoped regions.** Explicitly name positions: `top-left`, `top-center`, `mid-right`, `bottom-center-right`. GPT Image 2.0 respects these.
**Section objects with title, position, count, labels.** For infographics with multiple zones:
```json
{
  "title": "衣装・装備詳細",
  "position": "bottom-left",
  "count": 9,
  "labels": ["胸当て", "肩当て", "腕甲", "..."]
}
```
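One way to keep `count` and `labels` in step is to derive the count from the list rather than typing both by hand. The helper below is a hypothetical sketch (`make_section` is not part of the skill) and assumes every label is spelled out in full rather than elided with `"..."`; the title and label strings are illustrative.

```python
import json


def make_section(title, position, labels):
    """Build a section object whose count always matches its labels (hypothetical helper)."""
    return {
        "title": title,
        "position": position,
        "count": len(labels),  # derived, so it cannot drift from the list
        "labels": list(labels),
    }


section = make_section("Key Features", "bottom-left", ["Battery", "Lens", "Strap"])
print(json.dumps(section, ensure_ascii=False, indent=2))
```

Several such objects can be collected under the `sections` key of `layout`, one per zone of the infographic.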
**Templat