Skill1.2k repo starsupdated today

ad_image_create

The ad_image_create skill generates advertising-ready product images in single or collage formats by first determining the target output ratio, computing optimal sub-image dimensions, and calling the image_generator with precise parameters. It integrates product understanding through media_comprehension to ensure stylistic consistency, manages actor references from user-supplied assets, and concludes by pairing each final deliverable with concise social media copy optimized for Xiaohongshu and Douyin platforms.

View source Repository: AWorld

Install in Claude Code

Copy

git clone --depth 1 https://github.com/inclusionAI/AWorld /tmp/ad_image_create && cp -r /tmp/ad_image_create/aworld-skills/ad_image_create_skill ~/.claude/skills/ad_image_create

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Ad Image Creation

## What this skill does

Generate advertising images from product assets with two output styles:
- Single hero image
- Collage image (multiple sub-images stitched into one final canvas)

Core method: decide the final target ratio first, then compute sub-image sizes, and call `image_generator` directly with matching `size` (no manual pre-crop/pre-pad on source assets).

## Required workflow

1. Understand final deliverable:
   - Final ratio and size (for example `16:9`, `1920x1080`)
   - Single image or collage layout (`2x2`, `1x3`, `1x2`)
2. Activate product understanding:
   - `SKILL__active_skill(skill_name="media_comprehension")`
   - Extract product style, tone, audience, and suitable scene category.
3. Design scenes that match product positioning:
   - Keep style consistent with product quality/tone.
   - Avoid mismatched backgrounds (for example: minimal product + ultra-baroque palace).
4. Generate each sub-image using `image_generator` with exact request params.
5. Stitch sub-images (if collage), then validate final size/ratio.
6. **Social copy:** After images are final, add **one** short line of ad copy **per** deliverable image—the same count as the exported ad files (one hero → one line; four separate exports → four lines; one stitched collage file usually → one line unless the user asked for per-panel copy). Keep each line simple, fun, and tightly tied to that image’s scene and benefit; aim for **小红书** / **抖音** scroll appeal, not generic brand platitudes.

## Supporting actor references (Mode 2/3)

When the ad needs a supporting actor beyond the product—either because the user asked for one or because they supplied material—do **not** fetch companion assets from TikTok or similar platforms. Use what is already available:

- **User supplied still image(s):** Use the provided file path(s) as `reference_images` for `image_generator` after a quick `media_comprehension` check that the image shows the intended actor/look.
- **User supplied video:** Capture one or more frames (screenshots) from that video in the workspace, run `SKILL__active_skill(skill_name="media_comprehension")` on each candidate frame, and pick a frame where the model confirms the desired supporting actor/appearance. Use that frame image as `reference_images`.

If the user provides no usable image or video reference, you may still proceed: call `image_generator` without actor `reference_images` and describe the supporting actor so the model generates that character in-scene—still following the actor-count rules below.

## `image_generator` request contract (keep these fields)

### Common fields

```python
image_generator(
    content="...",
    info={
        "image_url": "/path/to/product.jpg",
        "size": "960x540",
        "output_dir": "/path/to/output"
    }
)
```

- `content`: prompt describing scene and composition.
- `info.image_url`: primary product image path.
- `info.size`: output size string in `"WIDTHxHEIGHT"` format.
- `info.output_dir`: output directory.

### Optional field for actor/reference inputs

```python
"reference_images": ["/path/to/ref1.jpg", "/path/to/ref2.jpg"]
```

## Input modes

### Mode 1: Product only

- Input: one product image
- Output: product integrated into environment
- Use when emphasizing material, shape, and style fit.

### Mode 2: Product + one actor reference

- Input: product image + one actor image
- Output: product and actor in one scene
- Use when showing usage context and emotional connection.

### Mode 3: Product + multiple reference images

- Input: product image + multiple references
- Output: richer scene with better pose/style guidance
- Still enforce actor-count language in prompt.

## Critical rule: actor-count control

When using Mode 2/3, model may generate too many actors unless count is explicit.

### Required prompt pattern

- Use explicit count language:
  - Chinese: `只有一只/个`, `最多两只/个`
  - English: `only one`, `a single`, `at most two`
- Recommended actor count:
  - Ad focus: 1 actor (preferred)
  - Lifestyle scene: max 2 actors

### Good vs bad prompt snippet

```python
# Good
content = "Create a warm living-room scene. There is only one cat interacting with the cat tree."

# Bad
content = "Create a warm scene with cats interacting with the cat tree."
```

## Size back-solving quick table

| Final size | Layout | Sub-image size |
|---|---|---|
| 1920x1080 (16:9) | 2x2 | 960x540 |
| 1920x1080 (16:9) | 1x3 | 640x1080 |
| 1920x1080 (16:9) | 1x2 | 960x1080 |
| 1600x1200 (4:3) | 2x2 | 800x600 |
| 1600x1200 (4:3) | 1x3 | 533x1200 |
| 1080x1080 (1:1) | 2x2 | 540x540 |
| 1080x1080 (1:1) | 1x3 | 360x1080 |
| 1080x1920 (9:16) | 2x2 | 540x960 |

## Minimal implementation template

```python
from PIL import Image

# 1) Analyze product style first
SKILL__active_skill(skill_name="media_comprehension")

# 2) Decide target and layout
final_size = (1920, 1080)
layout = "2x2"
sub_size = (960, 540)

# 3) Generate sub-images
for scene in scenes:
    content = scene["prompt"]  # include explicit actor count for Mode 2/3
    info = {
        "image_url": product_image,
        "size": f"{sub_size[0]}x{sub_size[1]}",
        "output_dir": output_dir
    }
    if scene.get("reference_images"):
        info["reference_images"] = scene["reference_images"]
    image_generator(content=content, info=info)

# 4) Stitch
canvas = Image.new("RGB", final_size, (255, 255, 255))
# paste each sub-image by layout...
canvas.save("final_ad.png", quality=95)
```

## Quality checks

- One social tagline per final ad image (step 6): tone fits 小红书/抖音 skim-reading; matches that image, not a generic slogan.
- Product style and environment are consistent.
- For Mode 2/3, actor count is explicitly constrained in `content`.
- `size` values match computed sub-image dimensions.
- Final stitched output matches requested ratio/size.
- If generator output has slight dimension drift (for example height offset), crop after stitching.

## Notes

- No source pre-processing required b

More from this repository

ad_video_createSkill

Create ad-ready product video from product images, with or without character/subject images. The workflow leverages AI-powered image composition, scene understanding, and video generation. Video prompts should follow commercial shot language—visual hooks, product presence, hero shots, detail showcase, function expression, and dynamic visuals.

agent-browserSkill

Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information from web pages.

app_evaluatorSkill

A professional skill for App Evaluation (evaluating app's performance with score) and App Improvement (giving professional suggestions for improving the app's performance).

embedded-video-pip-smooth-playbackSkill

last_7_days_newsSkill

Search and summarize the latest 7 days of AI news and X discussions using public sources plus browser-based X collection. Use for recent AI news, trends, X discussions, industry briefs, and summaries organized into hot topics, viewpoints, and opportunity areas.

media_comprehensionSkill

An intelligent assistant specialized in handling media files (images/audio/video). **Only for media file analysis**, does not handle document types.\n\n✅ Media files that can be processed:\n- Images: .jpg, .jpeg, .png, .gif, .bmp, .webp, .svg\n- Audio: .mp3, .wav, .m4a, .flac, .aac, .ogg\n- Video: .mp4, .avi, .mov, .mkv, .webm, .flv\n\n❌ Files that cannot be processed (please do not trigger this skill):\n- Documents: .pdf, .doc, .docx, .txt, .md, .rtf\n- Spreadsheets: .xlsx, .xls, .csv, .tsv\n- Presentations: .pptx, .ppt, .key\n- Code: .py, .js, .ts, .java, .cpp, .go, .rs\n- Archives: .zip, .tar, .gz, .rar, .7z\n- Executables: .exe, .bin, .app, .dmg\n- Databases: .db, .sqlite, .sql\n- Configuration files: .json, .xml, .yaml, .yml, .toml, .ini\n- Web pages: .html, .htm, .css\n\n**Trigger conditions**: When the user explicitly requests to analyze image/audio/video content, or when the file extension belongs to the aforementioned media types.".

optimizerSkill

Analyzes and automatically optimizes existing agents by improving system prompts and tool configuration.

text2agentSkill

Creates new agents from user requirements by generating Python implementation and mcp_config.