# ad_video_create

This skill generates advertisement videos from product images: it composes the product with an optional character or subject image, then animates the result into a dynamic video sequence. Use it to produce commercial-quality product videos that place visual assets in realistic lifestyle scenes and follow advertising shot conventions such as hero shots and detail showcases.
Install: `git clone --depth 1 https://github.com/inclusionAI/AWorld /tmp/ad_video_create && cp -r /tmp/ad_video_create/aworld-skills/ad_video_create_skill ~/.claude/skills/ad_video_create`
## Workflow Architecture
### Phase 1: Asset Preparation & Analysis
**Input Requirements:**
- **Primary Asset (Required)**: Product image (e.g., cat tower, furniture, gadget)
- **Character/Subject Asset (Optional)**: Supporting character image (e.g., pet, person, lifestyle element)
- **Audio Asset (Optional)**: Background music file (MP3 format)
**Process:**
1. **Asset Discovery**: Scan the working directory for available assets
2. **Media Comprehension**:
   - Activate the `media_comprehension` skill
   - Analyze the product image to understand:
     - Product features and characteristics
     - Color palette and material textures
     - Suitable environment context
   - If a character image exists, analyze its attributes (appearance, pose, mood)
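
The asset discovery step can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the function name `discover_assets` and the extension sets are assumptions, since the skill itself only names a product image, an optional character image, and an optional MP3.

```python
from pathlib import Path

# Extension sets are assumptions for illustration; the skill only mentions
# product/character images and an optional MP3 file.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}
AUDIO_EXTS = {".mp3"}


def discover_assets(workdir: str) -> dict:
    """Group the files in workdir into image and audio candidates."""
    assets = {"images": [], "audio": []}
    for path in sorted(Path(workdir).iterdir()):
        if not path.is_file():
            continue
        ext = path.suffix.lower()  # match extensions case-insensitively
        if ext in IMAGE_EXTS:
            assets["images"].append(path.name)
        elif ext in AUDIO_EXTS:
            assets["audio"].append(path.name)
    return assets
```

Deciding which image is the product and which is the character is left to the `media_comprehension` analysis; the sketch only collects candidates.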
---
### Phase 2: Character Generation (Conditional)
**Trigger Condition**: No character/subject image provided
**Process:**
1. Based on the product analysis from Phase 1, determine the appropriate character type:
   - For pet products → Generate a pet character (matching the product's target audience)
   - For home goods → Generate a lifestyle character or scene element
   - For tech products → Generate a user persona or usage scenario
2. Call `image_generator` with a detailed prompt:
   - Character attributes aligned with product positioning
   - Pose and expression suitable for composition
   - Style consistent with the product aesthetic
**Output**: Character image ready for composition
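
One way to picture the conditional trigger and the category-to-character rules is the sketch below. The category keys, the fallback choice, and the function name `character_prompt_if_needed` are hypothetical; only the three mappings come from the list above.

```python
from typing import Optional

# Category keys are hypothetical labels; the descriptions mirror the
# product-to-character rules listed in this phase.
CHARACTER_RULES = {
    "pet": "pet character matching the product's target audience",
    "home": "lifestyle character or scene element",
    "tech": "user persona or usage scenario",
}


def character_prompt_if_needed(
    category: str, product_style: str, character_image: Optional[str]
) -> Optional[str]:
    """Return an image_generator prompt, or None when Phase 2 is skipped."""
    if character_image is not None:
        return None  # a character image was provided, so nothing to generate
    # Falling back to a lifestyle character is an assumption for unknown categories.
    subject = CHARACTER_RULES.get(category, CHARACTER_RULES["home"])
    return (
        f"Generate a {subject}. "
        "Pose and expression suitable for composition with the product. "
        f"Style consistent with the product aesthetic: {product_style}."
    )
```

For example, `character_prompt_if_needed("pet", "warm wood tones", None)` yields a pet-character prompt, while passing an existing image path returns `None`.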
---
### Phase 3: Image Composition with Environment
**Objective**: Create a realistic advertisement scene combining product + character + environment
**Key Requirements:**
- **Single Character Constraint**: Ensure only ONE character appears in final composition
- **Environment Background**: Must include realistic home/lifestyle setting, not plain white background
- **Natural Integration**: Character should interact naturally with product
**Process:**
1. Prepare input images:
   - Product image (original, or compressed if larger than 50 KB)
   - Character image (from Phase 2 or user-provided)
2. Call `image_generator` with a composition directive:
```json
{
  "content": "Compose [character description] with [product description] in [environment setting].\nRequirements:\n- Only ONE character in the scene\n- Realistic home environment (floor, walls, natural lighting, plants, furniture)\n- Natural interaction between character and product\n- Professional product photography style",
  "info": {
    "image_urls": ["product.jpg", "character.jpg"],
    "size": "1328x1328",
    "guidance_scale": 4.5,
    "num_inference_steps": 30,
    "watermark": false,
    "output_path": "./composed_ad_image.png"
  }
}
```
The numeric values shown are the low end of the recommended ranges: `guidance_scale` 4.5 to 5.0 and `num_inference_steps` 30 to 35.
**Output**: High-quality composed advertisement image with environment
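
As a rough sketch of how the composition request might be assembled and checked before calling `image_generator`, consider the helper below. The function names and the validation logic are assumptions; the field names, the 50 KB threshold, and the parameter ranges are taken from the step above, and treating 1 KB as 1024 bytes is also an assumption.

```python
import os

MAX_BYTES = 50 * 1024  # the ">50KB" threshold; 1 KB = 1024 bytes is assumed


def needs_compression(path: str) -> bool:
    """True when the image exceeds the size threshold and should be compressed."""
    return os.path.getsize(path) > MAX_BYTES


def build_composition_request(prompt, image_paths, guidance_scale=4.5, steps=30):
    """Build the request dict, rejecting values outside the documented ranges."""
    if not 4.5 <= guidance_scale <= 5.0:
        raise ValueError("guidance_scale should be between 4.5 and 5.0")
    if not 30 <= steps <= 35:
        raise ValueError("num_inference_steps should be between 30 and 35")
    return {
        "content": prompt,
        "info": {
            "image_urls": list(image_paths),
            "size": "1328x1328",
            "guidance_scale": guidance_scale,
            "num_inference_steps": steps,
            "watermark": False,
            "output_path": "./composed_ad_image.png",
        },
    }
```

Serializing the result with `json.dumps` also takes care of escaping the newlines in a multi-line prompt.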
---
### Phase 4: Video Generation
**Objective**: Transform static composition into dynamic advertisement video
**Shot & visual language (required):** Across the ~10s runtime, the motion and camera work should **cover** the elements below where applicable. Not every second needs one, but the final cut should feel like a mini commercial, not a single static pan:
| Element | Meaning |
|--------|---------|
| **Visual hooks (视觉因子)** | Strong focal points, contrast, color, light, or composition that hold attention |
| **Product presence (产品出现)** | Clear establishment of the product in frame—viewer knows what is being advertised |
| **Product / hero shots (产品镜头)** | Dedicated beats where the product is the clear subject (center framing, readable silhouette) |
| **Detail showcase (细节展示)** | Close-ups or slow emphasis on materials, texture, craftsmanship, or key parts |
| **Function / benefit expression (功能表达)** | Motion that implies use, outcome, or core selling point (interaction, before/after feel, problem–solution rhythm) |
| **Dynamic visuals (动态视觉)** | Varied motion: camera (push, pan, subtle orbit), parallax, light shifts, or subject micro-movement—avoid one flat move for the whole clip |
When writing `video_diffusion` prompts, **spell out** which of the above appear in sequence (e.g. establish product → detail → function beat → dynamic wrap). If the source image is character-heavy, still reserve beats for product-first shots.
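
Spelling out the beats in sequence could look like the sketch below. The beat keys and the helper `build_video_prompt` are hypothetical, and the phrases merely paraphrase the table above; the wording a real `video_diffusion` prompt needs may differ.

```python
# Beat keys are hypothetical labels; the phrases paraphrase the table above.
BEATS = {
    "hook": "visual hook with a strong focal point and light/color contrast",
    "presence": "establish the product clearly in frame",
    "hero": "hero shot with the product centered and its silhouette readable",
    "detail": "close-up on materials, texture, or key parts",
    "function": "motion that implies use or the core benefit",
    "dynamic": "dynamic wrap with varied camera motion (push, pan, subtle orbit)",
}


def build_video_prompt(sequence, duration_s=10):
    """Return a prompt that names each beat in the order given."""
    unknown = [name for name in sequence if name not in BEATS]
    if unknown:
        raise ValueError(f"unknown beats: {unknown}")
    shots = "; ".join(f"{i}) {BEATS[name]}" for i, name in enumerate(sequence, 1))
    return (
        f"Create dynamic advertisement video (mini-commercial pacing, ~{duration_s}s). "
        f"Shot sequence: {shots}."
    )
```

Calling `build_video_prompt(["presence", "detail", "function", "dynamic"])` reproduces the example order from the paragraph above (establish product, detail, function beat, dynamic wrap).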
**Audio Handling Strategy:**
#### Case A: User-Provided Audio (MP3 exists in directory)
1. Generate video WITHOUT audio first via `video_diffusion`:
```json
{
  "content": "Create dynamic advertisement video (mini-commercial pacing, ~10s):\n- Visual hooks: strong focal points, light/color contrast where fitting\n- Product presence: early establishment of the product in frame\n- Product hero shots: beats where the product is clearly the subject\n- Detail showcase: close-up or emphasis on texture/material/key parts\n- Function expression: motion suggesting use, benefit, or core value\n- Dynamic visuals: varied motion (camera push/pan/subtle orbit, parallax, light shifts, optional character micro-movements)\n- Professional commercial quality",
  "info": {
    "image_url": "./composed_ad_image.png",
    "resolution": "720p",
    "duration": 10,
    "fps": 24,
    "output_dir": "./",
    "sound": "off"
  }
}
```
2. Merge video with user's MP3 using FFmpeg:
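```bash
# -y overwrites an existing output file; -t 10 caps the result at 10 seconds.
# The video stream is copied without re-encoding; the MP3 is re-encoded to AAC.
ffmpeg -y -i generated_video.mp4 -i user_audio.mp3 -t 10 \
  -c:v copy -c:a aac -b:a 192k \
  -map 0:v:0 -map 1:a:0 -shortest \
  final_ad_video.mp4
```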
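
If the merge is driven from a script rather than typed by hand, the same FFmpeg invocation can be built as an argument list. This is a sketch under stated assumptions: `build_merge_command` is a hypothetical helper, the flags are the ones used in the command above, and `-y` is placed before the inputs here.

```python
from typing import List


def build_merge_command(
    video: str, audio: str, output: str, duration_s: int = 10
) -> List[str]:
    """Return the FFmpeg argument list for muxing a silent video with an MP3."""
    return [
        "ffmpeg", "-y",              # overwrite the output if it exists
        "-i", video,                 # input 0: generated video (no audio)
        "-i", audio,                 # input 1: user-provided MP3
        "-t", str(duration_s),       # cap the output duration
        "-c:v", "copy",              # keep the video stream as-is
        "-c:a", "aac", "-b:a", "192k",
        "-map", "0:v:0", "-map", "1:a:0",
        "-shortest",                 # stop at the shorter of the two streams
        output,
    ]
```

The list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems when file names contain spaces.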
#### Case B: No User Audio (Generate with AI audio)
1. Call `video_diffusion` with audio generation enabled:
```json
{
"content": "Create dynamic advertisement video with suitable background music (mini-commercial pacing, ~10s):
- Visual hooks; product presence; hero product shots; detail showcase; function/benefit expression; dynamic visuals (varied camera and motion)