Skill · 83 repo stars · updated 1mo ago

fpv-immersive-video-prompting

This skill converts static images, maps, or drawn paths into detailed FPV (first-person view) video prompts optimized for AI video generators such as Kling, Runway, and Veo. Use it when creating immersive one-shot scenes with specific camera movement routes, numbered stop markers, character interactions, non-human perspectives (drones, pets, vehicles), timed dialogue, and spatial constraints. The skill generates structured scene specifications and multi-image asset packs, in Chinese by default, and keeps camera identity, route planning, and visual appearance consistent throughout the video sequence.

Repository: fpv-immersive-video-prompting

Install in Claude Code

git clone --depth 1 https://github.com/zhouwei713/fpv-immersive-video-prompting /tmp/fpv-immersive-video-prompting && cp -r /tmp/fpv-immersive-video-prompting/skill ~/.claude/skills/fpv-immersive-video-prompting

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# FPV Immersive Video Prompting

## Overview

This skill turns a scene idea plus optional image references into a directed FPV AI-video prompt. The target output should feel like the viewer is moving through a playable scene, not watching a generic beauty shot.

Use this for image-to-video workflows where the user wants:
- one-shot first-person walkthroughs
- numbered route stops or camera planning
- characters reacting as the camera approaches
- human or non-human POVs: guest, pet cat, drone, robot vacuum, bird, vehicle, object, spirit
- variable numbers of characters or targets
- strong identity and route consistency

## Core Principle

Default to Chinese for all generated prompts, including GPT Image asset prompts, route-control image prompts, video prompts, negative prompts, and delivery notes, unless the user explicitly asks for English or a target tool requires English. Keep technical terms such as FPV, Seedance, Kling, Runway, Veo, one-shot, prompt, and path control in English when they are clearer.

Write the output as a small playable scene specification:

1. Who or what is the camera?
2. Where does it start?
3. How many main targets are there exactly?
4. What numbered order does it visit them in?
5. How does the camera physically move?
6. What happens at each stop?
7. What must stay consistent?
8. What must never appear?
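
The eight questions above can be held in a small data structure before any prompt text is written. This is a minimal sketch, not part of the skill itself: the class name `SceneSpec`, its field names, and the one-event-per-stop check are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SceneSpec:
    """Hypothetical container for the eight scene questions (names are illustrative)."""

    camera_identity: str      # 1. who or what the camera is
    start_position: str       # 2. where it starts
    stops: list[str]          # 3-4. the main targets, listed in visiting order
    movement: str             # 5. how the camera physically moves
    stop_events: list[str]    # 6. what happens at each stop
    must_stay_consistent: list[str] = field(default_factory=list)  # 7.
    must_never_appear: list[str] = field(default_factory=list)     # 8.

    def validate(self) -> None:
        # Assumption: every numbered stop needs exactly one described event.
        if len(self.stop_events) != len(self.stops):
            raise ValueError("need exactly one event per numbered stop")

    def numbered_route(self) -> list[str]:
        # Render the stops as "1. target: event" lines in visiting order.
        pairs = zip(self.stops, self.stop_events)
        return [f"{i}. {name}: {event}" for i, (name, event) in enumerate(pairs, start=1)]
```

Calling `validate()` before rendering catches a mismatch between the target count and the per-stop events early, which is the kind of drift the exact-count question is meant to prevent.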

## Default Asset Workflow

When the user needs GPT Image / GPT-Image-2 assets, request a multi-image asset pack rather than one crowded contact sheet. Treat requests for “生图 prompt” (image-generation prompt), “首帧图” (first-frame image), “参考图” (reference image), “素材包” (asset pack), “用 GPT Image 生成” (generate with GPT Image), or any image-prep step as an asset-pack request, not as a single first-frame prompt.

For any case that needs multiple images, output one batch-generation prompt that asks GPT Image / ChatGPT to generate multiple separate images in a single response. Do not split the asset pack into many independent per-image prompts unless the user explicitly asks for that. The prompt must say: “请一次性生成 [X] 张独立图片，不要生成拼图、九宫格、contact sheet 或一张图里塞多个画面。” (In English: “Generate [X] separate images in one go; do not generate a collage, a nine-grid, a contact sheet, or one image stuffed with multiple frames.”)
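
Filling in the `[X]` placeholder is simple string templating. The sketch below assumes a hypothetical helper name, `batch_instruction`, and assumes that a batch prompt only makes sense for two or more images; the Chinese sentence itself is copied from the rule above.

```python
def batch_instruction(image_count: int) -> str:
    """Return the mandatory batch-generation sentence with [X] filled in.

    Hypothetical helper; the minimum of 2 images is an assumption based on
    the rule applying to cases that need multiple images.
    """
    if image_count < 2:
        raise ValueError("a batch prompt is only needed for multiple images")
    return (
        f"请一次性生成 {image_count} 张独立图片，"
        "不要生成拼图、九宫格、contact sheet 或一张图里塞多个画面。"
    )
```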

For close-interaction scenes with N main people/targets, always output the complete asset pack unless the user explicitly asks for only one image:
1. First-frame scene image with small numbered stop markers only: 1, 2, 3, ... N
2. N separate character/reference images, one for each main person/target
3. Optional clean first frame without numbers for safer image-to-video input

Before returning any image prompt, run this count check: total images = 1 route/numbered first frame + N character references + 1 optional clean first frame. The video prompt must refer to these images by role, e.g. for N = 3 with a clean frame: “图片 1 (image 1) is the route-planning frame, 图片 2-4 are character references, 图片 5 is the clean first frame.”
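
The count check is plain arithmetic, so it is easy to automate. The following is a minimal sketch under stated assumptions: the function names `expected_image_count` and `image_roles` and the role labels are hypothetical, and the ordering (route frame first, clean frame last) simply follows the example in the rule above.

```python
def expected_image_count(n_targets: int, include_clean_frame: bool = False) -> int:
    """Total images = 1 numbered first frame + N references + optional clean frame."""
    if n_targets < 1:
        raise ValueError("close-interaction mode needs at least one main target")
    return 1 + n_targets + (1 if include_clean_frame else 0)


def image_roles(n_targets: int, include_clean_frame: bool = False) -> list[str]:
    """List the role of each image in upload order (labels are illustrative)."""
    roles = ["route planning / numbered first frame"]
    roles += [f"character reference for target {i}" for i in range(1, n_targets + 1)]
    if include_clean_frame:
        roles.append("clean first frame")
    # Guard against the role list drifting away from the count formula.
    assert len(roles) == expected_image_count(n_targets, include_clean_frame)
    return roles
```

For three main targets plus a clean frame, `expected_image_count(3, include_clean_frame=True)` gives 5, which matches a pack where image 1 is the route frame, images 2-4 are character references, and image 5 is the clean first frame.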

Default image sets:

Close-interaction / character mode:
1. First-frame scene image with small numbered stop markers only: 1, 2, 3, ...
2. One character/reference image for each main target
3. Optional clean first frame without numbers for safer image-to-video input

World-route / path-control mode:
1. Aerial route-control map or wide first frame with one clear continuous red path from start to destination
2. Optional clean world/scene reference without the red path
3. Optional landmark or character references only if specific destinations need consistent appearance

Prefer numbered stop markers for close-range character interactions, indoor scenes, crowded social scenes, and exact target-count workflows. GPT Image often struggles to draw one continuous physically coherent route through furniture, people, walls, railings, or water; a bad red line can mislead the video model. Numbered stops are easier to generate correctly, and the video prompt can define the movement between them.

Use red-line path control for large-scale route-shaped scenes: aerial world maps, fantasy continent journeys, city-to-landmark flythroughs, racing lines, canyon/drone routes, open-world game trailers, and Seedance 2.0 path-control demos. In that mode, the route image is a planning artifact; the final video must remove all red lines, arrows, annotations, labels, and map-view appearance.
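
The choice between the two modes can be written as a small decision rule. This is a sketch only: the function name, the boolean inputs, and the precedence (close-range needs win over scale, and numbered stops are the fallback) are assumptions layered on the guidance above, not behavior defined by the skill.

```python
def choose_route_mode(
    close_interaction: bool,
    indoor_or_crowded: bool,
    exact_target_count: bool,
    large_scale_route: bool,
) -> str:
    """Pick 'numbered_stops' or 'red_line' for the route-control image.

    Hypothetical helper. Assumption: any close-range requirement takes
    priority, because a bad red line can mislead the video model.
    """
    if close_interaction or indoor_or_crowded or exact_target_count:
        return "numbered_stops"
    if large_scale_route:
        return "red_line"
    # Assumed default: numbered stops are easier to generate correctly.
    return "numbered_stops"
```

A fantasy-continent flythrough with no character stops would map to `"red_line"`, while a dinner-party walkthrough with three named guests would map to `"numbered_stops"`.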

### Red-Line Path Control Rules

Use this mode when the user asks for Seedance 2.0 path control, a drawn route, an aerial map flythrough, a game-world traversal, a racing line, or a large-scale drone/bird/spirit FPV journey.

For the route-control image prompt:

- Use a high-resolution 16:9 aerial terrain map, world map, tactical map, or wide route-planning scene.
- Draw one clear continuous red route line, optionally with a subtle arrow, from start to destination.
- Make the route physically plausible through visible corridors: roads, valleys, rivers, city gates, bridges, canyons, rooftops, airspace, tunnels, coastlines, or ridgelines.
- Build 4-6 visually distinct route segments so the video has progression: peaceful start, transition zone, landmark/city, danger zone, final destination.
- Keep the red line clean and legible. Avoid multiple competing paths unless the user explicitly wants branching routes.

For the video prompt:

- Say the uploaded image is a route-planning map/image, not the final look.
- Say the red line/arrow/annotations are only camera-path controls and must be completely removed.
- State the camera must strictly follow the drawn route geometry from start to destination.
- Use invisible first-person drone, bird, spirit, vehicle, or mounted-camera POV unless the route is truly walkable.
- Add timeline segments by geography, not by character stop.
- Include camera language: continuous forward motion, natural banking, close passes, altitude changes, foreground parallax, progressive acceleration, smooth horizon control.
- Avoid map view, visible red lines, annotations, teleporting, reverse motion, jump cuts, visible drone, guide characters, flat terrain, deformed landmarks, flickering structures.
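
Timeline segments by geography can be drafted mechanically from the route segments. The sketch below is illustrative: `build_timeline` is a hypothetical helper, the even split of the clip duration is an assumption (real prompts may weight segments differently), and the 4-6 segment check reuses the segment count recommended for the route-control image.

```python
def build_timeline(segments: list[str], total_seconds: float) -> list[str]:
    """Split a clip evenly into one timeline line per geographic segment.

    Hypothetical helper; even spacing is an assumption, not a rule of the skill.
    """
    if not 4 <= len(segments) <= 6:
        raise ValueError("expected 4-6 visually distinct route segments")
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    step = total_seconds / len(segments)
    lines = []
    for i, name in enumerate(segments):
        start, end = i * step, (i + 1) * step
        lines.append(f"{start:.1f}-{end:.1f}s: {name}")
    return lines
```

For example, `build_timeline(["peaceful start", "transition zone", "landmark city", "danger zone", "final destination"], 15)` yields five three-second lines, each of which can then be expanded with the camera language listed above.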

Use numbered stops instead when the video depends on close character interactions, exact character count, indoor navigation, or object-level continuity.

### Numbered St