ClaudeWave
Skill · 1.2k repo stars · updated yesterday

embedded-video-pip-smooth-playback

This Claude Code skill addresses stuttering and frame drops in embedded video clips during code-driven rendering by explaining the root cause (sparse keyframes forcing slow decoding chains) and providing an FFmpeg command to re-encode source clips with every frame as a keyframe using `-g 1`. Use this when exporting compositions with picture-in-picture video from tools like Remotion or After Effects scripts where the embedded clip plays unevenly while the main layer stays smooth.

Install in Claude Code
git clone --depth 1 https://github.com/inclusionAI/AWorld /tmp/embedded-video-pip-smooth-playback && cp -r /tmp/embedded-video-pip-smooth-playback/aworld-skills/embedded_video ~/.claude/skills/embedded-video-pip-smooth-playback
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

## 1. Problem scenario

When you build videos with code-driven renderers (e.g. Remotion, AE scripts, complex FFmpeg filter graphs), you often need picture-in-picture: one main composition with another video embedded inside it.

**Typical symptom**: In the exported file, motion on the main layer (translation, scale, etc.) looks smooth, but the **embedded clip stutters badly**, drops frames, or even freezes for long stretches.

## 2. Root cause: sparse keyframes

Modern codecs (H.264/H.265) save space by storing full pictures only at scene cuts or every few seconds (**keyframes / I-frames**). Frames in between (**P-frames / B-frames**) store only the differences from the frames they reference, so they cannot be decoded on their own.

Engines like Remotion export **frame by frame**. To render frame *N*, the embedded clip must **seek** to the matching timestamp.

If the embedded file has almost no keyframes (e.g. one I-frame at the start of a 10 s clip), the decoder has to start at the nearest preceding keyframe, in this example **frame 0**, and decode forward to reach frame *N*. That leads to:

1. **Very slow seeks**: Decoding takes so long that the renderer times out and grabs a frame before the decode finishes.
2. **Repeated frames**: The decoder cannot keep up, so several consecutive captures show the same stale image, which appears as **stutter** in the final output.
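
To get a feel for the cost, the sketch below counts how many frames a decoder has to process to deliver frame *N* when it restarts from the nearest preceding keyframe on every seek. The 30 fps frame rate, the fixed keyframe interval, and the function names are illustrative assumptions; real decoders may cache state between seeks, so treat the numbers as a worst case.

```bash
#!/usr/bin/env bash
# Worst-case decode cost of random access, assuming the decoder restarts
# from the nearest preceding keyframe on every seek (no caching).

# frames_to_decode N GOP
#   N   = zero-based index of the frame the renderer asks for
#   GOP = keyframe interval in frames (1 = all-intra)
frames_to_decode() {
  local n=$1 gop=$2
  # Frames from the last keyframe at or before N, up to and including N.
  echo $(( n % gop + 1 ))
}

# total_decodes FRAMES GOP
#   Sum of frames_to_decode over every frame of a clip FRAMES long.
total_decodes() {
  local frames=$1 gop=$2 total=0 n
  for (( n = 0; n < frames; n++ )); do
    total=$(( total + n % gop + 1 ))
  done
  echo "$total"
}

# A 10 s clip at an assumed 30 fps has 300 frames.
echo "single keyframe, last frame: $(frames_to_decode 299 300) decodes"
echo "single keyframe, whole clip: $(total_decodes 300 300) decodes"
echo "all-intra,       whole clip: $(total_decodes 300 1) decodes"
```

Under these assumptions, rendering all 300 frames from the single-keyframe file costs 45150 frame decodes, against 300 for an all-intra file.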

## 3. Fix: all-intra encoding (every frame a keyframe)

**Idea**: Re-encode the embedded asset so **every frame is a keyframe**. Then any seek returns a full picture immediately, with no long chains of dependent frames.

### Steps

#### Step 1: Re-encode with FFmpeg

Run:

```bash
ffmpeg -i input.mp4 -c:v libx264 -g 1 -pix_fmt yuv420p output_keyframes.mp4
```

**Parameters**:

| Flag | Meaning |
|------|---------|
| `-i input.mp4` | Source clip you embed. |
| `-c:v libx264` | H.264 for broad compatibility with web and renderers. |
| `-g 1` | **Critical**: GOP size 1, so **every frame is a keyframe**. |
| `-pix_fmt yuv420p` | Common 8-bit 4:2:0 layout for players and pipelines. |
| `output_keyframes.mp4` | Output used as the fixed asset. |

*(All-intra files are often **several times larger** than the original. That is usually fine for an **intermediate** asset used only during rendering.)*
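
When a composition embeds several clips, the same command can be wrapped in a loop. The following is a minimal sketch that reuses the flags from Step 1; the function names and the `_keyframes.mp4` naming scheme are hypothetical choices for illustration, not part of any tool.

```bash
#!/usr/bin/env bash
# Re-encode several embedded clips to all-intra H.264, using the same
# flags as the single-file command in Step 1.

# keyframe_output_name PATH
#   Hypothetical naming helper: clips/a.mp4 -> clips/a_keyframes.mp4
keyframe_output_name() {
  local src=$1
  # Strip the last extension, then append the suffix.
  echo "${src%.*}_keyframes.mp4"
}

# reencode_all_intra PATH...
#   Runs ffmpeg once per input and stops at the first failure.
reencode_all_intra() {
  local src
  for src in "$@"; do
    ffmpeg -i "$src" -c:v libx264 -g 1 -pix_fmt yuv420p \
      "$(keyframe_output_name "$src")" || return 1
  done
}

# Example (requires ffmpeg on PATH):
#   reencode_all_intra clips/intro.mp4 clips/demo.mp4
```

Listing the inputs explicitly, rather than using a glob such as `clips/*.mp4`, avoids picking up the `_keyframes.mp4` outputs of an earlier run.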

#### Step 2: Point your project at the new file

Replace paths so the composition uses `output_keyframes.mp4` instead of the old `input.mp4`.
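
How the path is stored depends on the project, so the following is only a sketch: a small filter that rewrites literal occurrences of the old file name in text read from standard input. The function name `swap_clip_path` and the idea of piping a source file through it are assumptions for illustration; editing the path by hand works just as well.

```bash
#!/usr/bin/env bash
# swap_clip_path OLD NEW
#   Copies stdin to stdout, replacing every literal occurrence of OLD
#   with NEW. OLD is matched as plain text, not as a pattern.
swap_clip_path() {
  local old=$1 new=$2 line
  # The extra test keeps a final line that lacks a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    line=${line//"$old"/"$new"}
    printf '%s\n' "$line"
  done
}

# Example: preview the change before touching the file.
#   swap_clip_path input.mp4 output_keyframes.mp4 < src/Composition.tsx
```

Here `src/Composition.tsx` is a placeholder for whichever file references the clip. Searching the project for the old name first (for example with `grep -rn 'input.mp4' .`) shows every place that needs the update.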

#### Step 3: Re-render

Export again; embedded playback should track smoothly with the main timeline.

## 4. Rules of thumb

**Rule:** Any asset that code must seek into frame-accurately should be pre-converted to **all-intra** (`-g 1`) before use.

This applies broadly in **programmatic video**: PiP, reverse playback, scrubber-driven playback, etc. If embedded video looks choppy, **check keyframe spacing first** and re-encode if needed.
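
One way to check keyframe spacing is to ask `ffprobe` for the `key_frame` flag of every video frame and summarize the result. This is a sketch rather than a definitive diagnostic: the function names are hypothetical, and the summary only reports counts and the longest run of non-keyframes.

```bash
#!/usr/bin/env bash
# list_keyframe_flags FILE
#   Prints one line per video frame: 1 for a keyframe, 0 otherwise.
list_keyframe_flags() {
  ffprobe -v error -select_streams v:0 \
    -show_entries frame=key_frame -of csv=p=0 "$1"
}

# summarize_keyframes
#   Reads the flags on stdin and reports the frame count, the keyframe
#   count, and the longest run of consecutive non-keyframes.
summarize_keyframes() {
  awk '
    { sub(/[,\r]+$/, "") }      # drop stray separators at end of line
    $0 == "" { next }           # ignore blank lines
    { frames++ }
    $0 == "1" { keys++; run = 0; next }
    { run++; if (run > longest) longest = run }
    END {
      printf "frames=%d keyframes=%d longest_gap=%d\n", frames, keys, longest
    }
  '
}

# Example (requires ffprobe on PATH):
#   list_keyframe_flags input.mp4 | summarize_keyframes
```

A `longest_gap` of 0 means every frame is a keyframe; a gap of hundreds of frames points to the sparse-keyframe problem described in section 2.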