Skill · 1.2k repo stars · updated yesterday

video-subtitles-and-audio-insert-workflow

This Python-based workflow uses moviepy and system fonts to embed SRT subtitle files directly into videos, supporting multilingual text including Chinese characters. Use it when you need to burn hard subtitles into video files with customizable styling, positioning, and font support across platforms, particularly for content requiring CJK (Chinese, Japanese, Korean) text rendering.

Source repository: AWorld

Install in Claude Code


git clone --depth 1 https://github.com/inclusionAI/AWorld /tmp/video-subtitles-and-audio-insert-workflow && cp -r /tmp/video-subtitles-and-audio-insert-workflow/aworld-skills/video_subtitles_audios_insert ~/.claude/skills/video-subtitles-and-audio-insert-workflow

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

## 1. Choosing a Technical Approach

### Recommended: Python moviepy + CJK fonts
- **Tools**: moviepy 2.x
- **Fonts**: System CJK fonts (e.g. STHeiti, Songti, PingFang)
- **Pros**: Cross-platform, supports Chinese, easy styling control
- **Cons**: Slower processing (about 40 s to render an 80 s video)

### Alternative: FFmpeg + libass (may require rebuilding FFmpeg)
- **Tools**: FFmpeg with libass support
- **Pros**: Fast processing
- **Cons**: Your FFmpeg build must include libass; if it does not, rebuilding FFmpeg is a complex setup
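To see whether the FFmpeg route is open to you without rebuilding anything, check whether your `ffmpeg` lists the libass-backed `subtitles` filter. The sketch below does that and assembles a burn-in command. The helper names and the `Heiti SC` font family name are illustrative assumptions, and the filter string assumes plain paths with no `:` or `'` characters, which FFmpeg's filtergraph syntax would otherwise require you to escape.

```python
import shutil
import subprocess


def ffmpeg_has_subtitles_filter():
    """True if the ffmpeg on PATH lists the libass-backed `subtitles` filter."""
    if shutil.which("ffmpeg") is None:
        return False
    result = subprocess.run(
        ["ffmpeg", "-hide_banner", "-filters"],
        capture_output=True, text=True, check=False,
    )
    # Filter lines look like: " ... subtitles   V->V   Render text subtitles ..."
    for line in result.stdout.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[1] == "subtitles":
            return True
    return False


def build_burn_command(video, srt, output, font_name=None):
    """Hypothetical helper: ffmpeg argv that burns `srt` into `video`.

    Assumes paths contain no ':' or "'" (those need filtergraph escaping).
    """
    vf = f"subtitles={srt}"
    if font_name:
        # force_style overrides ASS style fields; the font must have CJK glyphs
        vf += f":force_style='FontName={font_name}'"
    return ["ffmpeg", "-y", "-i", video, "-vf", vf,
            "-c:v", "libx264", "-c:a", "copy", output]


if __name__ == "__main__":
    print("subtitles filter available:", ffmpeg_has_subtitles_filter())
    # "Heiti SC" is an example family name; use one installed on your system
    print(build_burn_command("input.mp4", "subtitles.srt", "out.mp4", "Heiti SC"))
```

Passing the command as an argument list (no shell) means the single quotes around the `force_style` value reach FFmpeg's filter parser unchanged.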

---

## 2. Core Code Template

```python
#!/usr/bin/env python3
import re
from moviepy import VideoFileClip, TextClip, CompositeVideoClip

def parse_srt(srt_file):
    """Parse an SRT subtitle file."""
    with open(srt_file, 'r', encoding='utf-8') as f:
        content = f.read()
    
    blocks = re.split(r'\n\s*\n', content.strip())  # tolerate repeated or whitespace-only blank lines
    subtitles = []
    
    for block in blocks:
        lines = block.strip().split('\n')
        if len(lines) >= 3:
            time_line = lines[1]
            match = re.match(r'(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})', time_line)
            if match:
                start_h, start_m, start_s, start_ms, end_h, end_m, end_s, end_ms = match.groups()
                start_time = int(start_h) * 3600 + int(start_m) * 60 + int(start_s) + int(start_ms) / 1000
                end_time = int(end_h) * 3600 + int(end_m) * 60 + int(end_s) + int(end_ms) / 1000
                text = '\n'.join(lines[2:])
                subtitles.append(((start_time, end_time), text))
    
    return subtitles

def make_textclip(txt, font_path, font_size=40):
    """Create a subtitle text clip."""
    return TextClip(
        text=txt,
        font_size=font_size,             # Tune for resolution
        color='white',
        font=font_path,                  # CJK-capable font path
        stroke_color='black',
        stroke_width=2,                  # integer px; moviepy documents stroke_width as an int
        method='caption',
        size=(1100, None),               # wrap width 1100px (suits 1280px-wide video), auto height
        text_align='center'
    )

def add_subtitles(video_path, srt_path, output_path, font_path, font_size=40, bottom_margin=100):
    """Burn hard subtitles into a video."""
    video = VideoFileClip(video_path)
    subtitles = parse_srt(srt_path)
    
    subtitle_clips = []
    for (start, end), text in subtitles:
        txt_clip = make_textclip(text, font_path, font_size)
        txt_clip = txt_clip.with_start(start).with_end(end)
        # Position: the clip's top edge sits bottom_margin px above the frame bottom, so leave room for wrapped lines
        txt_clip = txt_clip.with_position(('center', video.h - bottom_margin))
        subtitle_clips.append(txt_clip)
    
    final_video = CompositeVideoClip([video] + subtitle_clips)
    
    # Important: cap bitrate to avoid huge files
    # Prefer checking source bitrate first, then ~1.2–1.5× that value
    final_video.write_videofile(
        output_path,
        codec='libx264',
        audio_codec='aac',
        fps=video.fps,
        preset='medium',
        bitrate='600k',      # Tune to source (often 400–800k)
        threads=4
    )
    
    video.close()

# Example usage
if __name__ == '__main__':
    add_subtitles(
        video_path='input_video.mp4',
        srt_path='subtitles.srt',
        output_path='output_video_with_subtitles.mp4',
        font_path='/System/Library/Fonts/STHeiti Medium.ttc',  # macOS
        font_size=40,        # e.g. 40px for 1280×720
        bottom_margin=100    # 100px from bottom
    )
```
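Because a render takes a while, a cheap pre-flight check on the parsed cues can catch problems first. This sketch works on the `((start, end), text)` tuples that `parse_srt` returns; the function name and the specific checks are illustrative assumptions, not part of the original skill. Overlapping cues are worth flagging because the template draws every cue at the same position.

```python
def check_cues(cues, video_duration):
    """Hypothetical helper: warnings for cues in parse_srt's ((start, end), text) format."""
    warnings = []
    prev_end = None
    for i, ((start, end), text) in enumerate(cues, start=1):
        if prev_end is not None and start < prev_end:
            warnings.append(f"cue {i}: starts at {start:.3f}s, overlapping the previous cue")
        if end <= start:
            warnings.append(f"cue {i}: end {end:.3f}s is not after start {start:.3f}s")
        if end > video_duration:
            warnings.append(f"cue {i}: ends at {end:.3f}s, after the video ends ({video_duration:.3f}s)")
        if not text.strip():
            warnings.append(f"cue {i}: empty text")
        prev_end = end
    return warnings


if __name__ == "__main__":
    cues = [((0.0, 2.5), "你好"), ((2.0, 4.0), "world"), ((5.0, 5.0), "")]
    for w in check_cues(cues, video_duration=4.5):
        print(w)
```

In practice you would call it with `parse_srt(srt_path)` and `VideoFileClip(video_path).duration` before building any text clips.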

---

## 3. Key Parameter Settings

### 3.1 Font choice (critical)
```python
# macOS
font_path = '/System/Library/Fonts/STHeiti Medium.ttc'  # STHeiti (recommended)
# or
font_path = '/System/Library/Fonts/Supplemental/Songti.ttc'  # Songti

# Linux
font_path = '/usr/share/fonts/truetype/wqy/wqy-microhei.ttc'  # WenQuanYi Micro Hei

# Windows
font_path = 'C:/Windows/Fonts/msyh.ttc'  # Microsoft YaHei
```

**Note**: You must use a font that includes the glyphs you need (e.g. Chinese); otherwise subtitles show as boxes.
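To avoid hard-coding one path per machine, you can try the paths listed above in order and use the first one that exists. A minimal sketch; the helper name and the candidate table are assumptions built only from the paths in this section.

```python
import os
import sys

# Hypothetical candidate table, built from the font paths listed above.
CANDIDATES = {
    "darwin": [
        "/System/Library/Fonts/STHeiti Medium.ttc",
        "/System/Library/Fonts/Supplemental/Songti.ttc",
    ],
    "linux": ["/usr/share/fonts/truetype/wqy/wqy-microhei.ttc"],
    "win32": ["C:/Windows/Fonts/msyh.ttc"],
}


def find_cjk_font(plat=sys.platform, exists=os.path.exists):
    """Return the first existing candidate font path for `plat`, or None."""
    # sys.platform is "darwin", "win32", or a string starting with "linux"
    key = "linux" if plat.startswith("linux") else plat
    for path in CANDIDATES.get(key, []):
        if exists(path):
            return path
    return None


if __name__ == "__main__":
    font_path = find_cjk_font()
    print("Using font:", font_path or "none found; set font_path manually")
```

The `exists` parameter is only there so the lookup can be exercised without the fonts installed.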

### 3.2 Font size by resolution
| Resolution | Recommended size | Notes |
|------------|------------------|-------|
| 1280×720   | 40px             | HD |
| 1920×1080  | 60px             | Full HD |
| 3840×2160  | 120px            | 4K |
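The table is linear in frame height: each recommended size is the height divided by 18 (720 / 18 = 40, 1080 / 18 = 60, 2160 / 18 = 120). The sketch below reads the height with ffprobe and applies that ratio. The helper names are hypothetical, and `probe_height` assumes `ffprobe` is on your PATH.

```python
import json
import subprocess


def probe_height(video_path):
    """Hypothetical helper: first video stream's height via ffprobe."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=width,height", "-of", "json", video_path],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(json.loads(out)["streams"][0]["height"])


def font_size_for_height(height, ratio=18):
    """Font size in px matching the table: height / 18, rounded."""
    return round(height / ratio)


if __name__ == "__main__":
    for h in (720, 1080, 2160):
        print(h, "->", font_size_for_height(h))
```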

### 3.3 Position
```python
# Pixels from bottom ≈ font_size × 2.5
bottom_margin = int(font_size * 2.5)  # whole pixels

# Example: 40px font
bottom_margin = 100  # 100px from bottom

# Y position
position_y = video.h - bottom_margin
```
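In the template, `with_position(('center', y))` places the clip's top edge at `y`, so every wrapped line has to fit inside `bottom_margin`. The sketch below turns the rule of thumb into functions and estimates how many lines fit. The function names are hypothetical, and the 1.2 × font size line height is an assumption about typical font metrics, not a moviepy guarantee.

```python
def subtitle_y(video_h, font_size, factor=2.5):
    """Hypothetical helper: top edge of the subtitle, font_size * factor px above the bottom."""
    return video_h - int(font_size * factor)


def lines_that_fit(font_size, bottom_margin, line_height=1.2):
    """How many text lines fit between the subtitle's top edge and the frame bottom.

    line_height is an assumed ratio of line height to font size; real fonts vary.
    """
    return int(bottom_margin // (font_size * line_height))


if __name__ == "__main__":
    print(subtitle_y(720, 40))       # top edge for 40px text in a 720px-high frame
    print(lines_that_fit(40, 100))   # lines of 40px text that fit in 100px
    print(lines_that_fit(48, 80))    # why 48px text at 80px spills once it wraps
```

With these assumptions, two lines of 40px text fit in a 100px margin, while 48px text at 80px has room for only one line, which matches the wrapped-line problem described under Common Issues.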

### 3.4 Bitrate (avoid oversized files)
```python
# Step 1: inspect source bitrate
# ffprobe -v error -show_entries format=bit_rate input.mp4

# Step 2: set output bitrate (often 1.2–1.5× source)
# Example:
# source ≈ 444 kbps → output ≈ 600 kbps (~1.35×)

bitrate='600k'
```
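Both steps can be scripted. In this sketch, the helper names, the default 1.35 factor, and rounding to the nearest 50 kbps are assumptions chosen to reproduce the 444 kbps → 600 kbps example above. Note that ffprobe's format-level `bit_rate` is in bits per second and covers all streams, audio included, so it slightly overstates the video-only rate that `bitrate=` controls.

```python
import subprocess


def probe_bitrate(video_path):
    """Hypothetical helper: container bitrate in bits/s from ffprobe's format=bit_rate."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=bit_rate",
         "-of", "default=noprint_wrappers=1:nokey=1", video_path],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return int(out)


def target_bitrate(source_bps, factor=1.35, step_kbps=50):
    """Scale the source bitrate, round to the nearest step_kbps, return e.g. '600k'."""
    kbps = source_bps * factor / 1000
    return f"{max(step_kbps, round(kbps / step_kbps) * step_kbps)}k"


if __name__ == "__main__":
    print(target_bitrate(444_000))  # the ~444 kbps source from the example above
```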

---

## 4. Common Issues and Fixes

### Issue 1: Subtitles show as boxes
**Cause**: Font lacks the needed glyphs (e.g. using Arial or Times New Roman for Chinese).  
**Fix**: Use a CJK-capable font (STHeiti, Songti, Microsoft YaHei, etc.).
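To catch missing glyphs before a render, you can compare the subtitle text against the font's character map. This sketch relies on the third-party fontTools package (`pip install fonttools`), which is an assumption beyond the skill's stated tools. `fontNumber=0` selects the first face of a `.ttc` collection, the same face Pillow loads by default. The helper names are hypothetical.

```python
def load_cmap(font_path, font_number=0):
    """Hypothetical helper: {codepoint: glyph_name} for one face of a .ttf/.otf/.ttc."""
    from fontTools.ttLib import TTFont  # third-party: pip install fonttools

    # lazy=True skips tables we don't need; fontNumber picks a face in a .ttc
    with TTFont(font_path, fontNumber=font_number, lazy=True) as font:
        return dict(font.getBestCmap() or {})


def missing_glyphs(text, cmap):
    """Characters in `text` (ignoring whitespace) that the cmap does not cover."""
    return sorted({ch for ch in text if not ch.isspace() and ord(ch) not in cmap})


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 3:
        # usage: python check_glyphs.py FONT_PATH subtitles.srt
        cmap = load_cmap(sys.argv[1])
        with open(sys.argv[2], encoding="utf-8") as f:
            print(missing_glyphs(f.read(), cmap))
```

An empty list means every non-whitespace character in the file, including timestamps and digits, has a glyph in that face.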

### Issue 2: Output file size explodes
**Cause**: Bitrate set too high (e.g. 5000 kbps).  
**Fix**:
```python
# Check source bitrate (run in a shell)
# ffprobe -v error -show_entries format=bit_rate input.mp4

# Set a sensible bitrate (~1.2–1.5× source)
bitrate='600k'  # if source was ~444 kbps
```

### Issue 3: Wrapped lines extend past the bottom
**Cause**: Font too large or position too low.  
**Fix**:
- Reduce font size (e.g. 48px → 40px)
- Raise position (e.g. 80px → 100px from bottom)
- Use: `bottom_margin = font_size * 2.5`

### Issue 4: Subtitles look faint or unclear
**Cause**: Stroke too thin or poor contrast.  
**Fix**:
```python
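subtitle_style = dict(
    color='white',
    stroke_color='black',
    stroke_width=2,   # 2–3px often works well; use an integer
)
# e.g. TextClip(text=txt, font=font_path, **subtitle_style)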
```

---

## 5. End-to-End Workflow

### Step 1: Prepare the subtitle file
```bash
# Ensure UTF-8 SRT
file -I subtitles.srt   # macOS; on Linux use: file -i subtitles.srt
# Should report charset=utf-8 (charset=us-ascii is also valid UTF-8)

# If wrong encoding, convert
iconv -f GBK -t UTF-8 subtitles_gbk.srt > subtitles_utf8.srt
```
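The same check-and-convert step can be done from Python, which helps on systems without `file` or `iconv`. A minimal sketch, assuming (as the `iconv` example does) that a file which is not valid UTF-8 is GBK; the function name is hypothetical. It writes bytes rather than text so existing line endings are preserved unchanged.

```python
from pathlib import Path


def ensure_utf8_srt(src, dst, fallback="gbk"):
    """Hypothetical helper: write src to dst as UTF-8; return the encoding src was read as."""
    raw = Path(src).read_bytes()
    for enc in ("utf-8-sig", fallback):  # utf-8-sig also strips a leading BOM
        try:
            text = raw.decode(enc)
            break
        except UnicodeDecodeError:
            continue
    else:
        raise ValueError(f"{src} is neither UTF-8 nor {fallback}")
    # write bytes so CRLF line endings are not translated again on Windows
    Path(dst).write_bytes(text.encode("utf-8"))
    return enc
```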

### Step 2: Inspect the source video
```bash
# Resolution
ffprobe -v error -show_entries stream=width,height input.m
```
