multimedia-backend-integrator

The Multimedia Backend Integrator provides a structured reference guide for extending MassGen's unified media generation system with new backends for images, video, or audio. Use this when integrating a new third-party API service, following the registration, implementation, dispatcher update, and testing checklist to ensure consistent parameter mapping, error handling, and auto-selection logic across the platform.

Install in Claude Code
git clone --depth 1 https://github.com/massgen/MassGen /tmp/multimedia-backend-integrator && cp -r /tmp/multimedia-backend-integrator/massgen/skills/multimedia-backend-integrator ~/.claude/skills/multimedia-backend-integrator
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# Multimedia Backend Integrator

Reference guide for adding new media generation backends to MassGen's unified `generate_media` tool.

## Architecture Overview

```
_base.py          -- Registration: API keys, default models, priority lists
_selector.py      -- Auto-selection logic: picks best backend by key + priority
_image.py         -- Image backends: OpenAI, Google (Gemini/Imagen), Grok, OpenRouter
_video.py         -- Video backends: Grok, Google Veo, OpenAI Sora
_audio.py         -- Audio backends: ElevenLabs, OpenAI TTS
generate_media.py -- Entry point: routing, validation, batch mode, image-to-image
```
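
The "key + priority" selection described for `_selector.py` can be sketched roughly as below. This is a minimal illustration, not MassGen's actual code: the backend names, env var names, priority order, and the `select_backend` helper are all hypothetical, and the dictionaries only mimic the shape suggested by the registration step.

```python
import os
from typing import Dict, List, Mapping, Optional

# Hypothetical registry contents; the real tables live in _base.py.
BACKEND_API_KEYS: Dict[str, List[str]] = {
    "backend_a": ["BACKEND_A_API_KEY"],
    "backend_b": ["BACKEND_B_API_KEY", "BACKEND_B_ALT_KEY"],
}
BACKEND_PRIORITY: Dict[str, List[str]] = {
    "image": ["backend_a", "backend_b"],
}


def has_api_key(backend: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if any env var registered for the backend is set and non-empty."""
    env = os.environ if env is None else env
    return any(env.get(name) for name in BACKEND_API_KEYS.get(backend, []))


def select_backend(media_type: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Pick the first backend in priority order that has a usable API key."""
    for backend in BACKEND_PRIORITY.get(media_type, []):
        if has_api_key(backend, env):
            return backend
    return None  # no backend is configured for this media type
```

Passing `env` explicitly keeps the sketch easy to test without touching the real process environment.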

## Complete Checklist: Adding a New Backend

### 1. Registration (`_base.py`)

- [ ] Add to `BACKEND_API_KEYS`: map backend name to env var(s)
- [ ] Add to `DEFAULT_MODELS`: map backend name to `{MediaType: model_name}` for each supported type
- [ ] Add to `BACKEND_PRIORITY`: insert at correct position per media type
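
As a rough illustration of the three registration entries, the sketch below registers a hypothetical `new_backend` and checks that the tables agree with each other. The `MediaType` enum here is a stand-in, the model and env var names are invented, and `check_registration` is a hypothetical helper that is not part of MassGen.

```python
from enum import Enum
from typing import Dict, List


class MediaType(Enum):  # stand-in for MassGen's MediaType
    IMAGE = "image"
    VIDEO = "video"
    AUDIO = "audio"


# Assumed shapes of the registration tables (contents are illustrative).
BACKEND_API_KEYS: Dict[str, List[str]] = {"existing_backend": ["EXISTING_BACKEND_API_KEY"]}
DEFAULT_MODELS: Dict[str, Dict[MediaType, str]] = {
    "existing_backend": {MediaType.IMAGE: "existing-image-model"},
}
BACKEND_PRIORITY: Dict[MediaType, List[str]] = {
    MediaType.IMAGE: ["existing_backend"],
    MediaType.VIDEO: [],
    MediaType.AUDIO: [],
}

# The three registration steps for the new backend.
BACKEND_API_KEYS["new_backend"] = ["NEW_BACKEND_API_KEY"]
DEFAULT_MODELS["new_backend"] = {MediaType.IMAGE: "new-image-model-v1"}
BACKEND_PRIORITY[MediaType.IMAGE].append("new_backend")  # position is a judgment call


def check_registration(backend: str) -> List[str]:
    """Return a list of inconsistencies between the three tables."""
    problems: List[str] = []
    if backend not in BACKEND_API_KEYS:
        problems.append("missing from BACKEND_API_KEYS")
    models = DEFAULT_MODELS.get(backend, {})
    if not models:
        problems.append("missing from DEFAULT_MODELS")
    for media_type in models:
        if backend not in BACKEND_PRIORITY.get(media_type, []):
            problems.append(f"not in BACKEND_PRIORITY for {media_type.value}")
    return problems
```

A check of this kind catches a backend that has a default model for a media type but was never added to that type's priority list.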

### 2. Implementation (`_image.py` / `_video.py` / `_audio.py`)

- [ ] Add `import` for SDK at module top
- [ ] Implement `_generate_{media}_{backend}(config) -> GenerationResult`
- [ ] Check API key first, return error result if missing
- [ ] Create SDK client with API key
- [ ] Map `config.*` fields to SDK parameters
- [ ] Handle continuation (if applicable) — see Continuation Store Patterns
- [ ] Write output bytes to `config.output_path`
- [ ] Return `GenerationResult` with metadata
- [ ] Wrap in try/except, log errors

### 3. Dispatcher Update

- [ ] Add `elif backend == "new_backend":` in the media type's `generate_{media}()` function
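
A minimal sketch of what the dispatcher branch could look like, assuming the dispatcher is a plain `if`/`elif` chain keyed on the backend name. The two generator functions and the dict-based config are placeholders, not MassGen's real signatures.

```python
from typing import Dict


def _generate_image_existing(config: Dict[str, str]) -> str:
    return f"existing_backend:{config['prompt']}"  # placeholder for a real backend


def _generate_image_new_backend(config: Dict[str, str]) -> str:
    return f"new_backend:{config['prompt']}"  # placeholder for the new backend


def generate_image(backend: str, config: Dict[str, str]) -> str:
    """Route the request to the implementation for the chosen backend."""
    if backend == "existing_backend":
        return _generate_image_existing(config)
    elif backend == "new_backend":  # the branch added in this step
        return _generate_image_new_backend(config)
    else:
        raise ValueError(f"Unsupported image backend: {backend!r}")
```

Without the new branch, a registered backend would fall through to the final `else` even though auto-selection can pick it.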

### 4. Image-to-Image Support (`generate_media.py`)

- [ ] Add backend name to the `selected_backend not in (...)` check in `_generate_single_with_input_images`
- [ ] Add fallback: `elif has_api_key("new_backend"):` in the auto-selection chain
- [ ] Update error message to mention new backend + env var
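
The three image-to-image changes can be pictured together in one hedged sketch. The allowlist tuple, env var names, and `resolve_image_to_image_backend` are hypothetical; only the `selected_backend not in (...)` check and the `has_api_key` fallback chain come from the checklist above.

```python
import os
from typing import Dict, List, Mapping, Optional, Tuple

# Hypothetical allowlist and key table; a new backend must be added to both.
IMAGE_TO_IMAGE_BACKENDS: Tuple[str, ...] = ("existing_backend", "new_backend")
BACKEND_API_KEYS: Dict[str, List[str]] = {
    "existing_backend": ["EXISTING_BACKEND_API_KEY"],
    "new_backend": ["NEW_BACKEND_API_KEY"],
}


def has_api_key(backend: str, env: Optional[Mapping[str, str]] = None) -> bool:
    env = os.environ if env is None else env
    return any(env.get(name) for name in BACKEND_API_KEYS.get(backend, []))


def resolve_image_to_image_backend(
    selected_backend: Optional[str], env: Optional[Mapping[str, str]] = None
) -> str:
    """Validate an explicit choice, or fall back through backends with keys."""
    if selected_backend is not None:
        if selected_backend not in IMAGE_TO_IMAGE_BACKENDS:
            raise ValueError(
                f"{selected_backend!r} does not support image-to-image; "
                f"use one of {IMAGE_TO_IMAGE_BACKENDS}"
            )
        return selected_backend
    if has_api_key("existing_backend", env):
        return "existing_backend"
    elif has_api_key("new_backend", env):  # fallback added for the new backend
        return "new_backend"
    # The error message names every env var that would unblock the user.
    raise ValueError(
        "Image-to-image needs EXISTING_BACKEND_API_KEY or NEW_BACKEND_API_KEY to be set"
    )
```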

### 5. Documentation

- [ ] `TOOL.md`: Add env var to frontmatter, backend to tables, keywords
- [ ] `generate_media.py` docstring: Update `backend_type` list and `Supported Backends`

### 6. Tests

- [ ] Backend registration tests (API keys, default models, priority order)
- [ ] Auto-selection tests (with only this backend's key, with multiple keys)
- [ ] SDK call verification (correct params passed through)
- [ ] Output file written correctly
- [ ] Continuation flow (if applicable)
- [ ] Error handling (missing key, API errors)
- [ ] Parameter mapping (aspect_ratio, size, duration)
- [ ] Update existing tests that assert priority list length/contents
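
For the SDK call verification and parameter mapping items, one possible shape is a test that swaps the SDK client for a `unittest.mock.MagicMock` and asserts on the keyword arguments. The `call_sdk` function and the `client.videos.generate` method name are hypothetical stand-ins for whatever the real backend function and SDK expose.

```python
from typing import Any, Dict, Optional
from unittest.mock import MagicMock


def call_sdk(
    client: Any,
    *,
    prompt: str,
    aspect_ratio: Optional[str] = None,
    duration: Optional[int] = None,
) -> Any:
    """Hypothetical mapping from config-style fields to SDK keyword arguments."""
    params: Dict[str, Any] = {"prompt": prompt}
    if aspect_ratio is not None:
        params["aspect_ratio"] = aspect_ratio
    if duration is not None:  # 0 is a valid value and must be forwarded
        params["duration_seconds"] = duration
    return client.videos.generate(**params)


def test_params_are_passed_through() -> None:
    client = MagicMock()
    call_sdk(client, prompt="a wave", aspect_ratio="16:9", duration=0)
    client.videos.generate.assert_called_once_with(
        prompt="a wave", aspect_ratio="16:9", duration_seconds=0
    )


def test_unset_params_are_omitted() -> None:
    client = MagicMock()
    call_sdk(client, prompt="a wave")
    client.videos.generate.assert_called_once_with(prompt="a wave")
```

These functions follow pytest naming, so a test runner would collect them as written; no network access or real API key is involved.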

## Continuation Store Patterns

Each backend that supports iterative editing needs a continuation mechanism:

| Backend | Store Type | Key Format | What's Stored | How Continuation Works |
|---------|-----------|------------|---------------|----------------------|
| **OpenAI** | None locally (state held server-side) | `response.id` | Nothing locally | Pass `previous_response_id` to next call |
| **Gemini** | `_GeminiChatStore` (in-memory) | `gemini_chat_{uuid12}` | (client, chat) tuples | Reuse chat object for `send_message()`; client kept alive to prevent HTTP connection GC |
| **Grok** | `_GrokImageStore` (in-memory) | `grok_img_{uuid12}` | Base64 strings | Pass stored base64 as `image_url` data URI |
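
To make the Grok row concrete, here is a small sketch of turning stored base64 into a data URI for the next call. The `to_data_uri` helper is hypothetical, and the `image/png` MIME type is an assumption; the real backend may use a different image format.

```python
import base64


def to_data_uri(image_b64: str, mime_type: str = "image/png") -> str:
    """Wrap an already base64-encoded image in a data URI."""
    return f"data:{mime_type};base64,{image_b64}"


# Example: encode raw bytes once, keep the base64 string, rebuild the URI later.
raw_bytes = b"abc"  # placeholder for real image bytes
stored_b64 = base64.b64encode(raw_bytes).decode("ascii")
image_url = to_data_uri(stored_b64)
```

Storing the base64 string rather than the URI keeps the stored value independent of how a later request chooses to embed it.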

### Store Pattern Template

```python
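from __future__ import annotations

import uuid
from collections import OrderedDict
from typing import Any


class _NewBackendStore:
    """Bounded in-memory store for continuation state, keyed by a generated ID."""

    def __init__(self, max_items: int = 50):
        self._store: OrderedDict[str, Any] = OrderedDict()
        self._max = max_items

    def save(self, data: Any) -> str:
        store_id = f"prefix_{uuid.uuid4().hex[:12]}"  # replace "prefix" per backend
        if len(self._store) >= self._max:
            self._store.popitem(last=False)  # evict the least recently used entry
        self._store[store_id] = data
        return store_id

    def get(self, store_id: str) -> Any | None:
        if store_id not in self._store:
            return None
        self._store.move_to_end(store_id)  # mark as recently used so eviction is LRU
        return self._store[store_id]


_store = _NewBackendStore()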
```

## Common Pitfalls

1. **Missing from priority list** — Backend works when explicitly specified but never auto-selected
2. **Sync vs async** — Some SDKs are sync-only; wrap in `asyncio.to_thread()` if needed
3. **Ephemeral URLs** — Some APIs return temporary URLs; always prefer base64 or download immediately
4. **Falsy duration** — `duration or default` treats `0` as falsy; use `if duration is not None`
5. **Existing test breakage** — Adding to priority list changes auto-selection; update existing tests that clear env vars
6. **Image-to-image gating** — The `_generate_single_with_input_images` function has a backend allowlist
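
Pitfalls 2 and 4 are easy to show in a few lines. In this sketch, `DEFAULT_DURATION`, `blocking_sdk_call`, and the function names are hypothetical; `asyncio.to_thread` is the standard-library call (Python 3.9+) for running a blocking function without stalling the event loop.

```python
import asyncio
import time
from typing import Optional

DEFAULT_DURATION = 5  # hypothetical default, in seconds


def resolve_duration_buggy(duration: Optional[int]) -> int:
    return duration or DEFAULT_DURATION  # an explicit 0 is silently replaced


def resolve_duration(duration: Optional[int]) -> int:
    return duration if duration is not None else DEFAULT_DURATION


def blocking_sdk_call(prompt: str) -> bytes:
    """Stand-in for a sync-only SDK method that blocks while waiting on the network."""
    time.sleep(0.01)
    return prompt.encode("utf-8")


async def generate_async(prompt: str) -> bytes:
    # Run the blocking call in a worker thread so the event loop stays responsive.
    return await asyncio.to_thread(blocking_sdk_call, prompt)
```

Calling `asyncio.run(generate_async("hello"))` from synchronous code exercises the wrapper end to end.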

## Reference Files

| File | Purpose |
|------|---------|
| `massgen/tool/_multimodal_tools/generation/_base.py` | API keys, default models, priorities |
| `massgen/tool/_multimodal_tools/generation/_selector.py` | Backend auto-selection logic |
| `massgen/tool/_multimodal_tools/generation/_image.py` | Image generation backends |
| `massgen/tool/_multimodal_tools/generation/_video.py` | Video generation backends |
| `massgen/tool/_multimodal_tools/generation/_audio.py` | Audio generation backends |
| `massgen/tool/_multimodal_tools/generation/generate_media.py` | Entry point and routing |
| `massgen/tool/_multimodal_tools/TOOL.md` | User-facing documentation |
| `massgen/tests/test_grok_multimedia_generation.py` | Reference: Grok backend tests |
| `massgen/tests/test_grok_multimedia_backend_selection.py` | Reference: Grok selection tests |
| `massgen/tests/test_multimodal_image_backend_selection.py` | Reference: image selection tests |