Skill · 2.8k repo stars · updated today

ai-core/media-generation

This Claude Code skill provides server and client utilities for generating media assets including images, speech, audio transcriptions, and videos through a unified architecture. Use it to build streaming media generation features by pairing server-side `generate*()` functions with client-side React hooks, connected via Server-Sent Events transport for real-time progress updates and result delivery.

Install in Claude Code
git clone --depth 1 https://github.com/TanStack/ai /tmp/ai-core-media-generation && cp -r /tmp/ai-core-media-generation/packages/ai/skills/ai-core/media-generation ~/.claude/skills/ai-core-media-generation
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# Media Generation

> **Dependency note:** This skill builds on the ai-core skill. Read ai-core first for its critical rules.

All media activities (image, speech, transcription, video) follow the same
server/client architecture: a `generate*()` function on the server, an SSE
transport via `toServerSentEventsResponse()`, and a framework hook on the
client.
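
To make the transport layer concrete, here is a minimal sketch of generic Server-Sent Events framing: one helper that serializes a payload into an SSE frame and one that splits a received buffer back into events. The helper names (`encodeSseEvent`, `parseSseEvents`) and the payloads are hypothetical; the actual chunk shapes produced by `toServerSentEventsResponse()` are not shown in this skill, and the framework hooks handle this parsing for you.

```typescript
// Generic SSE framing only. These helpers are illustrative and are NOT part
// of @tanstack/ai; the real event payloads may look different.
type SseEvent = { event?: string; data: string }

// Serialize one payload as an SSE frame: optional `event:` line, one `data:`
// line, then a blank line that terminates the frame.
function encodeSseEvent(data: unknown, event?: string): string {
  const lines: string[] = []
  if (event) lines.push(`event: ${event}`)
  lines.push(`data: ${JSON.stringify(data)}`)
  return lines.join('\n') + '\n\n'
}

// Split a received text buffer into complete events. The trailing partial
// frame (if any) is returned as `rest` so the caller can prepend it to the
// next network chunk.
function parseSseEvents(buffer: string): { events: SseEvent[]; rest: string } {
  const events: SseEvent[] = []
  const blocks = buffer.split('\n\n')
  const rest = blocks.pop() ?? ''

  for (const block of blocks) {
    let event: string | undefined
    const data: string[] = []
    for (const line of block.split('\n')) {
      if (line.startsWith('event:')) {
        event = line.slice('event:'.length).trim()
      } else if (line.startsWith('data:')) {
        // A single leading space after the colon is part of the framing.
        data.push(line.slice('data:'.length).replace(/^ /, ''))
      }
    }
    if (data.length === 0) continue
    const joined = data.join('\n')
    events.push(event === undefined ? { data: joined } : { event, data: joined })
  }
  return { events, rest }
}
```

Returning `rest` matters because a network chunk can end in the middle of a frame; a parser that drops the tail would lose or corrupt the next event.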

## Setup -- Image Generation End-to-End

### Server (API route or TanStack Start server function)

```typescript
// routes/api/generate/image.ts
import { generateImage, toServerSentEventsResponse } from '@tanstack/ai'
import { openaiImage } from '@tanstack/ai-openai'

export async function POST(req: Request) {
  const { prompt, size, numberOfImages } = await req.json()

  const stream = generateImage({
    adapter: openaiImage('gpt-image-1'),
    prompt,
    size,
    numberOfImages,
    stream: true,
  })

  return toServerSentEventsResponse(stream)
}
```
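
The route above passes the parsed JSON body straight to `generateImage()`. A minimal sketch of validating that body first is shown below, assuming the three fields the route destructures. `parseImageRequest`, the `ImageRequest` type, and the `MAX_IMAGES` cap are hypothetical names and limits chosen for illustration, not part of the library.

```typescript
// Hypothetical request shape mirroring the fields the route destructures.
type ImageRequest = { prompt: string; size?: string; numberOfImages?: number }

type ParseResult =
  | { ok: true; value: ImageRequest }
  | { ok: false; error: string }

// Assumed application-level cap; this is not a limit imposed by any provider.
const MAX_IMAGES = 4

function parseImageRequest(body: unknown): ParseResult {
  if (typeof body !== 'object' || body === null) {
    return { ok: false, error: 'Body must be a JSON object' }
  }
  const { prompt, size, numberOfImages } = body as Record<string, unknown>

  if (typeof prompt !== 'string' || prompt.trim() === '') {
    return { ok: false, error: '`prompt` must be a non-empty string' }
  }
  const value: ImageRequest = { prompt: prompt.trim() }

  if (size !== undefined) {
    if (typeof size !== 'string') {
      return { ok: false, error: '`size` must be a string' }
    }
    value.size = size
  }

  if (numberOfImages !== undefined) {
    if (
      typeof numberOfImages !== 'number' ||
      !Number.isInteger(numberOfImages) ||
      numberOfImages < 1 ||
      numberOfImages > MAX_IMAGES
    ) {
      return {
        ok: false,
        error: `\`numberOfImages\` must be an integer from 1 to ${MAX_IMAGES}`,
      }
    }
    value.numberOfImages = numberOfImages
  }

  return { ok: true, value }
}
```

In the route, one option is to return a `400` response with `error` when `ok` is false, and spread `value` into the `generateImage()` call otherwise.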

### Client (React)

```tsx
import { useGenerateImage, fetchServerSentEvents } from '@tanstack/ai-react'
import { useState } from 'react'

function ImageGenerator() {
  const [prompt, setPrompt] = useState('')
  const { generate, result, isLoading, error, reset } = useGenerateImage({
    connection: fetchServerSentEvents('/api/generate/image'),
  })

  return (
    <div>
      <input
        value={prompt}
        onChange={(e) => setPrompt(e.target.value)}
        placeholder="Describe an image..."
      />
      <button
        onClick={() => generate({ prompt })}
        disabled={isLoading || !prompt.trim()}
      >
        {isLoading ? 'Generating...' : 'Generate'}
      </button>

      {error && <p>Error: {error.message}</p>}

      {result?.images.map((img, i) => (
        <img
          key={i}
          src={img.url || `data:image/png;base64,${img.b64Json}`}
          alt={img.revisedPrompt || 'Generated image'}
        />
      ))}

      {result && <button onClick={reset}>Clear</button>}
    </div>
  )
}
```

### TanStack Start: Server Function Streaming (recommended)

When using TanStack Start, return `toServerSentEventsResponse()` from a
server function. The client fetcher receives a `Response` and the hook
parses it as SSE automatically:

```typescript
// lib/server-functions.ts
import { createServerFn } from '@tanstack/react-start'
import { generateImage, toServerSentEventsResponse } from '@tanstack/ai'
import { openaiImage } from '@tanstack/ai-openai'

export const generateImageStreamFn = createServerFn({ method: 'POST' })
  .inputValidator((data: { prompt: string; model?: string }) => data)
  .handler(({ data }) => {
    return toServerSentEventsResponse(
      generateImage({
        adapter: openaiImage(data.model ?? 'gpt-image-1'),
        prompt: data.prompt,
        stream: true,
      }),
    )
  })
```

```tsx
import { useGenerateImage } from '@tanstack/ai-react'
import { generateImageStreamFn } from '../lib/server-functions'

function ImageGenerator() {
  const { generate, result, isLoading } = useGenerateImage({
    fetcher: (input) => generateImageStreamFn({ data: input }),
  })

  return (
    <button
      onClick={() => generate({ prompt: 'A sunset over mountains' })}
      disabled={isLoading}
    >
      {isLoading ? 'Generating...' : 'Generate'}
    </button>
  )
}
```

---

## Core Patterns

### 1. Image Generation

Supported adapters: `openaiImage` (dall-e-2, dall-e-3, gpt-image-1,
gpt-image-1-mini, gpt-image-2) and `geminiImage` (gemini-3.1-flash-image-preview,
imagen-4.0-generate-001, etc.).

```typescript
import { generateImage } from '@tanstack/ai'
import { openaiImage } from '@tanstack/ai-openai'
import { geminiImage } from '@tanstack/ai-gemini'

// OpenAI with quality/background options
const openaiResult = await generateImage({
  adapter: openaiImage('gpt-image-1'),
  prompt: 'A cat wearing a hat',
  size: '1024x1024',
  numberOfImages: 2,
  modelOptions: {
    quality: 'high',
    background: 'transparent',
    outputFormat: 'png',
  },
})

// Gemini native model with aspect-ratio sizes
const geminiResult = await generateImage({
  adapter: geminiImage('gemini-3.1-flash-image-preview'),
  prompt: 'A futuristic cityscape at night',
  size: '16:9_4K',
})

// Gemini Imagen model
const imagenResult = await generateImage({
  adapter: geminiImage('imagen-4.0-generate-001'),
  prompt: 'A landscape photo',
  modelOptions: { aspectRatio: '16:9' },
})
```

Result shape: `ImageGenerationResult` with an `images` array where each entry
has `b64Json?`, `url?`, and `revisedPrompt?`. OpenAI image URLs expire
after 1 hour, so download or display them immediately.

### 2. Audio Generation (Music, Sound Effects)

Distinct from TTS: `generateAudio()` produces non-speech audio content.
Supported adapters: `geminiAudio` (Lyria 3 Pro / Lyria 3 Clip) and
`falAudio` (MiniMax Music, DiffRhythm, Stable Audio, ElevenLabs SFX, etc.).

```typescript
import { generateAudio } from '@tanstack/ai'
import { falAudio } from '@tanstack/ai-fal'

const result = await generateAudio({
  adapter: falAudio('fal-ai/diffrhythm'),
  prompt: 'An upbeat electronic track with synths',
  duration: 10,
})

// result.audio.url or result.audio.b64Json (provider-dependent)
// result.audio.contentType e.g. "audio/mpeg"
```

Client hook:

```tsx
import { useGenerateAudio, fetchServerSentEvents } from '@tanstack/ai-react'

const { generate, result, isLoading } = useGenerateAudio({
  connection: fetchServerSentEvents('/api/generate/audio'),
})

// Trigger: generate({ prompt: 'Upbeat synths', duration: 10 })
// Play:    <audio src={result?.audio.url} controls />
```
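
Because the audio payload is provider-dependent (`url` or `b64Json`), a player that reads only `result.audio.url` can end up with no source. A small sketch of resolving either form into a value usable as an `<audio>` `src` follows. The `GeneratedAudio` type is an assumed shape based on the fields named in the comments above, `audioSrc` is a hypothetical helper, and the `audio/mpeg` fallback is an assumption.

```typescript
// Assumed shape, based only on the fields mentioned in this section; the
// library's real result type may carry more fields.
type GeneratedAudio = { url?: string; b64Json?: string; contentType?: string }

// Prefer a hosted URL; otherwise build a data URI from the base64 payload.
function audioSrc(audio: GeneratedAudio | null | undefined): string | undefined {
  if (!audio) return undefined
  if (audio.url) return audio.url
  if (audio.b64Json) {
    // Fall back to audio/mpeg when no content type is reported (assumption).
    const contentType = audio.contentType ?? 'audio/mpeg'
    return `data:${contentType};base64,${audio.b64Json}`
  }
  return undefined
}
```

Usage in the component would then look like `<audio src={audioSrc(result?.audio)} controls />`, which also covers the state before any result exists.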

### 3. Text-to-Speech

Adapter: `openaiSpeech` (tts-1, tts-1-hd, gpt-4o-audio-preview).

```typescript
import { generateSpeech } from '@tanstack/ai'
import { openaiSpeech } from '@tanstack/ai-openai'

const result = await generateSpeech({
  adapter: openaiSpeech('tts-1-hd'),
  text: 'Hello, welcome to TanStack AI!',
  voice: 'alloy', // alloy | echo | fable | onyx | nova | shimmer | ash | ballad | coral | sage | verse
  format: 'mp3', // mp3 | opus | aac | flac