ai-core/media-generation
This Claude Code skill provides server and client utilities for generating media assets including images, speech, audio transcriptions, and videos through a unified architecture. Use it to build streaming media generation features by pairing server-side `generate*()` functions with client-side React hooks, connected via Server-Sent Events transport for real-time progress updates and result delivery.
git clone --depth 1 https://github.com/TanStack/ai /tmp/ai-core-media-generation && cp -r /tmp/ai-core-media-generation/packages/ai/skills/ai-core/media-generation ~/.claude/skills/ai-core-media-generation
# Media Generation
> **Dependency note:** This skill builds on ai-core. Read it first for critical rules.
All media activities (image, speech, transcription, video) follow the same
server/client architecture: a `generate*()` function on the server, an SSE
transport via `toServerSentEventsResponse()`, and a framework hook on the
client.
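For readers new to Server-Sent Events, the following minimal sketch shows generic SSE framing: each event is a `data:` line terminated by a blank line. The `HypotheticalEvent` payloads and both helper functions are assumptions made for illustration; they are not the actual chunks emitted by `toServerSentEventsResponse()`, and the parser is simplified (it expects a complete buffer rather than a live stream).

```typescript
// Generic SSE framing only. The payload shape is hypothetical and does not
// reflect the events that the library actually sends.
type HypotheticalEvent =
  | { type: 'progress'; percent: number }
  | { type: 'result'; imageCount: number }

// Serialize one event as an SSE frame: a "data:" line plus a blank line.
function encodeSseEvent(event: HypotheticalEvent): string {
  return `data: ${JSON.stringify(event)}\n\n`
}

// Parse a complete SSE text buffer back into events (simplified parser).
function parseSseBuffer(buffer: string): HypotheticalEvent[] {
  const events: HypotheticalEvent[] = []
  for (const frame of buffer.split('\n\n')) {
    // Collect the payload of every "data:" line in this frame.
    const dataLines = frame
      .split('\n')
      .filter((line) => line.startsWith('data:'))
      .map((line) => line.slice(5).trimStart())
    if (dataLines.length === 0) continue // skip empty trailing frames
    events.push(JSON.parse(dataLines.join('\n')) as HypotheticalEvent)
  }
  return events
}

// Round trip: two frames on the wire become two parsed events.
const wire =
  encodeSseEvent({ type: 'progress', percent: 50 }) +
  encodeSseEvent({ type: 'result', imageCount: 1 })
const parsed = parseSseBuffer(wire)
```

In practice the client hooks handle this parsing, so application code should not need to touch the wire format directly.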
## Setup -- Image Generation End-to-End
### Server (API route or TanStack Start server function)
```typescript
// routes/api/generate/image.ts
import { generateImage, toServerSentEventsResponse } from '@tanstack/ai'
import { openaiImage } from '@tanstack/ai-openai'

export async function POST(req: Request) {
  const { prompt, size, numberOfImages } = await req.json()

  const stream = generateImage({
    adapter: openaiImage('gpt-image-1'),
    prompt,
    size,
    numberOfImages,
    stream: true,
  })

  return toServerSentEventsResponse(stream)
}
```
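The route above passes the parsed JSON body straight to `generateImage()`. One possible hardening step is to validate the body first. The sketch below is a self-contained example of that idea; `parseImageRequest`, the `ImageRequest` type, and the `MAX_IMAGES` cap are hypothetical names and values chosen for illustration, not part of TanStack AI.

```typescript
// Hypothetical request shape mirroring the fields destructured in the route.
interface ImageRequest {
  prompt: string
  size?: string
  numberOfImages?: number
}

// Assumed cap for illustration only; not a documented library or provider limit.
const MAX_IMAGES = 4

// Validate an untrusted JSON body and return a typed request, or throw.
function parseImageRequest(body: unknown): ImageRequest {
  if (typeof body !== 'object' || body === null) {
    throw new Error('Request body must be a JSON object')
  }
  const { prompt, size, numberOfImages } = body as Record<string, unknown>

  if (typeof prompt !== 'string' || prompt.trim() === '') {
    throw new Error('prompt must be a non-empty string')
  }
  const request: ImageRequest = { prompt: prompt.trim() }

  if (size !== undefined) {
    if (typeof size !== 'string') {
      throw new Error('size must be a string when provided')
    }
    request.size = size
  }

  if (numberOfImages !== undefined) {
    if (
      typeof numberOfImages !== 'number' ||
      !Number.isInteger(numberOfImages) ||
      numberOfImages < 1 ||
      numberOfImages > MAX_IMAGES
    ) {
      throw new Error(`numberOfImages must be an integer from 1 to ${MAX_IMAGES}`)
    }
    request.numberOfImages = numberOfImages
  }

  return request
}
```

A route could call `parseImageRequest(await req.json())` inside a `try` block and return a 400 response when it throws, before any provider call is made.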
### Client (React)
```tsx
import { useGenerateImage, fetchServerSentEvents } from '@tanstack/ai-react'
import { useState } from 'react'

function ImageGenerator() {
  const [prompt, setPrompt] = useState('')
  const { generate, result, isLoading, error, reset } = useGenerateImage({
    connection: fetchServerSentEvents('/api/generate/image'),
  })

  return (
    <div>
      <input
        value={prompt}
        onChange={(e) => setPrompt(e.target.value)}
        placeholder="Describe an image..."
      />
      <button
        onClick={() => generate({ prompt })}
        disabled={isLoading || !prompt.trim()}
      >
        {isLoading ? 'Generating...' : 'Generate'}
      </button>

      {error && <p>Error: {error.message}</p>}

      {result?.images.map((img, i) => (
        <img
          key={i}
          src={img.url || `data:image/png;base64,${img.b64Json}`}
          alt={img.revisedPrompt || 'Generated image'}
        />
      ))}

      {result && <button onClick={reset}>Clear</button>}
    </div>
  )
}
```
### TanStack Start: Server Function Streaming (recommended)
When using TanStack Start, return `toServerSentEventsResponse()` from a
server function. The client fetcher receives a `Response` and the hook
parses it as SSE automatically:
```typescript
// lib/server-functions.ts
import { createServerFn } from '@tanstack/react-start'
import { generateImage, toServerSentEventsResponse } from '@tanstack/ai'
import { openaiImage } from '@tanstack/ai-openai'

export const generateImageStreamFn = createServerFn({ method: 'POST' })
  .inputValidator((data: { prompt: string; model?: string }) => data)
  .handler(({ data }) => {
    return toServerSentEventsResponse(
      generateImage({
        adapter: openaiImage(data.model ?? 'gpt-image-1'),
        prompt: data.prompt,
        stream: true,
      }),
    )
  })
```
```tsx
import { useGenerateImage } from '@tanstack/ai-react'
import { generateImageStreamFn } from '../lib/server-functions'

function ImageGenerator() {
  const { generate, result, isLoading } = useGenerateImage({
    fetcher: (input) => generateImageStreamFn({ data: input }),
  })

  return (
    <button
      onClick={() => generate({ prompt: 'A sunset over mountains' })}
      disabled={isLoading}
    >
      {isLoading ? 'Generating...' : 'Generate'}
    </button>
  )
}
```
---
## Core Patterns
### 1. Image Generation
Supported adapters: `openaiImage` (dall-e-2, dall-e-3, gpt-image-1,
gpt-image-1-mini, gpt-image-2) and `geminiImage` (gemini-3.1-flash-image-preview,
imagen-4.0-generate-001, etc.).
```typescript
import { generateImage } from '@tanstack/ai'
import { openaiImage } from '@tanstack/ai-openai'
import { geminiImage } from '@tanstack/ai-gemini'

// OpenAI with quality/background options
const openaiResult = await generateImage({
  adapter: openaiImage('gpt-image-1'),
  prompt: 'A cat wearing a hat',
  size: '1024x1024',
  numberOfImages: 2,
  modelOptions: {
    quality: 'high',
    background: 'transparent',
    outputFormat: 'png',
  },
})

// Gemini native model with aspect-ratio sizes
const geminiResult = await generateImage({
  adapter: geminiImage('gemini-3.1-flash-image-preview'),
  prompt: 'A futuristic cityscape at night',
  size: '16:9_4K',
})

// Gemini Imagen model
const imagenResult = await generateImage({
  adapter: geminiImage('imagen-4.0-generate-001'),
  prompt: 'A landscape photo',
  modelOptions: { aspectRatio: '16:9' },
})
```
Result shape: `ImageGenerationResult` with `images` array where each entry
has `b64Json?`, `url?`, and `revisedPrompt?`. OpenAI image URLs expire
after 1 hour -- download or display immediately.
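Because each image entry may carry a `url`, a `b64Json` payload, or neither, a small helper can make the fallback explicit and avoid rendering a broken `data:` URI when both are missing. The following is a minimal sketch under stated assumptions: `GeneratedImageEntry` is a local mirror of the fields named above (not a type imported from the library), and `imageSrc` and `isUrlLikelyExpired` are hypothetical helper names.

```typescript
// Local mirror of the image entry fields described above; not a library type.
interface GeneratedImageEntry {
  b64Json?: string
  url?: string
  revisedPrompt?: string
}

// One hour, matching the OpenAI URL lifetime stated in the text.
const ONE_HOUR_MS = 60 * 60 * 1000

// Prefer the hosted URL, fall back to an inline data URI, otherwise null.
function imageSrc(
  img: GeneratedImageEntry,
  mimeType: string = 'image/png',
): string | null {
  if (img.url) return img.url
  if (img.b64Json) return `data:${mimeType};base64,${img.b64Json}`
  return null
}

// True once a hosted URL received at `receivedAtMs` is at least one hour old.
function isUrlLikelyExpired(receivedAtMs: number, nowMs: number): boolean {
  return nowMs - receivedAtMs >= ONE_HOUR_MS
}
```

A component could skip entries where `imageSrc(img)` returns `null`, and an app that keeps results around could use `isUrlLikelyExpired` to decide when to regenerate or to serve a previously downloaded copy instead.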
### 2. Audio Generation (Music, Sound Effects)
Distinct from TTS: `generateAudio()` produces non-speech audio content such as music and sound effects.
Supported adapters: `geminiAudio` (Lyria 3 Pro / Lyria 3 Clip) and
`falAudio` (MiniMax Music, DiffRhythm, Stable Audio, ElevenLabs SFX, etc.).
```typescript
import { generateAudio } from '@tanstack/ai'
import { falAudio } from '@tanstack/ai-fal'

const result = await generateAudio({
  adapter: falAudio('fal-ai/diffrhythm'),
  prompt: 'An upbeat electronic track with synths',
  duration: 10,
})

// result.audio.url or result.audio.b64Json (provider-dependent)
// result.audio.contentType e.g. "audio/mpeg"
```
Client hook:
```tsx
import { useGenerateAudio, fetchServerSentEvents } from '@tanstack/ai-react'

const { generate, result, isLoading } = useGenerateAudio({
  connection: fetchServerSentEvents('/api/generate/audio'),
})

// Trigger: generate({ prompt: 'Upbeat synths', duration: 10 })
// Play: <audio src={result.audio.url} controls />
```
### 3. Text-to-Speech
Adapter: `openaiSpeech` (tts-1, tts-1-hd, gpt-4o-audio-preview).
```typescript
import { generateSpeech } from '@tanstack/ai'
import { openaiSpeech } from '@tanstack/ai-openai'

const result = await generateSpeech({
  adapter: openaiSpeech('tts-1-hd'),
  text: 'Hello, welcome to TanStack AI!',
  voice: 'alloy', // alloy | echo | fable | onyx | nova | shimmer | ash | ballad | coral | sage | verse
  format: 'mp3', // mp3 | opus | aac | flac
})
```