
dspy-adapters-multimodal

This skill guides developers through selecting and configuring DSPy adapters for different output formats (ChatAdapter for human-readable text, JSONAdapter for structured JSON with native function calling, XMLAdapter for XML-tagged fields) and implementing multimodal inputs using typed primitives like dspy.Image, dspy.Audio, and dspy.File. Use it when building DSPy programs that require structured outputs, native function calling, or need to process images, audio, or file uploads.

Repository: dspy-skills

Install in Claude Code

git clone --depth 1 https://github.com/OmidZamani/dspy-skills /tmp/dspy-adapters-multimodal && cp -r /tmp/dspy-adapters-multimodal/skills/dspy-adapters-multimodal ~/.claude/skills/dspy-adapters-multimodal

Then open a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# DSPy Adapters and Multimodal I/O

## Goal

Choose an adapter deliberately and model image, audio, and file inputs with DSPy's typed primitives.

## Adapter Selection

| Adapter | Use it for |
|---------|------------|
| `dspy.ChatAdapter()` | Default, human-readable field markers, broad model compatibility |
| `dspy.JSONAdapter()` | Structured JSON output and native function calling where supported |
| `dspy.XMLAdapter()` | XML-tagged fields when XML is easier for the target LM to follow |
| `dspy.TwoStepAdapter(extraction_model)` | A separate extraction pass, run by a second LM passed as `extraction_model`, when parsing needs extra help |

Configure globally or for a limited scope:

```python
import dspy

dspy.configure(
    lm=dspy.LM("openai/gpt-4o-mini"),
    adapter=dspy.JSONAdapter(),
)

with dspy.context(adapter=dspy.XMLAdapter()):
    result = dspy.Predict("question -> answer")(question="What is DSPy?")
```
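To make the differences in the adapter table concrete, the sketch below parses one illustrative completion in each of the three text formats using only the standard library. The sample strings approximate what each adapter asks the LM to produce; the exact text varies by DSPy version, and `parse_chat`, `parse_json`, and `parse_xml` are hypothetical helpers, not DSPy functions.

```python
import json
import re

# Illustrative completions for a signature with a single output field, `answer`.
# These approximate each adapter's format; real completions may differ.
chat_text = "[[ ## answer ## ]]\nDSPy is a framework.\n\n[[ ## completed ## ]]"
json_text = '{"answer": "DSPy is a framework."}'
xml_text = "<answer>DSPy is a framework.</answer>"


def parse_chat(text: str) -> dict:
    """Split on [[ ## name ## ]] markers and keep each field's body."""
    parts = re.split(r"\[\[ ## (\w+) ## \]\]", text)
    # With one capture group, re.split yields [prefix, name1, body1, name2, body2, ...].
    fields = {name: body.strip() for name, body in zip(parts[1::2], parts[2::2])}
    fields.pop("completed", None)  # end-of-output marker, not a real field
    return fields


def parse_json(text: str) -> dict:
    """A JSON-style completion is one object keyed by field name."""
    return json.loads(text)


def parse_xml(text: str) -> dict:
    """Collect <name>body</name> pairs into a dict."""
    pairs = re.findall(r"<(\w+)>(.*?)</\1>", text, re.DOTALL)
    return {name: body.strip() for name, body in pairs}


for parsed in (parse_chat(chat_text), parse_json(json_text), parse_xml(xml_text)):
    print(parsed)
```

All three carry the same field; what changes is how strictly the LM must follow a syntax, which is why adapter choice is worth testing per model.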

## Native Function Calling

`JSONAdapter` enables native function calling by default. `ChatAdapter` keeps text parsing by default. Override either behavior explicitly:

```python
chat_native = dspy.ChatAdapter(use_native_function_calling=True)
json_manual = dspy.JSONAdapter(use_native_function_calling=False)
```

DSPy falls back to manual text parsing when the configured LM does not support native function calling, even if `use_native_function_calling=True` is set.
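The defaults and the fallback rule can be summarized as a small decision function. This is a plain-Python restatement of the behavior described in this section, not DSPy code; `DEFAULT_NATIVE` and `tool_call_mode` are hypothetical names introduced for illustration.

```python
from typing import Optional

# Defaults as described in this section, keyed by adapter class name.
DEFAULT_NATIVE = {"ChatAdapter": False, "JSONAdapter": True}


def tool_call_mode(
    adapter: str,
    lm_supports_function_calling: bool,
    use_native_function_calling: Optional[bool] = None,
) -> str:
    """Return "native" or "text" for a given adapter, override, and LM capability."""
    if use_native_function_calling is None:
        requested = DEFAULT_NATIVE[adapter]  # no override: use the adapter default
    else:
        requested = use_native_function_calling
    # Native calling needs both the request and LM support; otherwise fall back.
    return "native" if requested and lm_supports_function_calling else "text"


print(tool_call_mode("JSONAdapter", lm_supports_function_calling=False))
```

An explicit override only changes what is requested; an LM without function-calling support still ends up on the text path.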

## Image Inputs

```python
import dspy

class DescribeImage(dspy.Signature):
    image: dspy.Image = dspy.InputField()
    description: str = dspy.OutputField()

describe = dspy.Predict(DescribeImage)
# Requires a configured LM that accepts image input.
result = describe(image=dspy.Image("./diagram.png"))
print(result.description)
```

Pass a local path, HTTP URL, bytes, PIL image, or existing data URI directly to `dspy.Image(...)`.
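The last of those forms, a data URI, is a media type followed by base64-encoded bytes. The standard-library sketch below shows how one is assembled, which can help when inspecting what a request actually carries. It illustrates the format only and is not DSPy's internal encoding routine; `to_data_uri` is a hypothetical helper.

```python
import base64
import mimetypes


def to_data_uri(data: bytes, filename: str) -> str:
    """Build a data URI, guessing the media type from the file extension."""
    mime, _ = mimetypes.guess_type(filename)
    mime = mime or "application/octet-stream"  # generic fallback for unknown types
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# The 8-byte PNG file signature stands in for real image bytes here.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
uri = to_data_uri(PNG_SIGNATURE, "diagram.png")
print(uri)
```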

## Audio and File Inputs

```python
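import dspy

class SummarizeAudio(dspy.Signature):
    audio: dspy.Audio = dspy.InputField()
    summary: str = dspy.OutputField()

# Load audio from a local file; the configured LM must accept audio input.
audio = dspy.Audio.from_file("./meeting.wav")
result = dspy.Predict(SummarizeAudio)(audio=audio)
print(result.summary)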
```

```python
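import dspy

class SummarizeFile(dspy.Signature):
    file: dspy.File = dspy.InputField()
    summary: str = dspy.OutputField()

# Load a document from a local path; the configured LM must accept file input.
document = dspy.File.from_path("./research.pdf")
result = dspy.Predict(SummarizeFile)(file=document)
print(result.summary)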
```

Provider capabilities vary. Verify that the selected model accepts the media type before deployment.
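One way to act on that advice is a pre-flight check that rejects unsupported media before any LM call is made. The sketch below assumes you maintain your own allow-list from provider documentation; the model names, `SUPPORTED_MEDIA`, and `accepts` are hypothetical and not part of DSPy.

```python
import mimetypes

# Hypothetical allow-list; fill it in from your provider's documentation.
# An entry such as "image/*" accepts every subtype of that media category.
SUPPORTED_MEDIA = {
    "example/vision-model": {"image/*", "application/pdf"},
    "example/text-model": set(),
}


def accepts(model: str, path: str) -> bool:
    """Return True if the allow-list says `model` can take the file at `path`."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        return False  # unknown type: refuse rather than guess
    allowed = SUPPORTED_MEDIA.get(model, set())
    category_wildcard = mime.split("/")[0] + "/*"
    return mime in allowed or category_wildcard in allowed


for path in ("./diagram.png", "./research.pdf", "./meeting.wav"):
    print(path, accepts("example/vision-model", path))
```

The check only inspects the file extension, so it complements rather than replaces a test against the real model.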

## Best Practices

1. Start with `ChatAdapter`; switch only for a measured reason.
2. Use typed signatures for structured output.
3. Test adapter behavior against the exact production model.
4. Avoid deprecated `Image.from_file()` and `Image.from_url()` helpers; call `dspy.Image(...)`.
5. Keep local file handling and uploaded file IDs within provider policy.

## Related Skills

- Design signatures: [dspy-signature-designer](../dspy-signature-designer/SKILL.md)
- Build tool agents: [dspy-react-agent-builder](../dspy-react-agent-builder/SKILL.md)

## Official Documentation

- **Adapters guide**: https://dspy.ai/learn/programming/adapters/
- **Tools guide**: https://dspy.ai/learn/programming/tools/
- **XMLAdapter API**: https://dspy.ai/api/adapters/XMLAdapter/
- **Image API**: https://dspy.ai/api/primitives/Image/
- **Audio API**: https://dspy.ai/api/primitives/Audio/

From the same repository

skill-perfection (Skill)

Use this skill when you need to QA audit and fix a plugin skill file. Provides a methodology for verifying skill content against official documentation, fixing issues in-place, and producing verification reports.

dspy-advanced-module-composition (Skill)

Use for composing DSPy modules with Ensemble, MultiChainComparison, ensemble voting, sequential pipelines, and multi-program workflows.

dspy-better-together (Skill)

Use for BetterTogether, prompt plus weight optimization, fine-tuning sequences, and strategy chains like p -> w -> p.

dspy-bootstrap-fewshot (Skill)

Use for BootstrapFewShot, bootstrapped demonstrations, teacher-model demos, and low-data DSPy prompt optimization.

dspy-custom-module-design (Skill)

Use for creating custom DSPy modules, extending dspy.Module, reusable components, stateful modules, serialization, and module testing.

dspy-debugging-observability (Skill)

Use for debugging DSPy programs, inspect_history, tracing LLM calls, custom callbacks, observability, monitoring, and cost tracking.

dspy-embedding-retrieval (Skill)

Use for DSPy retrieval with dspy.Embedder, dspy.Embeddings, FAISS indexes, semantic search, and local or hosted embedding models.

dspy-evaluation-suite (Skill)

Use for evaluating DSPy programs with Evaluate, answer_exact_match, SemanticF1, custom metrics, baselines, and program comparisons.