dspy-production-deployment
This skill should be used when the user asks to "deploy DSPy", "save and load a DSPy program", "configure DSPy cache", "harden pickle cache", "track DSPy token usage", "run DSPy asynchronously", "stream DSPy output", mentions `configure_cache`, `restrict_pickle`, `track_usage`, `acall`, `asyncify`, `streamify`, `StreamListener`, MLflow deployment, or needs production runtime guidance for a DSPy application.
git clone --depth 1 https://github.com/OmidZamani/dspy-skills /tmp/dspy-production-deployment && cp -r /tmp/dspy-production-deployment/skills/dspy-production-deployment ~/.claude/skills/dspy-production-deployment
# DSPy Production Deployment
## Goal
Prepare a DSPy program for repeatable, observable, scalable, and safer production execution.
## Cache Hardening
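DSPy enables memory and disk caches by default. Disk cache entries are deserialized with pickle unless restricted, and unpickling a tampered cache file can execute arbitrary code. In production, set `restrict_pickle=True` so that only allowlisted types are deserialized: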
```python
import dspy
dspy.configure_cache(restrict_pickle=True)
```
Register trusted custom cache types only when needed:
```python
# MyResult and Metadata stand for application-defined classes stored in cache entries.
dspy.configure_cache(
    restrict_pickle=True,
    safe_types=[MyResult, Metadata],
)
```
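To make the allowlist idea concrete, here is a minimal standard-library sketch of restricted unpickling. It illustrates the general technique only and is not DSPy's implementation; `ALLOWED`, `AllowlistUnpickler`, and `safe_loads` are hypothetical names chosen for this example.

```python
import io
import pickle

# Hypothetical allowlist of (module, name) pairs that may be resolved while loading.
ALLOWED = {("collections", "OrderedDict")}


class AllowlistUnpickler(pickle.Unpickler):
    """Refuse to resolve any global that is not explicitly allowlisted."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def safe_loads(data: bytes):
    """Deserialize bytes, rejecting payloads that reference other globals."""
    return AllowlistUnpickler(io.BytesIO(data)).load()
```

Plain containers such as dicts, lists, strings, and numbers load without any global lookup, so they pass through unchanged. Anything that needs an import at load time must be on the allowlist, which is the role `safe_types` plays in the call above.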
Disable a cache layer explicitly when a deployment cannot persist data or requires fresh model responses:
```python
dspy.configure_cache(
    enable_disk_cache=False,
    enable_memory_cache=True,
)
```
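One way to keep this decision explicit is to derive the flags from the deployment's constraints. The helper below is a hypothetical sketch, not part of DSPy; it only builds the keyword arguments shown above.

```python
def cache_settings(can_persist: bool, needs_fresh_responses: bool) -> dict:
    """Map two deployment constraints to cache flags (hypothetical helper)."""
    return {
        # Disk cache needs writable storage and tolerance for reused responses.
        "enable_disk_cache": can_persist and not needs_fresh_responses,
        # Memory cache needs no storage, but still reuses responses.
        "enable_memory_cache": not needs_fresh_responses,
    }
```

Assuming `dspy` is installed, the result can be passed as `dspy.configure_cache(**cache_settings(can_persist=False, needs_fresh_responses=False))`, which reproduces the configuration above.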
## Save and Load
Prefer state-only JSON for readable, safer artifacts:
```python
compiled.save("./artifacts/program.json", save_program=False)
loaded = MyProgram()
loaded.load("./artifacts/program.json")
```
Use whole-program save only for trusted artifacts. It uses cloudpickle:
```python
compiled.save("./artifacts/program/", save_program=True)
loaded = dspy.load("./artifacts/program/")
```
Keep the DSPy major version compatible when loading saved programs.
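A load-time guard can make the version rule enforceable. The sketch below assumes the saved JSON state records the DSPy version under `metadata.dependency_versions.dspy`; treat that layout as an assumption and check it against your own artifacts. `check_saved_major` is a hypothetical helper.

```python
def major(version: str) -> int:
    """Return the leading integer of a dotted version string."""
    return int(version.split(".")[0])


def check_saved_major(state: dict, running_version: str) -> bool:
    """True when the saved and running versions share a major version.

    Assumes the version is stored at metadata.dependency_versions.dspy.
    A missing entry is treated as incompatible.
    """
    saved = state.get("metadata", {}).get("dependency_versions", {}).get("dspy")
    if saved is None:
        return False
    return major(saved) == major(running_version)
```

In a deployment, `state` would come from `json.load` on the saved file and `running_version` from `dspy.__version__`; refusing to load on a mismatch is safer than loading and failing later.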
## Usage Tracking
```python
dspy.configure(
    lm=dspy.LM("openai/gpt-4o-mini"),
    track_usage=True,
)
prediction = program(question="What is DSPy?")
print(prediction.get_lm_usage())
```
Cached calls return no new token usage.
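For reporting, per-call usage can be summed across predictions. This sketch assumes each usage value is a dict keyed by model name whose values map counter names to integers, possibly alongside nested detail fields; that shape is an assumption, and `merge_usage` is a hypothetical helper.

```python
from collections import defaultdict


def merge_usage(usages):
    """Sum integer token counters per model across several usage dicts."""
    totals = defaultdict(lambda: defaultdict(int))
    for usage in usages:
        # An empty or missing usage (for example from a cached call) adds nothing.
        for model, counts in (usage or {}).items():
            for key, value in counts.items():
                # Skip nested detail dicts, None values, and booleans.
                if isinstance(value, int) and not isinstance(value, bool):
                    totals[model][key] += value
    return {model: dict(counts) for model, counts in totals.items()}
```

Feeding it a list such as `[p.get_lm_usage() for p in predictions]` yields one total per model, with cached calls contributing zero.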
## Async Execution
Most built-in modules support `acall()`:
```python
import asyncio
async def main():
    # Await the module's async entry point instead of calling it synchronously.
    prediction = await program.acall(question="What is DSPy?")
    print(prediction.answer)

asyncio.run(main())
```
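Implement `aforward()` for custom async modules. Reserve `dspy.asyncify(program)` for wrapping a synchronous program that has no native async path; it runs the synchronous call in a worker thread rather than making the program itself asynchronous.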
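Async entry points are usually called concurrently, and an unbounded fan-out can exceed provider rate limits. The sketch below caps the number of in-flight calls with a semaphore; `run_bounded` is a hypothetical helper, and `fake_acall` is a stand-in for `program.acall` so the example runs without a model.

```python
import asyncio


async def run_bounded(call, inputs, limit=4):
    """Await call(**kwargs) for each input with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def one(kwargs):
        async with semaphore:
            return await call(**kwargs)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(one(kwargs) for kwargs in inputs))


async def fake_acall(question):
    """Stand-in for program.acall; yields control instead of calling a model."""
    await asyncio.sleep(0)
    return f"answer to {question}"
```

With a real program, `asyncio.run(run_bounded(program.acall, [{"question": q} for q in questions], limit=8))` would be the equivalent call; the right limit depends on the provider's quotas.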
## Streaming
```python
import asyncio
import dspy
# Wrap the module so it yields chunks for the listened field as they arrive.
stream_program = dspy.streamify(
    dspy.Predict("question -> answer"),
    stream_listeners=[
        dspy.streaming.StreamListener(signature_field_name="answer"),
    ],
)

async def main():
    async for chunk in stream_program(question="Explain DSPy briefly."):
        print(chunk)

asyncio.run(main())
```
For looped modules such as ReAct, set `allow_reuse=True` on listeners for repeated fields. Cache hits yield the final `Prediction` without replaying token chunks.
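Because a cache hit may deliver only the final result, a consumer should not assume token chunks always arrive. The sketch below shows that handling with stand-in types: `TokenChunk` and `FinalResult` are hypothetical substitutes for the streamed chunk and final prediction objects, and `fake_stream` simulates both the live and cached cases.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class TokenChunk:
    """Stand-in for one streamed piece of a field."""
    chunk: str


@dataclass
class FinalResult:
    """Stand-in for the final prediction."""
    answer: str


async def consume(stream):
    """Collect streamed text and the final result from an async stream."""
    parts, final = [], None
    async for item in stream:
        if isinstance(item, TokenChunk):
            parts.append(item.chunk)
        elif isinstance(item, FinalResult):
            final = item
    return "".join(parts), final


async def fake_stream(cached: bool):
    """Simulated stream: a cached run skips straight to the final result."""
    if not cached:
        for piece in ("DSPy ", "is ", "a framework."):
            yield TokenChunk(piece)
    yield FinalResult("DSPy is a framework.")
```

A caller that renders incrementally can fall back to `final.answer` whenever the collected text is empty, which covers the cached path.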
## Production Checklist
1. Pin DSPy to a tested stable release series in the dependency manifest.
2. Use state-only JSON unless whole-program pickle is necessary and trusted.
3. Enable `restrict_pickle=True`.
4. Record usage, latency, errors, and traces.
5. Load-test async and streaming paths separately.
6. Use [dspy-debugging-observability](../dspy-debugging-observability/SKILL.md) for MLflow and callbacks.
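For checklist item 4, latency and errors can be captured with a small wrapper even before a tracing backend is in place. This is a minimal standard-library sketch; `record_call` and the entry fields are hypothetical, and a real deployment would forward entries to its metrics or tracing system.

```python
import time
from contextlib import contextmanager


@contextmanager
def record_call(log: list, name: str):
    """Append one entry per call with its latency and error type, if any."""
    start = time.perf_counter()
    entry = {"name": name, "ok": True, "error": None}
    try:
        yield entry
    except Exception as exc:
        entry["ok"] = False
        entry["error"] = type(exc).__name__
        raise  # Re-raise so callers still see the failure.
    finally:
        entry["latency_s"] = time.perf_counter() - start
        log.append(entry)
```

Wrapping a call as `with record_call(log, "qa"): prediction = program(question=...)` records the outcome without changing the program's behavior.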
## Official Documentation
- **Production guide**: https://dspy.ai/production/
- **Cache tutorial**: https://dspy.ai/tutorials/cache/
- **Saving tutorial**: https://dspy.ai/tutorials/saving/
- **Async tutorial**: https://dspy.ai/tutorials/async/
- **Streaming tutorial**: https://dspy.ai/tutorials/streaming/