dspy-bootstrap-fewshot
This skill should be used when the user asks to "bootstrap few-shot examples", "generate demonstrations", "use BootstrapFewShot", "optimize with limited data", "create training demos automatically", mentions "teacher model for few-shot", "10-50 training examples", or wants automatic demonstration generation for a DSPy program without extensive compute.
git clone --depth 1 https://github.com/OmidZamani/dspy-skills /tmp/dspy-bootstrap-fewshot && cp -r /tmp/dspy-bootstrap-fewshot/skills/dspy-bootstrap-fewshot ~/.claude/skills/dspy-bootstrap-fewshotSKILL.md
# DSPy Bootstrap Few-Shot Optimizer
## Goal
Automatically generate and select optimal few-shot demonstrations for your DSPy program using a teacher model.
## When to Use
- You have **10-50 labeled examples**
- Manual example selection is tedious or suboptimal
- You want demonstrations with reasoning traces
- Quick optimization without extensive compute
## Related Skills
- For more data (200+ examples): [dspy-miprov2-optimizer](../dspy-miprov2-optimizer/SKILL.md)
- For agentic systems: [dspy-gepa-reflective](../dspy-gepa-reflective/SKILL.md)
- Measure improvements: [dspy-evaluation-suite](../dspy-evaluation-suite/SKILL.md)
## Inputs
| Input | Type | Description |
|-------|------|-------------|
| `program` | `dspy.Module` | Your DSPy program to optimize |
| `trainset` | `list[dspy.Example]` | Training examples |
| `metric` | `callable` | Evaluation function |
| `metric_threshold` | `float` | Numerical threshold for accepting demos (optional) |
| `max_bootstrapped_demos` | `int` | Max teacher-generated demos (default: 4) |
| `max_labeled_demos` | `int` | Max direct labeled demos (default: 16) |
| `max_rounds` | `int` | Max bootstrapping attempts per example (default: 1) |
| `teacher_settings` | `dict` | Configuration for teacher model (optional) |
## Outputs
| Output | Type | Description |
|--------|------|-------------|
| `compiled_program` | `dspy.Module` | Optimized program with demos |
## Workflow
### Phase 1: Setup
```python
import dspy
from dspy.teleprompt import BootstrapFewShot
# Configure LMs
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))
```
### Phase 2: Define Program and Metric
```python
class QA(dspy.Module):
def __init__(self):
self.generate = dspy.ChainOfThought("question -> answer")
def forward(self, question):
return self.generate(question=question)
def validate_answer(example, pred, trace=None):
return example.answer.lower() in pred.answer.lower()
```
### Phase 3: Compile
```python
optimizer = BootstrapFewShot(
metric=validate_answer,
max_bootstrapped_demos=4,
max_labeled_demos=4,
teacher_settings={'lm': dspy.LM("openai/gpt-4o")}
)
compiled_qa = optimizer.compile(QA(), trainset=trainset)
```
### Phase 4: Use and Save
```python
# Use optimized program
result = compiled_qa(question="What is photosynthesis?")
# Save for production (state-only, recommended)
compiled_qa.save("qa_optimized.json", save_program=False)
```
## Production Example
```python
import dspy
from dspy.teleprompt import BootstrapFewShot
from dspy.evaluate import Evaluate
import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
class ProductionQA(dspy.Module):
def __init__(self):
self.cot = dspy.ChainOfThought("question -> answer")
def forward(self, question: str):
try:
return self.cot(question=question)
except Exception as e:
logger.error(f"Generation failed: {e}")
return dspy.Prediction(answer="Unable to answer")
def robust_metric(example, pred, trace=None):
if not pred.answer or pred.answer == "Unable to answer":
return 0.0
return float(example.answer.lower() in pred.answer.lower())
def optimize_with_bootstrap(trainset, devset):
"""Full optimization pipeline with validation."""
# Baseline
baseline = ProductionQA()
evaluator = Evaluate(devset=devset, metric=robust_metric, num_threads=4)
baseline_score = evaluator(baseline)
logger.info(f"Baseline: {baseline_score:.2%}")
# Optimize
optimizer = BootstrapFewShot(
metric=robust_metric,
max_bootstrapped_demos=4,
max_labeled_demos=4
)
compiled = optimizer.compile(baseline, trainset=trainset)
optimized_score = evaluator(compiled)
logger.info(f"Optimized: {optimized_score:.2%}")
if optimized_score > baseline_score:
compiled.save("production_qa.json", save_program=False)
return compiled
logger.warning("Optimization didn't improve; keeping baseline")
return baseline
```
## Best Practices
1. **Quality over quantity** - 10 excellent examples beat 100 noisy ones
2. **Use stronger teacher** - GPT-4 as teacher for GPT-3.5 student
3. **Validate with held-out set** - Always test on unseen data
4. **Start with 4 demos** - More isn't always better
## Limitations
- Requires labeled training data
- Teacher model costs can add up
- May not generalize to very different inputs
- Limited exploration compared to MIPROv2
## Official Documentation
- **DSPy Documentation**: https://dspy.ai/
- **DSPy GitHub**: https://github.com/stanfordnlp/dspy
- **BootstrapFewShot API**: https://dspy.ai/api/optimizers/BootstrapFewShot/
- **Optimization Guide**: https://dspy.ai/learn/optimization/optimizers/Use this skill when you need to QA audit and fix a plugin skill file. Provides a methodology for verifying skill content against official documentation, fixing issues in-place, and producing verification reports.
This skill should be used when the user asks to "choose a DSPy adapter", "use JSONAdapter", "use XMLAdapter", "enable native function calling", "send images, audio, or files to DSPy", mentions `dspy.ChatAdapter`, `dspy.JSONAdapter`, `dspy.XMLAdapter`, `dspy.Image`, `dspy.Audio`, `dspy.File`, structured outputs, or multimodal DSPy signatures.
This skill should be used when the user asks to "compose DSPy modules", "use Ensemble optimizer", "combine multiple programs", "use dspy.MultiChainComparison", mentions "ensemble voting", "module composition", "sequential pipelines", or needs to build complex multi-module DSPy programs with ensemble patterns or multi-chain comparison.
This skill should be used when the user asks to "use BetterTogether", "combine prompt optimization and fine-tuning", "sequence DSPy optimizers", "run prompt then weight optimization", mentions `dspy.BetterTogether`, strategy strings such as "p -> w -> p", or needs to compose multiple DSPy teleprompters into an evaluated optimization sequence.
This skill should be used when the user asks to "create custom DSPy module", "design a DSPy module", "extend dspy.Module", "build reusable DSPy component", mentions "custom module patterns", "module serialization", "stateful modules", "module testing", or needs to design production-quality custom DSPy modules with proper architecture, state management, and testing.
This skill should be used when the user asks to "debug DSPy programs", "trace LLM calls", "monitor production DSPy", "use MLflow with DSPy", mentions "inspect_history", "custom callbacks", "observability", "production monitoring", "cost tracking", or needs to debug, trace, and monitor DSPy applications in development and production.
This skill should be used when the user asks to "build local DSPy retrieval", "use dspy.Embedder", "use dspy.Embeddings", "save an embeddings index", "add FAISS retrieval", mentions semantic search, hosted embeddings, local embedding models, `EmbeddingsWithScores`, or needs a DSPy retriever over an application-owned text corpus.
This skill should be used when the user asks to "evaluate a DSPy program", "test my DSPy module", "measure performance", "create evaluation metrics", "use answer_exact_match or SemanticF1", mentions "Evaluate class", "comparing programs", "establishing baselines", or needs to systematically test and measure DSPy program quality with custom or built-in metrics.