Skill · 118 repo stars · updated 1 month ago

dspy-simba-optimizer

dspy-simba-optimizer is a Claude Code skill for optimizing DSPy programs using stochastic mini-batch sampling and self-reflective rule generation. Use it when you need a budget-conscious alternative to heavier optimizers like MIPROv2 or GEPA, have a numeric quality metric, and want to improve program performance through introspective demonstrations and evolved rules with fewer evaluation calls.

Repository: dspy-skills

Install in Claude Code

git clone --depth 1 https://github.com/OmidZamani/dspy-skills /tmp/dspy-simba-optimizer && cp -r /tmp/dspy-simba-optimizer/skills/dspy-simba-optimizer ~/.claude/skills/dspy-simba-optimizer

Then open a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# DSPy SIMBA Optimizer

## Goal

Optimize DSPy programs using stochastic mini-batch sampling, output variability, self-reflective rules, and successful demonstrations.

## When to Use

- Need a lighter-weight alternative to GEPA
- Have a numeric metric that captures task quality
- Want introspective rules and demonstrations
- Budget-conscious optimization (fewer eval calls)
- Programs where few-shot examples aren't critical

## Related Skills

- Alternative optimizers: [dspy-miprov2-optimizer](../dspy-miprov2-optimizer/SKILL.md), [dspy-gepa-reflective](../dspy-gepa-reflective/SKILL.md)
- Agent optimization: [dspy-react-agent-builder](../dspy-react-agent-builder/SKILL.md)
- Evaluation: [dspy-evaluation-suite](../dspy-evaluation-suite/SKILL.md)

## Inputs

| Input | Type | Description |
|-------|------|-------------|
| `program` | `dspy.Module` | Program to optimize |
| `trainset` | `list[dspy.Example]` | Training examples |
| `metric` | `callable` | Returns a numeric score |
| `max_steps` | `int` | Number of optimization steps |
| `bsize` | `int` | Mini-batch size |
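
A quick pre-flight check on these inputs can catch problems before any LM calls are spent. The sketch below is illustrative only: `check_simba_inputs` is a hypothetical helper, not part of DSPy. It assumes the training set should hold at least `bsize` examples so that a full mini-batch can be drawn, and it probes the metric by passing the first example as its own prediction, which only works when the metric reads the same field names from both arguments.

```python
def check_simba_inputs(trainset, metric, bsize):
    """Hypothetical pre-flight check; not part of DSPy."""
    if len(trainset) < bsize:
        raise ValueError(
            f"trainset has {len(trainset)} examples, need at least bsize={bsize}"
        )
    # Probe the metric once, using the first example as its own prediction.
    score = metric(trainset[0], trainset[0])
    if not isinstance(score, (int, float)):
        raise TypeError(f"metric returned {type(score).__name__}, expected a number")
    return True
```

Calling it right before `optimizer.compile(...)` keeps a failure cheap: it raises before any optimization step runs.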

## Outputs

| Output | Type | Description |
|--------|------|-------------|
| `optimized_program` | `dspy.Module` | SIMBA-optimized program |

## Workflow

### Phase 1: Understand SIMBA

**SIMBA** (Stochastic Introspective Mini-Batch Ascent):
- Iterative prompt optimization with mini-batch sampling
- Identifies challenging examples with high output variability
- Generates self-reflective rules or adds successful demonstrations
- Uses the configured LM or `prompt_model` for introspection
- More exploratory than basic bootstrap optimization
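
The idea of "challenging examples with high output variability" can be made concrete with a small sketch. This is plain Python for illustration only, not DSPy's internal implementation: `sample_minibatch` and `rank_by_variability` are hypothetical names, and the score table is made-up data standing in for metric scores from several sampled runs per example.

```python
import random


def sample_minibatch(trainset, bsize, rng):
    """Draw one mini-batch without replacement."""
    return rng.sample(trainset, bsize)


def rank_by_variability(scores_per_example):
    """Sort example ids by the gap between their best and worst score.

    A large gap means the program sometimes succeeds and sometimes fails
    on that example, which makes it a useful target for introspection.
    """
    gaps = {ex_id: max(s) - min(s) for ex_id, s in scores_per_example.items()}
    return sorted(gaps, key=gaps.get, reverse=True)


rng = random.Random(0)
batch = sample_minibatch(["ex1", "ex2", "ex3", "ex4", "ex5"], bsize=3, rng=rng)

# Made-up metric scores from four sampled runs per example.
scores = {
    "ex1": [1.0, 1.0, 1.0, 1.0],  # always solved: little to learn
    "ex2": [0.0, 1.0, 0.0, 0.7],  # unstable: high variability
    "ex3": [0.0, 0.0, 0.3, 0.0],  # mostly failing, small gap
}
ranked = rank_by_variability(scores)
```

Here `ex2` ranks first because its scores swing between 0.0 and 1.0, while `ex1` ranks last because every run succeeds.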

**Comparison:**
- **MIPROv2**: Best accuracy, lots of data
- **GEPA**: Agentic systems, expensive
- **SIMBA**: Mini-batch introspection, budget-friendly
- **Bootstrap**: Simplest, demo-based

### Phase 2: Basic SIMBA Optimization

```python
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# Program to optimize
class QAPipeline(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.generate(question=question)

# Metric returns a numeric score
def qa_metric(example, pred, trace=None):
    correct = example.answer.lower() in pred.answer.lower()
    return 1.0 if correct else 0.0

# SIMBA optimizer
optimizer = dspy.SIMBA(
    metric=qa_metric,
    max_steps=10,  # Optimization iterations
    bsize=5  # Mini-batch size
)

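# trainset: a list of dspy.Example objects, each marked with .with_inputs("question");
# it should contain at least `bsize` examples so a full mini-batch can be drawn
program = QAPipeline()
compiled = optimizer.compile(program, trainset=trainset)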
compiled.save("qa_simba.json")
```
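
Substring metrics like `qa_metric` are easy to sanity-check offline before spending LM calls. The sketch below uses `SimpleNamespace` objects as stand-ins for `dspy.Example` and `dspy.Prediction` (an assumption made so it runs without an LM), and shows one pitfall: short gold answers can match by accident. `word_match_metric` is a hypothetical stricter variant, not something from the original skill.

```python
import re
from types import SimpleNamespace


def qa_metric(example, pred, trace=None):
    correct = example.answer.lower() in pred.answer.lower()
    return 1.0 if correct else 0.0


def word_match_metric(example, pred, trace=None):
    """Hypothetical variant: the gold answer must appear as whole words."""
    pattern = r"\b" + re.escape(example.answer.lower()) + r"\b"
    return 1.0 if re.search(pattern, pred.answer.lower()) else 0.0


gold = SimpleNamespace(answer="no")
pred = SimpleNamespace(answer="I do not know")

loose = qa_metric(gold, pred)           # "no" is a substring of "not"
strict = word_match_metric(gold, pred)  # "no" is not a whole word here
```

If the gold answers in your data are short (yes/no, single digits), the whole-word variant may give the optimizer a less noisy signal.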

### Phase 3: SIMBA with a Nuanced Numeric Metric

Use a graded numeric metric when exact match is too coarse:

```python
import dspy

def detailed_metric(example, pred, trace=None):
    """Return a graded numeric score."""
    expected = example.answer.lower()
    actual = pred.answer.lower()

    if expected == actual:
        return 1.0
    elif expected in actual:
        return 0.7
    else:
        overlap = len(set(expected.split()) & set(actual.split()))
        if overlap > 0:
            return 0.3
        return 0.0

optimizer = dspy.SIMBA(
    metric=detailed_metric,
    max_steps=20,  # Optimization iterations
    bsize=8  # Mini-batch size
)

# Reuses `program` and `trainset` from Phase 2
compiled = optimizer.compile(program, trainset=trainset)
```
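
To see how the tiers behave, the metric can be run against a few hand-written cases. `SimpleNamespace` stands in for `dspy.Example` and `dspy.Prediction` here (an assumption for offline testing), and the sample answers are invented for illustration.

```python
from types import SimpleNamespace


def detailed_metric(example, pred, trace=None):
    """Return a graded numeric score."""
    expected = example.answer.lower()
    actual = pred.answer.lower()
    if expected == actual:
        return 1.0
    if expected in actual:
        return 0.7
    overlap = len(set(expected.split()) & set(actual.split()))
    return 0.3 if overlap > 0 else 0.0


# (gold answer, predicted answer) pairs, one per tier; invented examples.
cases = [
    ("paris", "Paris"),                      # exact match after lowercasing
    ("paris", "The capital is Paris"),       # gold contained in prediction
    ("eiffel tower", "the tower in paris"),  # only one word overlaps
    ("paris", "london"),                     # no overlap
]
tier_scores = [
    detailed_metric(SimpleNamespace(answer=gold), SimpleNamespace(answer=guess))
    for gold, guess in cases
]
```

The four cases land on the four tiers in order (exact, contained, word overlap, no overlap), which is a quick way to confirm that the thresholds match what you intend before optimizing against them.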

### Phase 4: Production Agent Optimization

```python
import dspy
from dspy.evaluate import Evaluate
import logging

logger = logging.getLogger(__name__)

# Define tools as functions
def search(query: str) -> str:
    """Search knowledge base for relevant information."""
    retriever = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')
    results = retriever(query, k=3)
    return "\n".join([r['text'] for r in results])

def calculate(expr: str) -> str:
    """Evaluate Python expressions safely."""
    try:
        with dspy.PythonInterpreter() as interp:
            return str(interp.execute(expr))
    except Exception as e:
        return f"Error: {e}"

class ResearchAgent(dspy.Module):
    def __init__(self):
        super().__init__()
        self.agent = dspy.ReAct(
            "question -> answer",
            tools=[search, calculate]
        )

    def forward(self, question):
        return self.agent(question=question)

def agent_metric(example, pred, trace=None):
    """Numeric metric for agent optimization."""
    expected = example.answer.lower().strip()
    actual = pred.answer.lower().strip() if pred.answer else ""

    # Exact match
    if expected == actual:
        return 1.0

    # Partial match
    if expected in actual:
        return 0.7

    # Check key terms
    expected_terms = set(expected.split())
    actual_terms = set(actual.split())
    overlap = len(expected_terms & actual_terms)

    if overlap >= len(expected_terms) * 0.5:
        return 0.5

    return 0.0

def optimize_agent(trainset, devset):
    """Full SIMBA optimization pipeline."""
    dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

    agent = ResearchAgent()

    # Baseline evaluation
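    evaluator = dspy.Evaluate(devset=devset, metric=agent_metric, num_threads=4)
    baseline = evaluator(agent)
    # Evaluate reports a 0-100 score; newer DSPy versions wrap it in a result object with `.score`
    baseline_score = getattr(baseline, "score", baseline)
    logger.info(f"Baseline: {baseline_score:.2f}%")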

    # SIMBA optimization
    optimizer = dspy.SIMBA(
        metric=agent_metric,
        max_steps=25,  # Optimization iterations
        bsize=6  # Mini-batch size
    )

    compiled = optimizer.compile(agent, trainset=trainset)

    # Evaluate optimized
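    optimized = evaluator(compiled)
    # Same handling as the baseline: a 0-100 score, possibly wrapped in a result object
    optimized_score = getattr(optimized, "score", optimized)
    logger.info(f"SIMBA optimized: {optimized_score:.2f}%")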

    compiled.save("research_agent_simba.json")
    return compiled
```
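
One detail worth checking is what the evaluator returns. Depending on the installed DSPy version, it may be a plain number or a result object carrying the number in a `score` attribute, and the value is typically on a 0 to 100 scale rather than 0 to 1. The helper below is a hypothetical sketch for logging either shape; `score_of` and `format_score` are not DSPy functions, and the return shapes are assumptions to verify against your version.

```python
def score_of(result):
    """Return a float score from a bare number or an object with `.score`."""
    return float(getattr(result, "score", result))


def format_score(result):
    """Format a score for logging; assumes it is already on a 0-100 scale."""
    return f"{score_of(result):.2f}%"
```

With this in place, a log call such as `logger.info("Baseline: %s", format_score(baseline))` avoids applying a second percent conversion to a value that is already a percentage.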

## Configuration

```python
optimizer = dspy.SIMBA(
    metric=metric_fn,
    max_steps=20,                          # Optimization iterations
    bsize=32,                              # Mini-batch size (default: 32)
    num_candidates=6,                      # Candidates per iteration (default: 6)
    max_demos=4,                           # Max demos per predictor (default: 4)
    # temperature_for_sampling=  (value and any remaining arguments are truncated in the source)
)
```

From the same repository

skill-perfection (Skill)

Use this skill when you need to QA audit and fix a plugin skill file. Provides a methodology for verifying skill content against official documentation, fixing issues in-place, and producing verification reports.

dspy-adapters-multimodal (Skill)

Use for DSPy adapter selection, JSONAdapter, XMLAdapter, ChatAdapter, native function calling, structured outputs, and multimodal inputs like dspy.Image or dspy.Audio.

dspy-advanced-module-composition (Skill)

Use for composing DSPy modules with Ensemble, MultiChainComparison, ensemble voting, sequential pipelines, and multi-program workflows.

dspy-better-together (Skill)

Use for BetterTogether, prompt plus weight optimization, fine-tuning sequences, and strategy chains like p -> w -> p.

dspy-bootstrap-fewshot (Skill)

Use for BootstrapFewShot, bootstrapped demonstrations, teacher-model demos, and low-data DSPy prompt optimization.

dspy-custom-module-design (Skill)

Use for creating custom DSPy modules, extending dspy.Module, reusable components, stateful modules, serialization, and module testing.

dspy-debugging-observability (Skill)

Use for debugging DSPy programs, inspect_history, tracing LLM calls, custom callbacks, observability, monitoring, and cost tracking.

dspy-embedding-retrieval (Skill)

Use for DSPy retrieval with dspy.Embedder, dspy.Embeddings, FAISS indexes, semantic search, and local or hosted embedding models.