Skill · 78 repo stars · updated 11d ago

dspy-optimize-anything

Universal text artifact optimizer using GEPA's optimize_anything API for code, prompts, agent architectures, configs, and more

Install in Claude Code
git clone --depth 1 https://github.com/OmidZamani/dspy-skills /tmp/dspy-optimize-anything && cp -r /tmp/dspy-optimize-anything/skills/dspy-optimize-anything ~/.claude/skills/dspy-optimize-anything
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# GEPA optimize_anything

## Goal

Optimize any artifact representable as text — code, prompts, agent architectures, vector graphics, configurations — using a single declarative API powered by GEPA's reflective evolutionary search.

## When to Use

- **Beyond prompt optimization** — optimizing code, configs, SVGs, scheduling policies, etc.
- **Single hard problems** — circle packing, kernel generation, algorithm discovery
- **Batch related problems** — CUDA kernels, code generation tasks with cross-transfer
- **Generalization** — agent skills, policies, or prompts that must transfer to unseen inputs
- Any case where you can **express quality as a score** and provide **diagnostic feedback** as Actionable Side Information (ASI)

## Inputs

| Input | Type | Description |
|-------|------|-------------|
| `seed_candidate` | `str \| dict[str, str] \| None` | Starting artifact text, or `None` for seedless mode |
| `evaluator` | `Callable` | Returns a score (higher is better), optionally paired with an ASI dict |
| `dataset` | `list \| None` | Training examples (for multi-task and generalization modes) |
| `valset` | `list \| None` | Validation set (for generalization mode) |
| `objective` | `str \| None` | Natural language description of what to optimize for |
| `background` | `str \| None` | Domain knowledge and constraints |
| `config` | `GEPAConfig \| None` | Engine, reflection, and tracking settings |

## Outputs

| Output | Type | Description |
|--------|------|-------------|
| `result.best_candidate` | `str \| dict` | Best optimized artifact |

## Workflow

### Phase 1: Install

```bash
pip install -U "gepa>=0.1.1,<0.2"
```

### Phase 2: Define Evaluator with ASI

The evaluator scores a candidate and returns Actionable Side Information (ASI) — diagnostic feedback that guides the LLM proposer during reflection.

**Simple evaluator (returns a score; diagnostics logged with `oa.log`):**

```python
import gepa.optimize_anything as oa
from gepa.optimize_anything import EngineConfig, GEPAConfig

config = GEPAConfig(engine=EngineConfig(max_metric_calls=100))

def evaluate(candidate: str) -> float:
    # run_my_system is a placeholder for your own scoring routine
    score, diagnostic = run_my_system(candidate)
    oa.log(f"Error: {diagnostic}")  # captured as ASI
    return score  # higher is better
```

**Rich evaluator (score + structured ASI):**

```python
def evaluate(candidate: str) -> tuple[float, dict]:
    # execute_code is a placeholder for your own runner
    result = execute_code(candidate)
    return result.score, {
        "Error": result.stderr,
        "Output": result.stdout,
        "Runtime": f"{result.time_ms:.1f}ms",
    }
```

ASI can include open-ended text, structured data, multi-objectives (via `scores`), or images (via `gepa.Image`) for vision-capable LLMs.
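To make the ASI options above concrete, here is a minimal sketch of assembling an ASI dict that mixes text diagnostics with per-objective values. The helper `build_asi`, the objective names, and the 2000-character truncation are hypothetical. The layout of the `scores` entry (a dict of objective name to a higher-is-better number) is an assumption and should be checked against the GEPA documentation.

```python
def build_asi(stdout: str, stderr: str, correctness: float, runtime_ms: float) -> dict:
    """Assemble a hypothetical ASI dict: text diagnostics plus per-objective scores."""
    # Map runtime to a "higher is better" value in (0, 1]
    speed = 1.0 / (1.0 + runtime_ms / 1000.0)
    return {
        "Error": stderr[-2000:],   # keep only the tail of long tracebacks
        "Output": stdout[-2000:],
        # Assumed layout: objective name -> numeric value, higher is better
        "scores": {"correctness": correctness, "speed": speed},
    }


asi = build_asi("ok", "", correctness=1.0, runtime_ms=1000.0)
print(asi["scores"])
```

Trimming long output keeps the feedback readable for the reflecting LLM; whether trimming helps in practice depends on where the useful part of the trace sits.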

### Phase 3: Choose Optimization Mode

**Mode 1 — Single-Task Search:** Solve one hard problem. No dataset needed.

```python
result = oa.optimize_anything(
    seed_candidate="<your initial artifact>",
    evaluator=evaluate,
    config=config,
)
```

**Mode 2 — Multi-Task Search:** Solve a batch of related problems with cross-transfer.

```python
result = oa.optimize_anything(
    seed_candidate="<your initial artifact>",
    evaluator=evaluate,
    dataset=tasks,
    config=config,
)
```
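When a `dataset` is passed, the Production Example's evaluator takes `(candidate, example)`, so each task is scored separately. The sketch below follows that shape with a toy text artifact: a regex pattern. The task fields (`text`, `expected`), the name `evaluate_regex`, and the ASI keys are hypothetical and only illustrate the idea.

```python
import re

# Hypothetical tasks: each example holds input text and the tokens the pattern should extract
tasks = [
    {"text": "Order 12 shipped, order 7 pending", "expected": ["12", "7"]},
    {"text": "No numbers here", "expected": []},
]


def evaluate_regex(candidate: str, example: dict) -> tuple[float, dict]:
    """Score a regex candidate on one example and return (score, ASI)."""
    try:
        found = re.findall(candidate, example["text"])
    except re.error as exc:
        # An invalid pattern scores zero, with the compile error as feedback
        return 0.0, {"Error": f"Invalid regex: {exc}"}
    expected = example["expected"]
    score = 1.0 if found == expected else 0.0
    return score, {"Input": example["text"], "Expected": expected, "Got": found}


print(evaluate_regex(r"\d+", tasks[0]))
```

Returning the input, the expected value, and the actual value for each example gives the proposer enough context to see why a candidate failed on that task.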

**Mode 3 — Generalization:** Build a skill/prompt/policy that transfers to unseen problems.

```python
result = oa.optimize_anything(
    seed_candidate="<your initial artifact>",
    evaluator=evaluate,
    dataset=train,
    valset=val,
    config=config,
)
```
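Generalization mode needs disjoint `train` and `val` lists. One simple way to produce them is a seeded shuffle and split, sketched below. The helper `split_examples`, the 20% validation fraction, and the example records are assumptions for illustration and are not part of GEPA.

```python
import random


def split_examples(examples: list, val_fraction: float = 0.2, seed: int = 0) -> tuple[list, list]:
    """Shuffle a copy of the examples and split it into (train, val)."""
    shuffled = list(examples)               # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)   # seeded for a reproducible split
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


train, val = split_examples([{"id": i} for i in range(10)])
print(len(train), len(val))
```

The sketch assumes at least two examples; with a single example the training list would be empty.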

**Seedless mode:** Describe what you need instead of providing a seed.

```python
result = oa.optimize_anything(
    evaluator=evaluate,
    objective="Generate a Python function `reverse()` that reverses a string.",
    config=config,
)
```
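For the `reverse()` objective above, the evaluator still has to be supplied. A minimal sketch follows, assuming the candidate is Python source that defines `reverse`. The test cases, the name `evaluate_reverse`, and the ASI key `Failures` are hypothetical.

```python
def evaluate_reverse(candidate: str) -> tuple[float, dict]:
    """Execute candidate source, then check its reverse() against a few cases."""
    cases = [("abc", "cba"), ("", ""), ("racecar", "racecar"), ("ab", "ba")]
    namespace: dict = {}
    try:
        exec(candidate, namespace)  # NOTE: run untrusted code only inside a sandbox
        fn = namespace["reverse"]
    except Exception as exc:
        # Syntax errors and a missing reverse() both land here
        return 0.0, {"Error": f"{type(exc).__name__}: {exc}"}

    failures = []
    for text, expected in cases:
        try:
            got = fn(text)
        except Exception as exc:
            got = f"raised {type(exc).__name__}"
        if got != expected:
            failures.append(f"reverse({text!r}) -> {got!r}, expected {expected!r}")

    score = 1.0 - len(failures) / len(cases)  # fraction of cases passed
    return score, {"Failures": failures}


print(evaluate_reverse("def reverse(s):\n    return s[::-1]"))
```

Listing each failing case gives the proposer something specific to fix, which a bare pass rate would not.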

### Phase 4: Use Results

```python
print(result.best_candidate)
```
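Since the Outputs table gives `result.best_candidate` as `str | dict`, code that saves the result has to handle both shapes. The sketch below writes either form to disk. The helper `save_candidate`, the `.txt` suffix, and the fallback file name `candidate` are hypothetical choices.

```python
import tempfile
from pathlib import Path


def save_candidate(best_candidate, out_dir: str) -> list[Path]:
    """Write a str candidate to one file, or a dict candidate to one file per key."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Normalize both shapes to {name: text}
    parts = {"candidate": best_candidate} if isinstance(best_candidate, str) else best_candidate
    paths = []
    for name, text in parts.items():
        path = out / f"{name}.txt"
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths


# Usage with a dict-shaped candidate, as in the SVG example
with tempfile.TemporaryDirectory() as tmp:
    written = save_candidate({"svg_code": "<svg></svg>"}, tmp)
    names = [p.name for p in written]
    contents = [p.read_text(encoding="utf-8") for p in written]
print(names)
```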

## Production Example

```python
import gepa.optimize_anything as oa
from gepa import Image
from gepa.optimize_anything import EngineConfig, GEPAConfig
import logging

logger = logging.getLogger(__name__)

# ---------- SVG optimization with VLM feedback ----------

GOAL = "a pelican riding a bicycle"
VLM = "vertex_ai/gemini-3-flash-preview"

VISUAL_ASPECTS = [
    {"id": "overall",     "criteria": f"Rate overall quality of this SVG ({GOAL}). SCORE: X/10"},
    {"id": "anatomy",     "criteria": "Rate pelican accuracy: beak, pouch, plumage. SCORE: X/10"},
    {"id": "bicycle",     "criteria": "Rate bicycle: wheels, frame, handlebars, pedals. SCORE: X/10"},
    {"id": "composition", "criteria": "Rate how convincingly the pelican rides the bicycle. SCORE: X/10"},
]

def evaluate(candidate, example):
    """Render SVG, score with a VLM, return (score, ASI)."""
    image = render_image(candidate["svg_code"])  # via cairosvg
    score, feedback = get_vlm_score_feedback(VLM, image, example["criteria"])

    return score, {
        "RenderedSVG": Image(base64_data=image, media_type="image/png"),
        "Feedback": feedback,
    }

result = oa.optimize_anything(
    seed_candidate={"svg_code": "<svg>...</svg>"},
    evaluator=evaluate,
    dataset=VISUAL_ASPECTS,
    background=f"Optimize SVG source code depicting '{GOAL}'. "
               "Improve anatomy, composition, and visual quality.",
    config=GEPAConfig(engine=EngineConfig(max_metric_calls=100)),
)

logger.info(f"Best SVG:\n{result.best_candidate['svg_code']}")


# ---------- Code optimization (single-task) ----------

def evaluate_solver(candidate: str) -> tuple[float, dict]:
    """Evaluate a Python solver for a mathematical optimization problem."""
    import subprocess, json

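    try:
        proc = subprocess.run(
            ["python", "-c", candidate],
            capture_output=True, text=True, timeout=30,
        )
    except subprocess.TimeoutExpired:
        # A hung candidate scores zero instead of crashing the optimization run
        return 0.0, {"Error": "Timed out after 30s"}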

    if proc.returncode != 0:
        oa.log(f"Runtime error: {proc.stderr}")
        return 0.0, {"Error": proc.stderr}

    try:
        output = json.loads(proc.stdout)
        return output["score"], {
            "Output": output.get("solution"),
            "Runtime": f"{output.get('time_ms', 0):.1f}ms",
        }
    except (json.JSONDecodeError, KeyError) as e:
        oa.log(f"Parse error: {e}")
        return 0.0, {"Error": str(e), "Stdo
```