dspy-optimize-anything
Universal text artifact optimizer using GEPA's optimize_anything API for code, prompts, agent architectures, configs, and more
git clone --depth 1 https://github.com/OmidZamani/dspy-skills /tmp/dspy-optimize-anything && cp -r /tmp/dspy-optimize-anything/skills/dspy-optimize-anything ~/.claude/skills/dspy-optimize-anything
# GEPA optimize_anything
## Goal
Optimize any artifact representable as text — code, prompts, agent architectures, vector graphics, configurations — using a single declarative API powered by GEPA's reflective evolutionary search.
## When to Use
- **Beyond prompt optimization** — optimizing code, configs, SVGs, scheduling policies, etc.
- **Single hard problems** — circle packing, kernel generation, algorithm discovery
- **Batch related problems** — CUDA kernels, code generation tasks with cross-transfer
- **Generalization** — agent skills, policies, or prompts that must transfer to unseen inputs
- When you can **express quality as a score** and provide **diagnostic feedback** (Actionable Side Information, ASI)
## Inputs
| Input | Type | Description |
|-------|------|-------------|
| `seed_candidate` | `str \| dict[str, str] \| None` | Starting artifact text, or `None` for seedless mode |
| `evaluator` | `Callable` | Takes the candidate (plus one dataset example when `dataset` is set) and returns a score (higher is better), optionally with an ASI dict |
| `dataset` | `list \| None` | Training examples (for multi-task and generalization modes) |
| `valset` | `list \| None` | Validation set (for generalization mode) |
| `objective` | `str \| None` | Natural language description of what to optimize for |
| `background` | `str \| None` | Domain knowledge and constraints |
| `config` | `GEPAConfig \| None` | Engine, reflection, and tracking settings |
## Outputs
| Output | Type | Description |
|--------|------|-------------|
| `result.best_candidate` | `str \| dict` | Best optimized artifact |
## Workflow
### Phase 1: Install
```bash
pip install -U "gepa>=0.1.1,<0.2"
```
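To confirm which version ended up in the environment, a small check with the standard library can help. This is only a sketch: `installed_version` is a hypothetical helper, and it reports the version string without enforcing the range pinned above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


gepa_version = installed_version("gepa")
print(f"gepa version: {gepa_version or 'not installed'}")
```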
### Phase 2: Define Evaluator with ASI
The evaluator scores a candidate and returns Actionable Side Information (ASI) — diagnostic feedback that guides the LLM proposer during reflection.
**Simple evaluator (score only):**
```python
import gepa.optimize_anything as oa
from gepa.optimize_anything import EngineConfig, GEPAConfig

config = GEPAConfig(engine=EngineConfig(max_metric_calls=100))

def evaluate(candidate: str) -> float:
    score, diagnostic = run_my_system(candidate)
    oa.log(f"Error: {diagnostic}")  # captured as ASI
    return score
```
**Rich evaluator (score + structured ASI):**
```python
def evaluate(candidate: str) -> tuple[float, dict]:
    result = execute_code(candidate)
    return result.score, {
        "Error": result.stderr,
        "Output": result.stdout,
        "Runtime": f"{result.time_ms:.1f}ms",
    }
```
ASI can include open-ended text, structured data, multi-objectives (via `scores`), or images (via `gepa.Image`) for vision-capable LLMs.
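As a rough illustration of structured ASI, the sketch below scores a candidate along two toy objectives and reports them under a `scores` key next to free-text feedback. The objective names, the keyword list, the weights, and the exact way GEPA consumes the `scores` entry are assumptions made for this example, not confirmed API details.

```python
def evaluate_summary(candidate: str) -> tuple[float, dict]:
    """Score a candidate summary on keyword coverage and brevity (toy metrics)."""
    required = ["GEPA", "evaluator", "score"]  # hypothetical keywords
    text = candidate.lower()
    missing = [kw for kw in required if kw.lower() not in text]
    coverage = 1.0 - len(missing) / len(required)

    words = len(candidate.split())
    brevity = 1.0 if words <= 30 else 30 / words  # penalize long summaries

    overall = 0.7 * coverage + 0.3 * brevity  # assumed weighting
    side_info = {
        "scores": {"coverage": coverage, "brevity": brevity},
        "Feedback": "Missing keywords: " + (", ".join(missing) or "none"),
        "WordCount": words,
    }
    return overall, side_info
```

The single float stays the optimization target; the per-objective numbers and the feedback string are there to tell the proposer which dimension is lagging.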
### Phase 3: Choose Optimization Mode
**Mode 1 — Single-Task Search:** Solve one hard problem. No dataset needed.
```python
result = oa.optimize_anything(
    seed_candidate="<your initial artifact>",
    evaluator=evaluate,
    config=config,
)
```
**Mode 2 — Multi-Task Search:** Solve a batch of related problems with cross-transfer.
```python
result = oa.optimize_anything(
    seed_candidate="<your initial artifact>",
    evaluator=evaluate,
    dataset=tasks,
    config=config,
)
```
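With a dataset, the evaluator in the Production Example takes a second `example` argument. A minimal sketch of that shape follows, using a regular expression as the text artifact. The `tasks` list, the `should_match` field, and `evaluate_regex` are hypothetical names chosen for illustration.

```python
import re

# Hypothetical task set: each example holds a string and whether it should match.
tasks = [
    {"text": "2024-01-31", "should_match": True},
    {"text": "1999-12-01", "should_match": True},
    {"text": "31/01/2024", "should_match": False},
    {"text": "2024-1-31", "should_match": False},
]


def evaluate_regex(candidate: str, example: dict) -> tuple[float, dict]:
    """Score a candidate regex (the text artifact) on a single task example."""
    try:
        pattern = re.compile(candidate)
    except re.error as exc:
        # An unparsable candidate scores zero and reports why.
        return 0.0, {"Error": f"Invalid regex: {exc}"}

    matched = pattern.fullmatch(example["text"]) is not None
    correct = matched == example["should_match"]
    return float(correct), {
        "Input": example["text"],
        "Expected": example["should_match"],
        "Got": matched,
    }
```

Passing `evaluator=evaluate_regex` and `dataset=tasks` would then mirror the Mode 2 call above, assuming the evaluator is invoked once per dataset example.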
**Mode 3 — Generalization:** Build a skill/prompt/policy that transfers to unseen problems.
```python
result = oa.optimize_anything(
    seed_candidate="<your initial artifact>",
    evaluator=evaluate,
    dataset=train,
    valset=val,
    config=config,
)
```
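Mode 3 needs separate `train` and `val` lists. One simple way to produce them is a seeded shuffle and split, sketched below. `split_examples` and its defaults are hypothetical, and a plain random split assumes the examples are independent of each other.

```python
import random


def split_examples(
    examples: list, val_fraction: float = 0.25, seed: int = 0
) -> tuple[list, list]:
    """Shuffle a copy of the examples and split it into (train, val)."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    shuffled = list(examples)  # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


examples = [{"id": i} for i in range(8)]
train, val = split_examples(examples)
```

Keeping the validation examples out of `dataset` is what lets the validation score say something about unseen inputs.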
**Seedless mode:** Describe what you need instead of providing a seed.
```python
result = oa.optimize_anything(
    evaluator=evaluate,
    objective="Generate a Python function `reverse()` that reverses a string.",
    config=config,
)
```
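For the `reverse()` objective above, the evaluator has to run whatever code the optimizer proposes. The following is one possible sketch, not the library's own harness: `evaluate_reverse` and its test cases are made up for illustration, and `exec` on generated code should be sandboxed in real use.

```python
def evaluate_reverse(candidate: str) -> tuple[float, dict]:
    """Execute candidate source, then test the `reverse` function it defines."""
    cases = [("abc", "cba"), ("", ""), ("racecar", "racecar"), ("ab cd", "dc ba")]
    namespace: dict = {}
    try:
        exec(candidate, namespace)  # NOTE: untrusted code; sandbox this in real use
        fn = namespace["reverse"]
    except Exception as exc:
        # Syntax errors and a missing `reverse` both land here.
        return 0.0, {"Error": f"{type(exc).__name__}: {exc}"}

    failures = []
    for arg, expected in cases:
        try:
            got = fn(arg)
        except Exception as exc:
            got = f"raised {type(exc).__name__}"
        if got != expected:
            failures.append(f"reverse({arg!r}) -> {got!r}, expected {expected!r}")

    score = 1.0 - len(failures) / len(cases)
    return score, {"Failures": failures or "none"}
```

Returning the failing cases verbatim gives the reflection step concrete inputs to reason about instead of a bare number.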
### Phase 4: Use Results
```python
print(result.best_candidate)
```
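Because `best_candidate` can be either a `str` or a `dict` (see the Outputs table), code that persists it needs to handle both. Here is a small sketch under assumptions: `save_best_candidate`, the `.txt` naming, and the `SimpleNamespace` stand-in for a real result are all illustrative, and only the `best_candidate` attribute comes from the documented output.

```python
import tempfile
from pathlib import Path
from types import SimpleNamespace


def save_best_candidate(result, out_dir: str) -> list[Path]:
    """Write result.best_candidate to disk: one file for a str, one per key for a dict."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    best = result.best_candidate
    parts = {"candidate": best} if isinstance(best, str) else best
    paths = []
    for name, text in parts.items():
        path = out / f"{name}.txt"
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths


# Stand-in object so the sketch runs without calling the optimizer.
with tempfile.TemporaryDirectory() as tmp:
    stand_in = SimpleNamespace(best_candidate={"svg_code": "<svg></svg>"})
    written = save_best_candidate(stand_in, tmp)
    saved_names = [p.name for p in written]
    saved_text = written[0].read_text(encoding="utf-8")
```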
## Production Example
```python
import logging

import gepa.optimize_anything as oa
from gepa import Image
from gepa.optimize_anything import EngineConfig, GEPAConfig

logger = logging.getLogger(__name__)

# ---------- SVG optimization with VLM feedback ----------
GOAL = "a pelican riding a bicycle"
VLM = "vertex_ai/gemini-3-flash-preview"
VISUAL_ASPECTS = [
    {"id": "overall", "criteria": f"Rate overall quality of this SVG ({GOAL}). SCORE: X/10"},
    {"id": "anatomy", "criteria": "Rate pelican accuracy: beak, pouch, plumage. SCORE: X/10"},
    {"id": "bicycle", "criteria": "Rate bicycle: wheels, frame, handlebars, pedals. SCORE: X/10"},
    {"id": "composition", "criteria": "Rate how convincingly the pelican rides the bicycle. SCORE: X/10"},
]

def evaluate(candidate, example):
    """Render SVG, score with a VLM, return (score, ASI)."""
    image = render_image(candidate["svg_code"])  # via cairosvg
    score, feedback = get_vlm_score_feedback(VLM, image, example["criteria"])
    return score, {
        "RenderedSVG": Image(base64_data=image, media_type="image/png"),
        "Feedback": feedback,
    }

result = oa.optimize_anything(
    seed_candidate={"svg_code": "<svg>...</svg>"},
    evaluator=evaluate,
    dataset=VISUAL_ASPECTS,
    background=(
        f"Optimize SVG source code depicting '{GOAL}'. "
        "Improve anatomy, composition, and visual quality."
    ),
    config=GEPAConfig(engine=EngineConfig(max_metric_calls=100)),
)

logger.info(f"Best SVG:\n{result.best_candidate['svg_code']}")

# ---------- Code optimization (single-task) ----------
def evaluate_solver(candidate: str) -> tuple[float, dict]:
    """Evaluate a Python solver for a mathematical optimization problem."""
    import json
    import subprocess

    proc = subprocess.run(
        ["python", "-c", candidate],
        capture_output=True, text=True, timeout=30,
    )
    if proc.returncode != 0:
        oa.log(f"Runtime error: {proc.stderr}")
        return 0.0, {"Error": proc.stderr}
    try:
        output = json.loads(proc.stdout)
        return output["score"], {
            "Output": output.get("solution"),
            "Runtime": f"{output.get('time_ms', 0):.1f}ms",
        }
    except (json.JSONDecodeError, KeyError) as e:
        oa.log(f"Parse error: {e}")
        return 0.0, {"Error": str(e), "Stdo