Skill · 118 repo stars · updated 1 month ago

dspy-embedding-retrieval

This skill builds semantic retrieval systems over custom text corpora using DSPy's Embedder and Embeddings classes, supporting both hosted embeddings like OpenAI's and local models via sentence-transformers. Use it when implementing retrieval-augmented generation pipelines, performing semantic search across application-owned documents, or integrating FAISS indexing for efficient retrieval at scale.

Repository: dspy-skills

Install in Claude Code

git clone --depth 1 https://github.com/OmidZamani/dspy-skills /tmp/dspy-embedding-retrieval && cp -r /tmp/dspy-embedding-retrieval/skills/dspy-embedding-retrieval ~/.claude/skills/dspy-embedding-retrieval

Then open a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# DSPy Embedding Retrieval

## Goal

Build semantic retrieval over an application-owned text corpus with `dspy.Embedder` and `dspy.Embeddings`.

## Basic Hosted Embedder

```python
import dspy

corpus = [
    "DSPy programs are composed from modules.",
    "MIPROv2 optimizes instructions and demonstrations.",
    "RLM explores large contexts with a sandboxed REPL.",
]

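# Hosted model name; the OpenAI API key is expected in the environment (OPENAI_API_KEY).
embedder = dspy.Embedder("openai/text-embedding-3-small")
# Embed the corpus once and return the 2 nearest passages per query.
search = dspy.Embeddings(corpus=corpus, embedder=embedder, k=2)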

result = search("Which optimizer tunes prompts?")
print(result.passages)
print(result.indices)
```

## Use in RAG

```python
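import dspy


class LocalRAG(dspy.Module):
    def __init__(self, retriever):
        super().__init__()
        # Any callable whose result exposes `.passages`, e.g. a dspy.Embeddings instance.
        self.retriever = retriever
        self.answer = dspy.ChainOfThought("context: list[str], question -> answer")

    def forward(self, question: str):
        # Retrieve the top-k passages, then answer using them as context.
        context = self.retriever(question).passages
        return self.answer(context=context, question=question)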
```

## Custom Local Embeddings

Wrap any callable that accepts `list[str]` and returns a 2D numeric array:

```python
from sentence_transformers import SentenceTransformer
import dspy

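# Requires: pip install sentence-transformers
model = SentenceTransformer("sentence-transformers/static-retrieval-mrl-en-v1")
# `model.encode` takes list[str] and returns a 2D array, so it can be wrapped directly.
embedder = dspy.Embedder(model.encode)
# `corpus` is the list[str] defined in the hosted example above.
search = dspy.Embeddings(corpus=corpus, embedder=embedder, k=5)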
```
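
The callable contract above can be illustrated without downloading a model. The following is a minimal sketch, assuming a toy hashed bag-of-words embedder; `hashed_bow_embed` and `DIM` are hypothetical names introduced only for this example, and the vectors are far too crude for real retrieval. It shows the shape requirement: `list[str]` in, one fixed-width row per text out.

```python
import hashlib

import numpy as np

DIM = 64  # hypothetical embedding width, chosen only for this sketch


def hashed_bow_embed(texts: list[str]) -> np.ndarray:
    """Toy embedder: count lowercase tokens hashed into DIM buckets."""
    vectors = np.zeros((len(texts), DIM), dtype=np.float32)
    for row, text in enumerate(texts):
        for token in text.lower().split():
            # A stable hash keeps the embedding deterministic across runs.
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            bucket = int.from_bytes(digest[:4], "little") % DIM
            vectors[row, bucket] += 1.0
    return vectors


batch = hashed_bow_embed([
    "DSPy programs are composed from modules.",
    "MIPROv2 optimizes instructions and demonstrations.",
])
print(batch.shape)  # (2, 64)

# Wiring it into DSPy would follow the same pattern as model.encode above:
# embedder = dspy.Embedder(hashed_bow_embed)
# search = dspy.Embeddings(corpus=corpus, embedder=embedder, k=2)
```

A deterministic stand-in like this can be handy in unit tests where calling a hosted model or loading sentence-transformers would be too slow.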

## Scores, FAISS, and Persistence

Use `dspy.EmbeddingsWithScores` when downstream logic needs similarity thresholds or reranking.
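
To make the thresholding idea concrete, here is a minimal sketch in plain NumPy. It does not use or mirror the `dspy.EmbeddingsWithScores` interface; `top_k_above_threshold` and `min_score` are hypothetical names, and the cutoff value is an arbitrary assumption. It only shows what downstream logic might do once cosine scores are available: rank, truncate to `k`, and drop weak matches.

```python
import numpy as np


def top_k_above_threshold(query_vec, corpus_vecs, k=5, min_score=0.3):
    """Return (index, cosine score) pairs for the k best rows scoring >= min_score."""
    q = np.asarray(query_vec, dtype=np.float32)
    m = np.asarray(corpus_vecs, dtype=np.float32)
    # Normalize so the dot product equals cosine similarity; guard against zero vectors.
    q = q / max(float(np.linalg.norm(q)), 1e-10)
    m = m / np.maximum(np.linalg.norm(m, axis=1, keepdims=True), 1e-10)
    scores = m @ q
    order = np.argsort(-scores)[:k]  # best scores first
    return [(int(i), float(scores[i])) for i in order if scores[i] >= min_score]


corpus_vecs = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
hits = top_k_above_threshold([1.0, 0.0], corpus_vecs, k=3, min_score=0.5)
print(hits)  # rows 0 and 1 pass the cutoff; row 2 (score 0.0) is dropped
```

A sensible `min_score` depends on the embedding model, so it is worth calibrating against labeled queries rather than reusing the value shown here.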

For corpora at or above the `brute_force_threshold` default of `20_000`, DSPy builds a FAISS index. Install FAISS first:

```bash
pip install faiss-cpu
```
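
A preflight check can catch a missing FAISS install before an expensive embedding run starts. The sketch below is an assumption-laden helper, not part of DSPy: `needs_faiss`, `faiss_available`, and `preflight` are hypothetical names, and the only fact taken from the text above is the `20_000` default with its at-or-above comparison.

```python
import importlib.util

DEFAULT_BRUTE_FORCE_THRESHOLD = 20_000  # default stated above


def needs_faiss(corpus_size: int, threshold: int = DEFAULT_BRUTE_FORCE_THRESHOLD) -> bool:
    """True when the corpus is at or above the threshold, i.e. a FAISS index would be built."""
    return corpus_size >= threshold


def faiss_available() -> bool:
    """True when the `faiss` module can be imported in this environment."""
    return importlib.util.find_spec("faiss") is not None


def preflight(corpus_size: int, threshold: int = DEFAULT_BRUTE_FORCE_THRESHOLD) -> None:
    """Fail fast, before any embedding cost is paid, if FAISS is required but missing."""
    if needs_faiss(corpus_size, threshold) and not faiss_available():
        raise RuntimeError(
            f"Corpus of {corpus_size} items needs FAISS; run `pip install faiss-cpu`."
        )


preflight(1_000)  # small corpus: passes regardless of whether FAISS is installed
```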

Persist the index when embedding the corpus is expensive:

```python
search.save("./retrieval-index")
loaded = dspy.Embeddings.from_saved("./retrieval-index", embedder=embedder)
```

## Related Skills

- Build a complete pipeline: [dspy-rag-pipeline](../dspy-rag-pipeline/SKILL.md)
- Design typed context fields: [dspy-signature-designer](../dspy-signature-designer/SKILL.md)
- Harden caches: [dspy-production-deployment](../dspy-production-deployment/SKILL.md)

## Best Practices

1. Evaluate retrieval quality separately from answer quality.
2. Keep corpus chunking deterministic and versioned.
3. Persist expensive indexes.
4. Use `EmbeddingsWithScores` when debugging relevance.
5. Measure memory and latency before enabling FAISS for large corpora.
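
Practices 1 and 2 can be sketched in a few lines of plain Python. This is an illustrative assumption, not DSPy functionality: `recall_at_k` and `corpus_fingerprint` are hypothetical helpers. The first scores retrieved indices (such as those in `result.indices`) against hand-labeled relevant indices, independent of any answer-quality metric. The second derives a version string from the ordered chunks so a saved index can be matched to the exact corpus it was built from.

```python
import hashlib


def recall_at_k(retrieved: list[int], relevant: list[int], k: int) -> float:
    """Fraction of relevant indices that appear among the first k retrieved indices."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    return len(set(retrieved[:k]) & relevant_set) / len(relevant_set)


def corpus_fingerprint(chunks: list[str]) -> str:
    """SHA-256 digest over the ordered chunks; changes if any chunk or boundary changes."""
    h = hashlib.sha256()
    for chunk in chunks:
        data = chunk.encode("utf-8")
        # The length prefix keeps ["ab", "c"] distinct from ["a", "bc"].
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


print(recall_at_k([2, 0, 1], relevant=[0, 1], k=2))  # 0.5
print(corpus_fingerprint(["chunk one", "chunk two"])[:12])
```

Storing the fingerprint next to a persisted index makes it straightforward to detect when the chunking changed and the index needs rebuilding.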

## Official Documentation

- **Embedder API**: https://dspy.ai/api/models/Embedder/
- **Embeddings API**: https://dspy.ai/api/tools/Embeddings/

From the same repository

skill-perfection

Use this skill when you need to QA audit and fix a plugin skill file. Provides a methodology for verifying skill content against official documentation, fixing issues in-place, and producing verification reports.

dspy-adapters-multimodal

Use for DSPy adapter selection, JSONAdapter, XMLAdapter, ChatAdapter, native function calling, structured outputs, and multimodal inputs like dspy.Image or dspy.Audio.

dspy-advanced-module-composition

Use for composing DSPy modules with Ensemble, MultiChainComparison, ensemble voting, sequential pipelines, and multi-program workflows.

dspy-better-together

Use for BetterTogether, prompt plus weight optimization, fine-tuning sequences, and strategy chains like p -> w -> p.

dspy-bootstrap-fewshot

Use for BootstrapFewShot, bootstrapped demonstrations, teacher-model demos, and low-data DSPy prompt optimization.

dspy-custom-module-design

Use for creating custom DSPy modules, extending dspy.Module, reusable components, stateful modules, serialization, and module testing.

dspy-debugging-observability

Use for debugging DSPy programs, inspect_history, tracing LLM calls, custom callbacks, observability, monitoring, and cost tracking.

dspy-evaluation-suite

Use for evaluating DSPy programs with Evaluate, answer_exact_match, SemanticF1, custom metrics, baselines, and program comparisons.