ClaudeWave
Skill · 67 repo stars · updated 17d ago

pinecone:full-text-search

Create, ingest into, and query a Pinecone full-text-search (FTS) index using the preview API (2026-01.alpha, public preview). Use when the user or agent asks to build a text search index on Pinecone, add dense or sparse vector fields, ingest documents, construct score_by clauses (text / query_string / dense_vector / sparse_vector), or compose with text-match filters ($match_phrase / $match_all / $match_any). Ships `scripts/ingest.py` for safe bulk ingestion (batch_upsert + error inspection + readiness polling); query construction is documented inline in this skill — write `documents.search(...)` calls directly, validated against `pc.preview.indexes.describe(...)` output.

Install in Claude Code
git clone --depth 1 https://github.com/pinecone-io/pinecone-claude-code-plugin /tmp/pinecone-full-text-search && cp -r /tmp/pinecone-full-text-search/skills/full-text-search ~/.claude/skills/pinecone-full-text-search
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# Pinecone Full-Text Search

> **Requires `pinecone` Python SDK ≥ 9.0** (`pip install "pinecone>=9.0"`; quote the requirement so the shell does not treat `>` as an output redirect). The FTS document-schema API lives under `pinecone.preview` and is incomplete or absent in earlier SDK builds. The packaged helper scripts pin `pinecone==9.0.0` via PEP 723 inline metadata; if you're writing your own code against this skill, pin v9 explicitly. The wire API version is `2026-01.alpha`.
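
If you write your own script against this skill, the pin can follow the same PEP 723 pattern. The sketch below is illustrative and is not the contents of `scripts/ingest.py`: the `requires-python` bound and the helper names `meets_minimum` / `installed_sdk_ok` are assumptions introduced here.

```python
# Assumption: the requires-python bound is illustrative, not taken from the skill.
# /// script
# requires-python = ">=3.10"
# dependencies = ["pinecone==9.0.0"]
# ///
from importlib import metadata

MIN_MAJOR = 9  # the skill requires pinecone SDK >= 9.0


def meets_minimum(version: str, min_major: int = MIN_MAJOR) -> bool:
    """Return True if the version string's major component is >= min_major."""
    major = version.split(".", 1)[0]
    return major.isdigit() and int(major) >= min_major


def installed_sdk_ok() -> bool:
    """Check the installed pinecone distribution; False if it is not installed."""
    try:
        return meets_minimum(metadata.version("pinecone"))
    except metadata.PackageNotFoundError:
        return False


if __name__ == "__main__":
    print("pinecone SDK >= 9 installed:", installed_sdk_ok())
```

A PEP 723-aware runner reads the comment block and installs the pinned dependency; plain `python` ignores it, which is why the runtime check is worth keeping.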

> **Authoritative reference (last resort).** If you hit a question this skill and its `references/*.md` files don't answer, the official Pinecone FTS docs are at <https://docs.pinecone.io/guides/search/full-text-search>. Prefer this skill's content for anything covered here — the docs may describe surfaces (e.g. classic vector API) that don't apply to the document-schema FTS path. Consult the link only when you're genuinely stuck.

> **Tell the user up front:** "This skill ships a helper at `scripts/ingest.py` that handles bulk ingestion safely (batched upsert, error inspection, readiness polling). When we get to the ingest step, I'll use it." Surface this at the start of the conversation so the user knows the helper exists. Query construction is hand-written `documents.search(...)` per the **Querying** section below — there is no query helper.

A workflow skill for building a Pinecone full-text-search index with the preview API (`pinecone.preview`, API version `2026-01.alpha`, public preview as of April 2026). Covers schema design (text, dense vector, sparse vector, filterable metadata), ingestion (including async indexing and polling), and query construction (`text` / `query_string` / `dense_vector` / `sparse_vector` scoring; `$match_phrase` / `$match_all` / `$match_any` text-match filters; `$eq` / `$in` / `$gte` / `$exists` / `$and` / `$or` / `$not` metadata filters).

## Scope — this skill is for the document-schema FTS API only

This skill covers `pc.preview.indexes.create(..., schema=...)`, `pc.preview.index(name)`, `idx.documents.upsert(...)` / `idx.documents.batch_upsert(...)` / `idx.documents.search(...)`. If you find yourself reaching for any of the following, **stop** — those are different Pinecone APIs and this skill's guidance and helpers won't apply:

- **Classic vector / records API**: `pc.Index(name)`, `index.upsert(vectors=[...])` / `index.upsert_records(...)`, `index.query(vector=..., sparse_vector=...)`, `index.search_records(...)`, `pc.create_index(...)` with `ServerlessSpec`, the legacy `pinecone_text.sparse.BM25Encoder` for sparse-dense hybrid. These are for indexes WITHOUT a schema (raw vectors).
- **Integrated-embedding indexes**: `pc.create_index_for_model(...)` with `embed={...}`. Pinecone vectorizes text server-side, and the upsert/search shapes differ. Integrated embedding cannot be combined with `full_text_search` fields in the same index.

If the user already has a non-document-schema index, they can stand up a separate document-schema index alongside it — the two are independent — but you can't add FTS fields to a classic index after the fact.

## Querying — construct `documents.search(...)` calls

For any task that asks you to query an FTS index, you write a `documents.search(...)` call directly. The schema is authoritative — describe the index live before constructing the call so you know which fields are FTS-enabled, which are filterable, and which are vectors.

**Workflow:**

1. **Discover the schema.** Call `pc.preview.indexes.describe(<index>)` and read the `schema.fields` dict. Each field's class indicates its type (`PreviewStringField`, `PreviewIntegerField`, `PreviewDenseVectorField`, etc.); attributes tell you whether it's FTS-enabled (`full_text_search`), filterable, or carries a `dimension`. Skip this step only if you've already seen the schema in this conversation.
2. **Construct the call** matching the rules below — one scoring type per request, hard requirements in `filter`, ranking signals in `score_by`, `include_fields` explicit on every call.
3. **Execute** with `idx = pc.preview.index(name=<index>); resp = idx.documents.search(...)` and read `resp.matches`.

**Canonical shapes:**

```python
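# Assumes `idx = pc.preview.index(name=<index>)` and, for the dense example,
# `query_embedding` as a list of floats matching the vector field's dimension.

# Pure BM25 keyword search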
resp = idx.documents.search(
    namespace="__default__",
    top_k=10,
    score_by=[{"type": "text", "field": "body", "query": "machine learning"}],
    filter={"year": {"$gt": 2024}, "category": {"$eq": "ai"}},  # optional
    include_fields=["*"],   # always pass explicitly
)

# Hybrid: dense ranking with a lexical filter (one type in score_by + filter narrows)
resp = idx.documents.search(
    namespace="__default__",
    top_k=10,
    score_by=[{"type": "dense_vector", "field": "embedding", "values": query_embedding}],
    filter={"body": {"$match_all": "TensorFlow"}, "year": {"$gt": 2024}},
    include_fields=["*"],
)
```

**Key rules** (the server enforces these; following them locally keeps the agent loop tight):

- `score_by` is a list of clauses, but **exactly one scoring type per request** (server rejects mixed types). Multi-field BM25 is the one exception: multiple `text` clauses, or one `query_string` with `fields: [...]`. To combine BM25 + dense signals, restrict the dense search with a text-match filter (`$match_all` / `$match_phrase` / `$match_any`); do NOT mix scoring types in `score_by`.
- `filter` keys are field names (must exist in schema and be filterable) OR logical operators (`$and`, `$or`, `$not`). Field values are operator dicts (`{"$gt": 5}`, NOT bare values).
- `include_fields` is required on every call. Pass `["*"]` for all stored fields, `[]` for ids+score only, or a list of names. Some SDK builds return HTTP 400/422 if it's omitted.
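
The rules above can be checked locally before a request is sent. The following is a rough sketch, not part of the SDK or of this skill's scripts: `preflight` and `_check_filter` are hypothetical helpers, and the assumed shapes for logical operators (`$and` / `$or` taking a list of sub-filters, `$not` taking one sub-filter) should be confirmed against the server's behaviour.

```python
SCORING_TYPES = {"text", "query_string", "dense_vector", "sparse_vector"}
LOGICAL_OPS = {"$and", "$or", "$not"}


def _check_filter(node) -> list:
    """Recursively check that field values are operator dicts such as {"$gt": 5}."""
    if not isinstance(node, dict):
        return [f"filter node must be a dict, got {node!r}"]
    problems = []
    for key, value in node.items():
        if key in LOGICAL_OPS:
            # Assumption: $and/$or hold a list of sub-filters, $not a single one.
            children = value if isinstance(value, list) else [value]
            for child in children:
                problems.extend(_check_filter(child))
        elif not (
            isinstance(value, dict)
            and value
            and all(isinstance(op, str) and op.startswith("$") for op in value)
        ):
            problems.append(f"filter field {key!r} needs an operator dict, got {value!r}")
    return problems


def preflight(score_by, filter_=None, include_fields=None) -> list:
    """Return a list of rule violations; an empty list means no problem was found."""
    problems = []
    types = {clause.get("type") for clause in score_by}
    unknown = types - SCORING_TYPES
    if unknown:
        problems.append(f"unknown scoring type(s): {sorted(str(t) for t in unknown)}")
    if len(types) > 1:
        problems.append(f"score_by mixes scoring types: {sorted(str(t) for t in types)}")
    if include_fields is None:
        problems.append("include_fields must be passed explicitly")
    if filter_ is not None:
        problems.extend(_check_filter(filter_))
    return problems


if __name__ == "__main__":
    text = {"type": "text", "field": "body", "query": "machine learning"}
    dense = {"type": "dense_vector", "field": "embedding", "values": [0.1, 0.2]}
    print(preflight([text], {"year": {"$gt": 2024}}, ["*"]))
    print(preflight([text, dense], {"category": "ai"}))
```

The checker only mirrors the rules stated here; it does not verify that fields exist in the schema or are filterable, which still requires the `describe` output.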

**Clause shapes** (for `score_by`):

| `type` | Required keys | When to pick this |
|---|---|---|
| `text` | `field` (string FTS), `query` | Open-ended keyword search; BM25 ranking on one field |
| `query_string` | `query` (Lucene), `fields` optional | Lucene boost (`^N`), proximity (`~N`), cross-field boolean, phrase prefix |
| `dense_vector` | `field` (dense_vector), `values` (list of floats) | Semantic / mood / t

join-discord (Slash Command)

Opens a link to join the Pinecone Discord, allowing users to learn from each other, contact the Pinecone team, and get help in our dedicated help channel.

pinecone:assistant (Skill)

Create, manage, and chat with Pinecone Assistants for document Q&A with citations. Handles all assistant operations - create, upload, sync, chat, context retrieval, and list. Recognizes natural language like "create an assistant from my docs", "ask my assistant about X", or "upload my docs to Pinecone".

pinecone:cli (Skill)

Guide for using the Pinecone CLI (pc) to manage Pinecone resources from the terminal. The CLI supports ALL index types (standard, integrated, sparse) and all vector operations — unlike the MCP which only supports integrated indexes. Use for batch operations, vector management, backups, namespaces, CI/CD automation, and full control over Pinecone resources.

pinecone:docs (Skill)

Curated documentation reference for developers building with Pinecone. Contains links to official docs organized by topic and data format references. Use when writing Pinecone code, looking up API parameters, or needing the correct format for vectors or records.

pinecone:help (Skill)

Overview of all available Pinecone skills and what a user needs to get started. Invoke when a user asks what skills are available, how to get started with Pinecone, or what they need to set up before using any Pinecone skill.

pinecone:mcp (Skill)

Reference for the Pinecone MCP server tools. Documents all available tools - list-indexes, describe-index, describe-index-stats, create-index-for-model, upsert-records, search-records, cascading-search, and rerank-documents. Use when an agent needs to understand what Pinecone MCP tools are available, how to use them, or what parameters they accept.

pinecone:n8n (Skill)

Build n8n workflows using the Pinecone Assistant node or Pinecone Vector Store node. Use when building RAG pipelines, chat-with-docs workflows, configuring Pinecone nodes in n8n, troubleshooting Pinecone n8n nodes, or asking about best practices for Pinecone in n8n.

pinecone:query (Skill)

Query integrated indexes using text with Pinecone MCP. IMPORTANT - This skill ONLY works with integrated indexes (indexes with built-in Pinecone embedding models like multilingual-e5-large). For standard indexes or advanced vector operations, use the CLI skill instead. Requires PINECONE_API_KEY environment variable and Pinecone MCP server to be configured.