neo4j-vector-index-skill

The neo4j-vector-index-skill enables creation and management of vector indexes on Neo4j nodes and relationships, plus vector similarity search using ANN/kNN queries. Use it when storing embeddings on graph nodes, running nearest-neighbor searches, configuring HNSW parameters or similarity functions, and combining vector results with graph neighborhood traversal. For full GraphRAG retrieval pipelines, use neo4j-graphrag-skill instead; for fulltext-only search or GDS embedding computation, defer to neo4j-cypher-skill or neo4j-gds-skill respectively.

Repository: neo4j-skills

Install in Claude Code

git clone --depth 1 https://github.com/neo4j-contrib/neo4j-skills /tmp/neo4j-vector-index-skill && cp -r /tmp/neo4j-vector-index-skill/neo4j-vector-index-skill ~/.claude/skills/neo4j-vector-index-skill

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

## When to Use
- Creating a vector index (`CREATE VECTOR INDEX`) on nodes or relationships
- Running vector similarity / nearest-neighbor search
- Storing embeddings on graph nodes during ingestion
- Indexing/querying embeddings already written by GDS algorithms
- Choosing similarity function, dimensions, HNSW params, or quantization
- Using `SEARCH` clause (2026.01+) or `db.index.vector.queryNodes()` (2025.x)
- Batch-updating embeddings after model change
- Combining vector results with immediate graph neighborhood (full retrieval_query pipelines → `neo4j-graphrag-skill`)
- Hybrid search that combines vector results with fulltext or other ranked sources

## When NOT to Use
- **GraphRAG pipelines** (VectorCypherRetriever, HybridCypherRetriever, retrieval_query) → `neo4j-graphrag-skill`
- **Fulltext-only / keyword-only search** (FULLTEXT INDEX, `db.index.fulltext.queryNodes`) → `neo4j-cypher-skill`
- **Computing GDS graph embeddings** (FastRP, Node2Vec, GraphSAGE) → `neo4j-gds-skill`
- **Index admin** (list all indexes, drop range/text/lookup indexes) → `neo4j-cypher-skill`

---

## Pre-flight — Determine Version

The server version determines which query syntax to use:
```cypher
CALL dbms.components() YIELD versions RETURN versions[0] AS neo4j_version
```

| Version | Use |
|---|---|
| `2026.01` or higher | `SEARCH` clause (in-index filtering, preferred) |
| `2025.x` | `db.index.vector.queryNodes()` procedure (**deprecated 2026.04** — use `SEARCH` when on 2026.x) |
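
The table above can be turned into a small dispatch helper. This is a minimal sketch, assuming the version string has the calendar-style `YEAR.MONTH[.PATCH]` shape; `pick_vector_search_syntax` is a hypothetical helper name, not part of any Neo4j library.

```python
def pick_vector_search_syntax(version: str) -> str:
    """Return 'search_clause' or 'procedure' for a Neo4j version string.

    Assumes calendar versioning such as '2026.01.0' or '2025.10.1'.
    Anything older than 2026.01 is mapped to the procedure.
    """
    parts = version.split(".")
    try:
        major, minor = int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        raise ValueError(f"Unrecognized Neo4j version: {version!r}")
    # Tuple comparison: (2026, 1) and later use the SEARCH clause.
    return "search_clause" if (major, minor) >= (2026, 1) else "procedure"


print(pick_vector_search_syntax("2026.01.0"))  # search_clause
print(pick_vector_search_syntax("2025.10.1"))  # procedure
```

Feed it the `neo4j_version` value returned by the pre-flight query and branch on the result when building the search statement.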

---

## Step 1 — Create Vector Index

Node index (single label):
```cypher
CYPHER 25
CREATE VECTOR INDEX chunk_embedding IF NOT EXISTS
FOR (c:Chunk) ON (c.embedding)
OPTIONS {
  indexConfig: {
    `vector.dimensions`: 1536,
    `vector.similarity_function`: 'cosine',
    `vector.quantization.enabled`: true,
    `vector.hnsw.m`: 16,
    `vector.hnsw.ef_construction`: 100
  }
}
```

Node index **with filterable properties** [2026.01+] — `WITH` declares which properties can be used in `SEARCH ... WHERE`:
```cypher
CYPHER 25
CREATE VECTOR INDEX chunk_embedding IF NOT EXISTS
FOR (c:Chunk) ON (c.embedding)
WITH [c.source, c.lang, c.published_year]  // stored as metadata; filterable in SEARCH WHERE
OPTIONS { indexConfig: { `vector.dimensions`: 1536, `vector.similarity_function`: 'cosine' } }
```

Multi-label index with filterable properties [2026.01+]:
```cypher
CYPHER 25
CREATE VECTOR INDEX doc_embedding IF NOT EXISTS
FOR (n:Document|Article) ON n.embedding
WITH [n.author, n.published_year, n.lang]
OPTIONS { indexConfig: { `vector.dimensions`: 1536, `vector.similarity_function`: 'cosine' } }
```

Relationship index:
```cypher
CYPHER 25
CREATE VECTOR INDEX rel_embedding IF NOT EXISTS
FOR ()-[r:HAS_CHUNK]-() ON (r.embedding)
OPTIONS { indexConfig: { `vector.dimensions`: 768, `vector.similarity_function`: 'cosine' } }
```

**`WITH` property types** — only scalar types allowed: `INTEGER`, `FLOAT`, `STRING`, `BOOLEAN`, `DATE`, `ZONED DATETIME`, `LOCAL DATETIME`, `ZONED TIME`, `LOCAL TIME`, `DURATION`. Not allowed: `LIST`, `POINT`, or the vector property itself.

**Index config reference:**

| Parameter | Type | Default | Notes |
|---|---|---|---|
| `vector.dimensions` | INTEGER 1–4096 | none | Required; must match embedding model exactly |
| `vector.similarity_function` | STRING | `'cosine'` | `'cosine'` or `'euclidean'` |
| `vector.quantization.enabled` | BOOLEAN | `true` | Reduces storage; slight accuracy tradeoff; needs vector-2.0+ (5.18+) |
| `vector.hnsw.m` | INTEGER 1–512 | `16` | HNSW graph connections; higher = better recall, more memory |
| `vector.hnsw.ef_construction` | INTEGER 1–3200 | `100` | Build-time candidates; higher = better recall, slower build |

**Similarity function choice:**

| Use case | Function |
|---|---|
| Normalized embeddings (OpenAI, Cohere, Voyage, Google) | `'cosine'` |
| Unnormalized / raw distance matters | `'euclidean'` |
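
To see why normalization matters for this choice: for unit-length vectors, squared Euclidean distance equals `2 - 2 * cosine`, so both functions rank neighbors identically, while for raw vectors magnitude can change the winner. The sketch below illustrates this in plain Python with made-up two-dimensional vectors; it computes raw similarity and distance, not the scores Neo4j returns.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def euclidean_distance(a, b):
    return math.dist(a, b)


def normalize(v):
    norm = math.hypot(*v)
    return [x / norm for x in v]


query = [1.0, 0.0]
long_aligned = [10.0, 1.0]  # almost the same direction, large magnitude
short_off = [0.5, 0.5]      # 45 degrees off, small magnitude
candidates = [long_aligned, short_off]

# Raw vectors: the two functions disagree about the nearest candidate.
cosine_pick = max(candidates, key=lambda v: cosine_similarity(query, v))
euclid_pick = min(candidates, key=lambda v: euclidean_distance(query, v))
print(cosine_pick, euclid_pick)

# Unit vectors: d^2 == 2 - 2*cos, so both functions give the same ranking.
q, a = normalize(query), normalize(long_aligned)
assert math.isclose(euclidean_distance(q, a) ** 2, 2 - 2 * cosine_similarity(q, a))
```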

---

## Step 2 — Wait for Index ONLINE

Index builds asynchronously — do NOT query until ONLINE:
```cypher
SHOW VECTOR INDEXES YIELD name, state, populationPercent
WHERE name = 'chunk_embedding'
RETURN name, state, populationPercent
```

Poll every 5s until `state = 'ONLINE'` and `populationPercent = 100.0`. If `state = 'FAILED'` → stop, check logs.

Shell poll (cypher-shell):
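```bash
# Poll until the index is ONLINE; stop with an error if the build FAILED.
until state=$(cypher-shell -u neo4j -p "$NEO4J_PASSWORD" \
    "SHOW VECTOR INDEXES YIELD name, state WHERE name='chunk_embedding' RETURN state") \
    && grep -q ONLINE <<<"$state"; do
  if grep -q FAILED <<<"$state"; then
    echo "Index build FAILED; check the logs" >&2
    exit 1
  fi
  sleep 5
done
```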
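
The same polling rule (wait for `ONLINE` at 100%, stop on `FAILED`) can be sketched in Python with a timeout added. This is a minimal sketch, not a driver API: `wait_for_index_online` and its `fetch_state` callback are hypothetical names, and the callback is expected to return `(state, populationPercent)` from the `SHOW VECTOR INDEXES` query above, or `None` if the index is not listed yet.

```python
import time
from typing import Callable, Optional, Tuple


def wait_for_index_online(
    fetch_state: Callable[[], Optional[Tuple[str, float]]],
    timeout_s: float = 600.0,
    poll_interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Block until the index is ONLINE and fully populated.

    Raises RuntimeError if the index reports FAILED and TimeoutError
    if it is not ready within timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_state()
        if status is not None:
            state, percent = status
            if state == "FAILED":
                raise RuntimeError("vector index build FAILED; check the logs")
            if state == "ONLINE" and percent >= 100.0:
                return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"index not ONLINE after {timeout_s}s")
        sleep(poll_interval_s)


# Demo with canned states instead of a live database.
states = iter([("POPULATING", 40.0), ("POPULATING", 90.0), ("ONLINE", 100.0)])
wait_for_index_online(lambda: next(states), sleep=lambda _: None)
print("index is ONLINE")
```

Against a real database, `fetch_state` would wrap a driver call that runs the `SHOW VECTOR INDEXES` query and returns the first row's `state` and `populationPercent`. Injecting `sleep` keeps the loop testable without waiting.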

---

## Step 3 — Ingest Embeddings

Batch UNWIND pattern (use for > 100 nodes — never one-node-per-transaction):
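```python
import os

from neo4j import GraphDatabase
from openai import OpenAI

# Connection settings; the environment variable names and the default URI are assumptions.
uri = os.environ.get("NEO4J_URI", "neo4j://localhost:7687")
user = os.environ.get("NEO4J_USERNAME", "neo4j")
password = os.environ["NEO4J_PASSWORD"]

driver = GraphDatabase.driver(uri, auth=(user, password))
openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed_batch(texts: list[str]) -> list[list[float]]:
    response = openai_client.embeddings.create(
        model="text-embedding-3-small", input=texts
    )
    return [r.embedding for r in response.data]

def store_embeddings(records: list[dict], batch_size: int = 500):
    expected_dim = 1536  # must match vector.dimensions
    for i in range(0, len(records), batch_size):
        batch = records[i:i + batch_size]
        # Embed per batch so one API request never carries the whole dataset.
        embeddings = embed_batch([r["text"] for r in batch])
        for emb in embeddings:
            if len(emb) != expected_dim:
                raise ValueError(f"Dim mismatch: {len(emb)} != {expected_dim}")
        rows = [{"id": r["id"], "embedding": emb}
                for r, emb in zip(batch, embeddings)]
        # One transaction per batch; UNWIND writes every row in that batch.
        driver.execute_query(
            "UNWIND $rows AS row MATCH (c:Chunk {id: row.id}) SET c.embedding = row.embedding",
            rows=rows,
        )
```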

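❌ Avoid ingesting embeddings before the index exists; create the index first. If embeddings are already stored (for example, written by GDS), the index can still be created over them, but it must reach ONLINE before you query it.
✅ Create index → poll ONLINE → ingest embeddings.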

---

## Step 4 — Run Vector Search

### SEARCH clause (2026.01+, preferred)

```cypher
CYPHER 25
MATCH (c:Chunk)
  SEARCH c IN (
    VECTOR INDEX chunk_embedding
    FOR $queryEmbedding
    LIMIT 10
  ) SCORE AS score
RETURN c.text, score
ORDER BY score DESC
```
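
From application code, the query above can be sent through the Python driver, falling back to the `db.index.vector.queryNodes()` procedure on 2025.x. The sketch below is one way to wire that up; `vector_search` and `FakeDriver` are hypothetical names, the index name and limit are hard-coded to match the example above, and the fake driver only exists so the sketch runs without a database.

```python
from typing import Any, List, Sequence, Tuple

SEARCH_CLAUSE_QUERY = """
CYPHER 25
MATCH (c:Chunk)
  SEARCH c IN (
    VECTOR INDEX chunk_embedding
    FOR $queryEmbedding
    LIMIT 10
  ) SCORE AS score
RETURN c.text AS text, score
ORDER BY score DESC
"""

PROCEDURE_QUERY = """
CALL db.index.vector.queryNodes('chunk_embedding', 10, $queryEmbedding)
YIELD node, score
RETURN node.text AS text, score
ORDER BY score DESC
"""


def vector_search(
    driver: Any, query_embedding: Sequence[float], use_search_clause: bool
) -> List[Tuple[str, float]]:
    """Run a top-10 vector search and return (text, score) pairs."""
    query = SEARCH_CLAUSE_QUERY if use_search_clause else PROCEDURE_QUERY
    # execute_query returns (records, summary, keys); keyword arguments
    # become query parameters.
    records, _summary, _keys = driver.execute_query(
        query, queryEmbedding=list(query_embedding)
    )
    return [(record["text"], record["score"]) for record in records]


class FakeDriver:
    """Stand-in for a neo4j driver so the sketch runs without a database."""

    def __init__(self):
        self.last_query = None

    def execute_query(self, query, **params):
        self.last_query = query
        return [{"text": "hello", "score": 0.93}], None, ["text", "score"]


fake = FakeDriver()
print(vector_search(fake, [0.1, 0.2], use_search_clause=False))
```

With a real driver, pass the object returned by `GraphDatabase.driver(...)` and an embedding whose length matches the index's `vector.dimensions`.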

With in-index filter [2026.01+] — properties must be declared in `WITH` at index creation:
```cypher
// Index must have b
```
