neo4j-document-import-skill
Ingests unstructured and semi-structured documents into Neo4j as a knowledge graph.
git clone --depth 1 https://github.com/neo4j-contrib/neo4j-skills /tmp/neo4j-document-import-skill && cp -r /tmp/neo4j-document-import-skill/neo4j-document-import-skill ~/.claude/skills/neo4j-document-import-skill
# Neo4j Document Import Skill
## When to Use
- Ingesting PDFs, HTML, plain text, Markdown into Neo4j as a knowledge graph
- Chunking documents and storing `:Chunk` nodes with embeddings
- Extracting entities and relationships from text with an LLM
- Using `SimpleKGPipeline` (neo4j-graphrag) programmatically
- Using Neo4j LLM Graph Builder (no-code web UI)
- Loading semi-structured JSON via `apoc.load.json`
- Connecting LangChain or LlamaIndex document loaders to Neo4j
## When NOT to Use
- **Structured CSV / relational data** → `neo4j-import-skill`
- **GraphRAG retrieval after ingestion** → `neo4j-graphrag-skill`
- **Vector index creation** → `neo4j-vector-search-skill`
- **Cypher query writing** → `neo4j-cypher-skill`
---
## Approach Decision Table
| Situation | Approach |
|---|---|
| No code; drag-and-drop UX wanted | LLM Graph Builder web UI |
| Programmatic pipeline; PDFs/text | `SimpleKGPipeline` (neo4j-graphrag) |
| JSON / REST API responses | `apoc.load.json` or Python + UNWIND |
| LangChain already in stack | `Neo4jGraph` + document loader |
| LlamaIndex already in stack | `Neo4jQueryEngine` / `Neo4jVectorStore` |
| Chunk-only (no entity extraction) | Manual chunking + MERGE pattern |
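
For the chunk-only row in the table above, the manual path can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the skill's prescribed implementation: `chunk_text`, `chunk_rows`, `write_chunks`, the `:Document` label, the `PART_OF` relationship, and the `id`/`index`/`text` properties are all illustrative names; only the `:Chunk` label comes from this document. Embeddings are left out to keep the example short.

```python
import hashlib


def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


def chunk_rows(doc_id, text, **kwargs):
    """Build one parameter row per chunk, with a deterministic id."""
    rows = []
    for index, chunk in enumerate(chunk_text(text, **kwargs)):
        digest = hashlib.sha256(f"{doc_id}:{index}:{chunk}".encode("utf-8")).hexdigest()
        rows.append({"id": digest, "index": index, "text": chunk})
    return rows


# Assumed graph shape: (:Chunk)-[:PART_OF]->(:Document). Adjust to your own model.
MERGE_CHUNKS = """
MERGE (d:Document {id: $doc_id})
WITH d
UNWIND $rows AS row
MERGE (c:Chunk {id: row.id})
SET c.text = row.text, c.index = row.index
MERGE (c)-[:PART_OF]->(d)
"""


def write_chunks(driver, doc_id, rows, database="neo4j"):
    """Send all rows in one round trip; `driver` is a neo4j.Driver instance."""
    driver.execute_query(
        MERGE_CHUNKS,
        parameters_={"doc_id": doc_id, "rows": rows},
        database_=database,
    )
```

Deriving each chunk `id` from a hash of the document id, position, and text keeps the `MERGE` idempotent, so re-running the import on unchanged text should not create duplicate nodes.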
---
## Install
```bash
pip install neo4j-graphrag # includes SimpleKGPipeline
pip install neo4j-graphrag[openai] # + OpenAI LLM/embedder
pip install neo4j-graphrag[anthropic] # + Anthropic Claude
pip install neo4j-graphrag[google] # + Vertex AI / Gemini
pip install neo4j-graphrag[bedrock] # + Amazon Bedrock (boto3) — added v1.15.0
pip install neo4j-graphrag[ollama] # + Ollama (local)
pip install neo4j-graphrag[mistralai] # + MistralAI
pip install neo4j-graphrag[fuzzy-matching] # + FuzzyMatchResolver (rapidfuzz)
# spaCy entity resolver (Python <= 3.13 only — unsupported on 3.14+):
pip install neo4j-graphrag[nlp]
```
Requires: `neo4j>=5.17.0` Python driver (6.x supported), Python >=3.10, and Neo4j server >=5.18.1 (Aura >=5.18.0).
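
The client-side requirements above can be checked before building a pipeline. The following is a small preflight sketch; `parse_version` and `check_environment` are hypothetical helper names, and the server version (Neo4j / Aura) is not checked here because that needs a live connection.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 10)
MIN_DRIVER = (5, 17, 0)


def parse_version(text):
    """Turn '5.28.1' into (5, 28, 1); suffixes such as 'a1' or 'rc2' are ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_environment():
    """Return a list of problems; an empty list means the client side looks fine."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python 3.10 or newer is required")
    try:
        driver_version = metadata.version("neo4j")
    except metadata.PackageNotFoundError:
        problems.append("the neo4j Python driver is not installed")
    else:
        if parse_version(driver_version) < MIN_DRIVER:
            problems.append(f"neo4j driver {driver_version} is older than 5.17.0")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```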
---
## Step 1 — Define Graph Schema
Schema controls what the LLM extracts. Define before pipeline construction.
```python
# Option A — Simple string lists (LLM infers descriptions)
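entities = ["Person", "Organization", "Location", "Product", "Event", "Article"]
relations = ["WORKS_AT", "LOCATED_IN", "KNOWS", "MENTIONS", "PART_OF"]
patterns = [
    ("Person", "WORKS_AT", "Organization"),
    ("Organization", "LOCATED_IN", "Location"),
    ("Person", "KNOWS", "Person"),
    ("Article", "MENTIONS", "Organization"),  # "Article" must also appear in entities
]
# Pass the three lists to the pipeline as a dict
schema = {"node_types": entities, "relationship_types": relations, "patterns": patterns}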
# Option B — Rich GraphSchema (production; best extraction quality)
from neo4j_graphrag.experimental.components.schema import (
    GraphSchema, NodeType, RelationshipType, PropertyType, ConstraintType
)
schema = GraphSchema(
    node_types=[
        NodeType(
            label="Person",
            description="A human individual",
            properties=[
                PropertyType(name="name", type="STRING"),
                PropertyType(name="role", type="STRING"),
            ],
        ),
        NodeType(
            label="Organization",
            description="A company or institution",
            properties=[
                PropertyType(name="name", type="STRING"),
                PropertyType(name="industry", type="STRING"),
            ],
        ),
    ],
    relationship_types=[
        RelationshipType(label="WORKS_AT", description="Employment relationship"),
    ],
    patterns=[("Person", "WORKS_AT", "Organization")],
    # Optional: constraints emitted to ParquetWriter metadata (v1.15.0+)
    constraints=[
        ConstraintType(label="Person", property_name="name", type="UNIQUENESS"),
        ConstraintType(label="Organization", property_name="name", type="KEY"),
    ],
)
# Option C — Auto-extract schema from text (no constraints)
schema = "EXTRACTED" # LLM infers types; noisier output
schema = "FREE" # No schema guidance; most noise
```
Use Option B for production; Option A for prototyping; `"EXTRACTED"` only for exploration.
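
When using the string-list form (Option A), it is easy for a pattern to mention a label that was never declared. The helper below is a hedged sketch of a pre-check you could run before constructing the pipeline; `validate_patterns` is a hypothetical name and is not part of neo4j-graphrag, which may apply its own validation.

```python
def validate_patterns(entities, relations, patterns):
    """Return human-readable problems; an empty list means the lists are consistent."""
    known_nodes = set(entities)
    known_rels = set(relations)
    problems = []
    for source, rel, target in patterns:
        if source not in known_nodes:
            problems.append(f"unknown source label {source!r} in {(source, rel, target)}")
        if rel not in known_rels:
            problems.append(f"unknown relationship {rel!r} in {(source, rel, target)}")
        if target not in known_nodes:
            problems.append(f"unknown target label {target!r} in {(source, rel, target)}")
    return problems


if __name__ == "__main__":
    # Example data for illustration only: "Article" is used but not declared.
    example_problems = validate_patterns(
        entities=["Person", "Organization"],
        relations=["WORKS_AT", "MENTIONS"],
        patterns=[
            ("Person", "WORKS_AT", "Organization"),
            ("Article", "MENTIONS", "Organization"),
        ],
    )
    for problem in example_problems:
        print(problem)
```

Running the check at import time, before any LLM call, is cheap and catches schema typos that would otherwise surface only once extraction starts.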
---
## Step 2 — SimpleKGPipeline Setup
```python
import asyncio
from neo4j import GraphDatabase
from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.embeddings import OpenAIEmbeddings
driver = GraphDatabase.driver(
    "neo4j+s://xxxx.databases.neo4j.io",
    auth=("neo4j", "password")
)
llm = OpenAILLM(
    model_name="gpt-4.1",
    model_params={"temperature": 0},
    # Note: SimpleKGPipeline auto-enables structured output for OpenAI/VertexAI LLMs (v1.14.0+)
    # Do NOT set response_format manually; it is managed by the pipeline
)
embedder = OpenAIEmbeddings()  # OPENAI_API_KEY from env
pipeline = SimpleKGPipeline(
    llm=llm,
    driver=driver,
    embedder=embedder,
    schema=schema,                    # GraphSchema, dict, "FREE", or "EXTRACTED"
    from_file=True,                   # False → pass text= instead of file_path=
    on_error="IGNORE",                # "RAISE" to surface extraction failures
    perform_entity_resolution=True,
    neo4j_database="neo4j",           # omit to use default
)
```
**LLM alternatives** (same interface):
- `AnthropicLLM(model_name="claude-3-5-sonnet-20241022")`
- `VertexAILLM(model_name="gemini-2.0-flash")`
- `OllamaLLM(model_name="llama3")` — local; no API key needed
- `BedrockLLM(model_id="anthropic.claude-3-5-sonnet-20241022-v2:0")` — Amazon Bedrock (v1.15.0+)
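
Because the alternatives listed above are described as sharing one interface, the provider can be chosen from configuration. The following is a sketch only: `make_llm` and `LLM_CLASSES` are hypothetical names, the class names and model keywords are copied from the list above, and it is an assumption that every class is importable from `neo4j_graphrag.llm` (the module the OpenAI example imports from) once the matching extra is installed.

```python
import importlib

# Provider key -> (class name, keyword used for the model), as listed in this document.
LLM_CLASSES = {
    "openai": ("OpenAILLM", "model_name"),
    "anthropic": ("AnthropicLLM", "model_name"),
    "vertexai": ("VertexAILLM", "model_name"),
    "ollama": ("OllamaLLM", "model_name"),
    "bedrock": ("BedrockLLM", "model_id"),
}


def make_llm(provider, model, **kwargs):
    """Instantiate the LLM wrapper for `provider`; extra kwargs are passed through."""
    try:
        class_name, model_kwarg = LLM_CLASSES[provider]
    except KeyError:
        known = ", ".join(sorted(LLM_CLASSES))
        raise ValueError(f"unknown provider {provider!r}; expected one of: {known}") from None
    # Imported lazily so that this module loads even without neo4j-graphrag installed.
    module = importlib.import_module("neo4j_graphrag.llm")
    llm_class = getattr(module, class_name)
    return llm_class(**{model_kwarg: model}, **kwargs)
```

A call such as `make_llm("ollama", "llama3")` would then stand in for the `OpenAILLM(...)` construction in Step 2, with the rest of the pipeline setup unchanged.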
---
## Step 3 — Run the Pipeline
```python
# From PDF file:
result = asyncio.run(pipeline.run_async(
    file_path="report.pdf",   # auto-dispatches to PdfLoader
    document_metadata={"source": "Q4 report", "year": 2025},
))
# From Markdown file (v1.15.0+):
result = asyncio.run(pipeline.run_async(
    file_path="notes.md",     # auto-dispatches to MarkdownLoader
    document_metadata={"source": "meeting notes"},
))
# Note: old `from_pdf=True` parameter is DEPRECATED since v1.15.0; use `from_file=True` instead
# pipeline = SimpleKGPipeline(..., from_file=True)  ← correct
```