
neo4j-import-skill

Import structured data into Neo4j — LOAD CSV, CALL IN TRANSACTIONS, neo4j-admin

Install in Claude Code
git clone --depth 1 https://github.com/neo4j-contrib/neo4j-skills /tmp/neo4j-import-skill && cp -r /tmp/neo4j-import-skill/neo4j-import-skill ~/.claude/skills/neo4j-import-skill
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# Neo4j Import Skill

## When to Use

- Importing CSV, JSON, or Parquet files into Neo4j
- Batch-upserting nodes and relationships (UNWIND + CALL IN TRANSACTIONS)
- Migrating relational data (SQL → graph)
- Bulk-loading large datasets offline (neo4j-admin import)
- Choosing between online (Cypher) and offline (admin) import methods
- Verifying import completeness (counts, constraints, index states)

## When NOT to Use

- **Unstructured docs, PDFs, vector chunks** → `neo4j-document-import-skill`
- **Live application writes (MERGE/CREATE in app code)** → `neo4j-cypher-skill`
- **neo4j-admin backup/restore/config** → `neo4j-cli-tools-skill`
- **GDS algorithm projection from existing graph** → `neo4j-gds-skill`

---

## Method Decision Table

| Dataset size | DB state | Source | Method |
|---|---|---|---|
| Any size | Online | CSV (Aura or local) | LOAD CSV + CALL IN TRANSACTIONS |
| < 1M rows | Online | List/API response | UNWIND + CALL IN TRANSACTIONS |
| > 10M rows | **Offline** (local/self-managed) | CSV / Parquet | `neo4j-admin database import full` |
| Any size | Online | APOC available | `apoc.periodic.iterate` + `apoc.load.csv` |
| Any size | Online | JSON/API | `apoc.load.json` or driver batching |
| Incremental delta | Offline (Enterprise) | CSV | `neo4j-admin database import incremental` |

**Aura**: only `https://` URLs — no `file:///`. Use neo4j-admin import only on self-managed.
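The Aura URL rule can be checked in tooling before any query is sent. This is a minimal Python sketch. `check_load_csv_url` is a hypothetical helper. It encodes only the `https://`-only restriction stated above and does not validate schemes for self-managed servers.

```python
from urllib.parse import urlparse


def check_load_csv_url(url: str, aura: bool) -> str:
    """Return the URL scheme; raise ValueError if an Aura target cannot use it."""
    scheme = urlparse(url).scheme.lower()
    # Rule from this skill: Aura accepts only https:// URLs (no file:///).
    # Self-managed schemes are passed through unchecked.
    if aura and scheme != "https":
        raise ValueError(f"Aura LOAD CSV needs an https:// URL, got {url!r}")
    return scheme


# Usage: fail fast before building the Cypher query.
print(check_load_csv_url("https://example.com/persons.csv", aura=True))  # https
```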

---

## Pre-Import Checklist

Run in this exact order — skipping causes hard-to-debug duplicates or missed index usage:

**Constraints BEFORE import. Additional indexes AFTER import.**
- Constraints create implicit RANGE indexes used by MERGE during load + enforce uniqueness
- Create additional non-unique indexes (TEXT, RANGE on non-key props, FULLTEXT) after load. Neo4j populates them asynchronously from the committed data, so poll `populationPercent` (from `SHOW INDEXES`) until it reaches 100%
- Creating extra indexes before import slows every write during load with no benefit
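Index population can be polled from a script by running `SHOW INDEXES YIELD name, state, populationPercent` and checking the returned rows. This is a minimal Python sketch. It assumes you wrap that query in your own zero-argument `fetch_indexes` callable (hypothetical, for example a thin wrapper around a driver session) that returns the rows as dicts.

```python
import time
from collections.abc import Callable, Iterable, Mapping

# Cypher whose rows fetch_indexes() is expected to return (as dicts).
SHOW_INDEXES = "SHOW INDEXES YIELD name, state, populationPercent"


def wait_for_indexes(
    fetch_indexes: Callable[[], Iterable[Mapping]],
    timeout_s: float = 600.0,
    poll_s: float = 5.0,
) -> None:
    """Return once every index is ONLINE at 100%; raise on FAILED or timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        rows = list(fetch_indexes())
        failed = [r["name"] for r in rows if r["state"] == "FAILED"]
        if failed:
            raise RuntimeError(f"index population failed: {failed}")
        pending = [
            r["name"]
            for r in rows
            if r["state"] != "ONLINE" or r["populationPercent"] < 100.0
        ]
        if not pending:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"indexes still populating: {pending}")
        time.sleep(poll_s)
```

Call it after the post-load `CREATE INDEX` statements, before handing the database to queries that depend on those indexes.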

1. **Create uniqueness constraints** (enables index used by MERGE):
   ```cypher
   CREATE CONSTRAINT IF NOT EXISTS FOR (n:Person) REQUIRE n.id IS UNIQUE;
   CREATE CONSTRAINT IF NOT EXISTS FOR (n:Movie)  REQUIRE n.movieId IS UNIQUE;
   ```
   > **Neo4j 2026.02+ (Enterprise/Aura) — PREVIEW:** `ALTER CURRENT GRAPH TYPE SET { … }` can replace all individual constraint statements with a single declarative block. See `neo4j-cypher-skill/references/graph-type.md`. Use individual `CREATE CONSTRAINT` on older versions or Community Edition.

2. **Verify APOC if using apoc.* procedures**:
   ```cypher
   RETURN apoc.version();
   ```
   If this fails, APOC is not installed; use plain LOAD CSV instead.

3. **Confirm target is PRIMARY** (not replica):
   ```cypher
   CALL dbms.cluster.role() YIELD role RETURN role;
   ```
   If role ≠ `PRIMARY` → stop. Redirect write to PRIMARY endpoint.

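4. **Count source file rows** before import. This gives a baseline for the post-import count check. `wc -l` includes the header line and also counts newlines inside quoted fields: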
   ```bash
   wc -l data/persons.csv    # Linux/macOS
   ```

5. **Verify UTF-8 encoding** — LOAD CSV requires UTF-8. Re-encode if needed:
   ```bash
   file -i persons.csv       # Check encoding (GNU file; macOS: file -I)
   iconv -f latin1 -t utf-8 persons.csv > persons_utf8.csv
   ```
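Steps 4 and 5 can also be scripted. This Python sketch uses hypothetical helper names. `count_csv_records` parses the file with the `csv` module, so a quoted field that contains a newline counts as one record, and the header is excluded. `wc -l` counts both as lines. The encoding helpers read whole files into memory, so stream large files instead.

```python
import csv
from pathlib import Path


def is_utf8(path: Path) -> bool:
    """True if the whole file decodes as UTF-8, the encoding LOAD CSV requires."""
    try:
        path.read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def reencode_latin1_to_utf8(src: Path, dst: Path) -> None:
    """Byte-level equivalent of: iconv -f latin1 -t utf-8 src > dst"""
    dst.write_bytes(src.read_bytes().decode("latin-1").encode("utf-8"))


def count_csv_records(path: Path) -> int:
    """Count non-blank data records, excluding the header row.

    Quoted fields that contain newlines count as one record, unlike `wc -l`.
    """
    with path.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        return sum(1 for record in reader if record)
```

Run `is_utf8` first, because `count_csv_records` opens the file as UTF-8 and raises on other encodings. After the load, compare its result with the imported node count.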

---

## LOAD CSV Patterns

### Basic node import with type coercion and null handling

```cypher
CYPHER 25
LOAD CSV WITH HEADERS FROM 'file:///persons.csv' AS row
CALL (row) {
  MERGE (p:Person {id: row.id})
  ON CREATE SET
    p.name       = row.name,
    p.age        = toIntegerOrNull(row.age),
    p.score      = toFloatOrNull(row.score),
    p.active     = toBoolean(row.active),
    p.born       = CASE WHEN row.born IS NOT NULL AND row.born <> '' THEN date(row.born) ELSE null END,
    p.createdAt  = datetime()
  ON MATCH SET
    p.updatedAt  = datetime()
} IN TRANSACTIONS OF 10000 ROWS
  ON ERROR CONTINUE
  REPORT STATUS AS s
RETURN s.transactionId, s.committed, s.errorMessage
```

Null/empty-string rules:
- CSV missing column → `null` (safe in `SET`; a `null` value in a `MERGE` pattern raises an error)
- CSV empty string `""` → stored as `""` **not** `null` — use `nullIf(row.x, '')` to convert
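- `toInteger(null)` and `toInteger('abc')` return `null`, but `toInteger()` raises an error on unsupported types (lists, maps, nodes) → prefer `toIntegerOrNull()`, which returns `null` instead of raising
- Same for `toFloat()` → prefer `toFloatOrNull()`
- Neo4j never stores `null` properties: `SET p.x = null` removes `x`, so a blank field in an `ON MATCH SET` deletes the existing value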
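When rows are cleaned client-side and sent as a parameter (the driver-batching route in the decision table), you can apply the same rules before the data reaches Neo4j. This is a minimal Python sketch. The helper names and the column list are hypothetical, and Python's `int()`/`float()` parsing is not guaranteed to match Cypher's conversion functions for every input. Dropping `None` entries matters if the result is applied with `SET p += $props`, where a `null` value would remove an existing property.

```python
from typing import Optional


def blank_to_none(value: Optional[str]) -> Optional[str]:
    """Treat "" like a missing value (the nullIf(row.x, '') rule above)."""
    return None if value is None or value == "" else value


def to_int_or_none(value: Optional[str]) -> Optional[int]:
    """Lenient int parse: None for blank or unparseable input, never raises."""
    value = blank_to_none(value)
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        return None


def to_float_or_none(value: Optional[str]) -> Optional[float]:
    """Lenient float parse: None for blank or unparseable input, never raises."""
    value = blank_to_none(value)
    if value is None:
        return None
    try:
        return float(value)
    except ValueError:
        return None


def clean_person(row: dict) -> dict:
    """Hypothetical cleaner for the persons.csv columns; drops None entries."""
    props = {
        "id": blank_to_none(row.get("id")),
        "name": blank_to_none(row.get("name")),
        "age": to_int_or_none(row.get("age")),
        "score": to_float_or_none(row.get("score")),
    }
    return {key: value for key, value in props.items() if value is not None}
```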

### Relationship import (nodes must exist first)

```cypher
CYPHER 25
LOAD CSV WITH HEADERS FROM 'file:///knows.csv' AS row
CALL (row) {
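  MATCH (a:Person {id: row.fromId})
  MATCH (b:Person {id: row.toId})
  // MERGE on a null property raises an error, so keep nullable values out of the pattern
  MERGE (a)-[r:KNOWS]->(b)
  SET r.since = toIntegerOrNull(row.year)
} IN TRANSACTIONS OF 5000 ROWS
  ON ERROR CONTINUE
  REPORT STATUS AS s
RETURN s.transactionId, s.committed, s.errorMessage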
```

Always import ALL nodes before ANY relationships. If either endpoint is missing, MATCH returns no rows, so the relationship is silently skipped and no error is reported.
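Because missing endpoints are skipped silently, it can help to find dangling references in the source files before loading. This Python sketch assumes the `persons.csv` / `knows.csv` layout used above, with `id` on nodes and `fromId`/`toId` on relationships. The function names are hypothetical.

```python
import csv
from pathlib import Path


def load_ids(path: Path, column: str) -> set:
    """Collect every value of `column` from a headered CSV file."""
    with path.open(newline="", encoding="utf-8") as f:
        return {row[column] for row in csv.DictReader(f)}


def dangling_edges(nodes_csv: Path, rels_csv: Path, node_id: str = "id",
                   from_col: str = "fromId", to_col: str = "toId") -> list:
    """Return (line, fromId, toId) for relationship rows with an unknown endpoint."""
    known = load_ids(nodes_csv, node_id)
    missing = []
    with rels_csv.open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        for row in reader:
            if row[from_col] not in known or row[to_col] not in known:
                # reader.line_num is the physical line just read (header = 1)
                missing.append((reader.line_num, row[from_col], row[to_col]))
    return missing
```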

### Tab-separated or custom delimiter

```cypher
CYPHER 25
LOAD CSV WITH HEADERS FROM 'file:///data.tsv' AS row FIELDTERMINATOR '\t'
CALL (row) { MERGE (p:Person {id: row.id}) }
IN TRANSACTIONS OF 10000 ROWS ON ERROR CONTINUE
```

### Compressed files (ZIP / gzip — local files only)

```cypher
LOAD CSV WITH HEADERS FROM 'file:///archive.csv.gz' AS row ...
```
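If the source starts out uncompressed, Python's standard `gzip` module can produce the `.gz` file. This is a minimal sketch. It assumes you then copy the output into the directory the server resolves `file:///` URLs against, which is the import directory on a self-managed server. The helper name is hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def gzip_csv(src: Path) -> Path:
    """Write src as src + '.gz' (persons.csv -> persons.csv.gz) and return the new path."""
    dst = src.with_name(src.name + ".gz")
    with src.open("rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # stream, so large files are not loaded whole
    return dst
```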

### Cloud storage (Enterprise Edition)

| Scheme | Example |
|---|---|
| AWS S3 | `s3://my-bucket/data/persons.csv` |
| Google Cloud Storage | `gs://my-bucket/persons.csv` |
| Azure Blob | `azb://account/container/persons.csv` |

### Useful built-in functions inside LOAD CSV

```cypher
LOAD CSV FROM 'file:///persons.csv' AS row
RETURN linenumber() AS line,  // current line number: use as fallback ID
       file() AS path         // absolute path of file being loaded
LIMIT 3
```

---

## CALL IN TRANSACTIONS — Full Reference

### Syntax

```cypher
CALL (row) {
  // write logic
} IN [n CONCURRENT] TRANSACTIONS
  [OF batchSize ROW[S]]
  [ON ERROR {CONTINUE | BREAK | FAIL | RETRY [FOR duration SECONDS] [THEN {CONTINUE|BREAK|FAIL}]}]
  [REPORT STATUS AS statusVar]
```

### ON ERROR modes

| Mode | Behavior | Use when |
|---|---|---|
| `ON ERROR FAIL` | Default. Stops at the first failed batch and fails the query; batches already committed are **not** rolled back | Stop on first error; keep the load idempotent (MERGE) so it can be re-run |
| `ON ERROR CONTINUE` | Skips failed batch, continues remaining batches | Resilient bulk load; track errors via REPORT STATUS |
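The difference between the modes is easiest to see in a toy model. This Python sketch is not Neo4j code. `run_in_transactions` is hypothetical, it reports one status per batch (Neo4j's `REPORT STATUS` yields one row per input row), and it does not model `RETRY`. It does reflect two rules: batches that have already committed are never rolled back, and under `CONTINUE` one bad row discards its entire batch.

```python
from collections.abc import Callable, Sequence


def run_in_transactions(rows: Sequence, write_batch: Callable[[Sequence], None],
                        batch_size: int, on_error: str = "FAIL") -> list:
    """Toy model of CALL { ... } IN TRANSACTIONS OF batch_size ROWS ON ERROR <mode>.

    Each batch commits on its own; a later failure never undoes earlier batches.
    Returns one status dict per attempted batch.
    """
    if on_error not in ("CONTINUE", "BREAK", "FAIL"):
        raise ValueError(f"unsupported mode: {on_error}")
    statuses = []
    for start in range(0, len(rows), batch_size):
        batch_no = start // batch_size
        try:
            write_batch(rows[start:start + batch_size])  # one inner transaction
        except Exception as exc:
            # Model assumption: write_batch is atomic, so the whole batch is lost.
            statuses.append({"batch": batch_no, "committed": False, "error": str(exc)})
            if on_error == "CONTINUE":
                continue  # skip this batch, keep going
            if on_error == "BREAK":
                break  # stop quietly; earlier batches stay committed
            raise RuntimeError(f"batch {batch_no} failed: {exc}") from exc  # FAIL
        statuses.append({"batch": batch_no, "committed": True, "error": None})
    return statuses
```

With rows `["a", "b", "bad", "c", "d", "e"]` and a batch size of 2, `CONTINUE` keeps `a, b, d, e`, and `c` is lost together with `bad`. `BREAK` and `FAIL` both keep only `a, b`. The difference is that `FAIL` also raises.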