memory-systems
This Claude Code skill provides guidance on designing memory systems for AI agents that maintain knowledge and state across multiple sessions. It covers memory architecture patterns ranging from simple vector stores to temporal knowledge graphs, helps select among frameworks like Mem0, Zep, and Cognee based on retrieval needs, and clarifies boundaries with related skills handling context compression and optimization. Use this skill when building agents requiring persistent entity consistency, multi-hop reasoning over accumulated knowledge, or production-scale memory infrastructure.
git clone --depth 1 https://github.com/guanyang/open-agent-hub /tmp/memory-systems && cp -r /tmp/memory-systems/skills/memory-systems ~/.claude/skills/memory-systemsSKILL.md
# Memory System Design
Memory provides the persistence layer that allows agents to maintain continuity across sessions and reason over accumulated knowledge. Simple agents rely entirely on context for memory, losing all state when sessions end. Sophisticated agents implement layered memory architectures that balance immediate context needs with long-term knowledge retention. The evolution from vector stores to knowledge graphs to temporal knowledge graphs represents increasing investment in structured memory for improved retrieval and reasoning.
## When to Activate
Activate this skill when:
- Building agents that must persist knowledge across sessions
- Choosing between memory frameworks (Mem0, Zep/Graphiti, Letta, LangMem, Cognee)
- Needing to maintain entity consistency across conversations
- Implementing reasoning over accumulated knowledge
- Designing memory architectures that scale in production
- Evaluating memory systems against benchmarks (LoCoMo, LongMemEval, DMR)
- Building dynamic memory with automatic entity/relationship extraction and self-improving memory (Cognee)
Do not activate this skill for adjacent work owned by other skills:
- File-backed scratchpads, run logs, and tool-output offloading: `filesystem-context`.
- Conversation compaction or human-readable handoff summaries: `context-compression`.
- Masking, prefix caching, token budgets, or retrieval scoping inside one trajectory: `context-optimization`.
- Formal belief/desire/intention models over RDF state: `bdi-mental-states`.
## Core Concepts
Think of memory as a spectrum from volatile context window to persistent storage. Default to the simplest layer that meets retrieval needs, because benchmark evidence suggests tool complexity matters less than reliable retrieval for some memory workloads (claim-memory-locomo-filesystem-baseline). Add structure (graphs, temporal validity) only when retrieval quality degrades or the agent needs multi-hop reasoning, relationship traversal, or time-travel queries.
## Detailed Topics
### Production Framework Landscape
Select a framework based on the dominant retrieval pattern the agent requires. Use this table to narrow the shortlist, then validate with the benchmark data below.
| Framework | Architecture | Best For | Trade-off |
|-----------|-------------|----------|-----------|
| **Mem0** | Vector store + graph memory, pluggable backends | Multi-tenant systems, broad integrations | Less specialized for multi-agent |
| **Zep/Graphiti** | Temporal knowledge graph, bi-temporal model | Enterprise requiring relationship modeling + temporal reasoning | Advanced features cloud-locked |
| **Letta** | Self-editing memory with tiered storage (in-context/core/archival) | Full agent introspection, stateful services | Complexity for simple use cases |
| **Cognee** | Multi-layer semantic graph via customizable ECL pipeline with customizable Tasks | Evolving agent memory that adapts and learns; multi-hop reasoning | Heavier ingest-time processing |
| **LangMem** | Memory tools for LangGraph workflows | Teams already on LangGraph | Tightly coupled to LangGraph |
| **File-system** | Plain files with naming conventions | Simple agents, prototyping | No semantic search, no relationships |
Choose Zep/Graphiti when the agent needs bi-temporal modeling (tracking both when events occurred and when they were ingested) because its three-tier knowledge graph (episode, semantic entity, community subgraphs) excels at temporal queries. Choose Mem0 when the priority is fast time-to-production with managed infrastructure. Choose Letta when the agent needs deep self-introspection through its Agent Development Environment. Choose Cognee when the agent must build dense multi-layer semantic graphs — it layers text chunks and entity types as nodes with detailed relationship edges, and every core piece (ingestion, entity extraction, post-processing, retrieval) is customizable.
**Benchmark Performance Comparison**
Consult these benchmarks to set expectations, but treat them as source-specific signals for retrieval dimensions rather than absolute rankings. No single benchmark is definitive.
| System | DMR Accuracy | LoCoMo | HotPotQA (multi-hop) | Latency |
|--------|-------------|--------|---------------------|---------|
| Cognee | — | — | Published high score | Variable |
| Zep (Temporal KG) | Published high score | — | Mid-range across metrics | Low-latency reported |
| Letta (filesystem) | — | Published filesystem baseline | — | — |
| Mem0 | — | Published specialized-tool baseline | Lower in one comparison | — |
| MemGPT | Published high score | — | — | Variable |
| GraphRAG | Published mid/high range | — | — | Variable |
| Vector RAG baseline | Published lower range | — | — | Fast |
Key takeaway: compare memory systems by retrieval shape, not brand. Use benchmark numbers as dated evidence that must be rechecked before making product claims; the stable design rule is to start shallow, measure retrieval quality, then add semantic or graph structure only when a simpler layer fails.
### Memory Layers (Decision Points)
Pick the shallowest memory layer that satisfies the persistence requirement. Each deeper layer adds infrastructure cost and operational complexity, so only escalate when the shallower layer cannot meet the retrieval or durability need.
| Layer | Persistence | Implementation | When to Use |
|-------|------------|----------------|-------------|
| **Working** | Context window only | Scratchpad in system prompt | Always — optimize with attention-favored positions |
| **Short-term** | Session-scoped | File-system, in-memory cache | Intermediate tool results, conversation state |
| **Long-term** | Cross-session | Key-value store → graph DB | User preferences, domain knowledge, entity registries |
| **Entity** | Cross-session | Entity registry + properties | Maintaining identity ("John Doe" = same person across conversations) |
| **Temporal KG** | Cross-session + history | Graph with validityPrincipal Software Architect specializing in system design, database modeling, API engineering, and system resilience.
Principal Diagnostics Engineer specializing in root cause analysis, error troubleshooting, and hotfixes.
Principal Clean Code Specialist specializing in code simplification, performance tuning, and refactoring loops.
Senior Technical Lead and Security Auditor specializing in code quality, correctness, and security audits.
Senior QA Automation Engineer specializing in unit, integration, and E2E test suite creation.
Run when user calls /commit or asks to generate a commit message. Analyzes staged changes and writes a structured commit message.
Run when user calls /review. Analyzes local changes and runs a comprehensive code review using the agent-reviewer prompt.
Run when user calls /test-tdd. Scans modified files, locates their corresponding unit/integration test suites, and runs them.