compact-memory-implementation
Developer implementation guide for adding compact memory to an Agent — covers fork agent pattern for compaction, trigger strategy, summary format design, and memory restoration in subsequent sessions. Use when a developer asks how to implement compact memory, context compression, or memory persistence in their agent built with Claude Agent SDK or Anthropic API.
git clone --depth 1 https://github.com/simbajigege/book2skills /tmp/compact-memory-implementation && cp -r /tmp/compact-memory-implementation/skills/compact-memory-implementation ~/.claude/skills/compact-memory-implementation
# compact-memory-implementation
A developer guide for building compact memory into an Agent: detect when to compress, fork a compactor sub-agent, produce a structured summary, and restore it in the next session.
## Step 1 — Understand the setup
Before designing anything, clarify:
- **SDK / language**: Claude Agent SDK? Direct Anthropic API? Python or TypeScript?
- **Agent architecture**: single-agent loop, multi-agent, tool-calling?
- **Session model**: one long-running session or multiple short sessions?
- **What must survive compaction**: task state, decisions, tool results, conversation history?
This determines which pattern fits.
---
## Step 2 — When to trigger compact
There are three strategies; pick one based on your session model:
**1. Token threshold** (recommended)
Check `usage.input_tokens` from the previous response. When it exceeds ~70–80% of your model's context limit, trigger compact.
```python
COMPACT_THRESHOLD = 150_000  # adjust per model

if response.usage.input_tokens > COMPACT_THRESHOLD:
    compact = compact_memory(history)
    history = []  # reset: the compact moves to the system prompt
```
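The 70 to 80% rule can be turned into a small helper instead of a hard-coded constant. This is a minimal sketch, assuming a 200,000-token context window (check the actual limit for your model); `threshold_for` and `prompt_tokens` are hypothetical names. When prompt caching is enabled, the usage object reports cached tokens in separate fields, so the sketch adds them to `input_tokens` to get the full prompt size.

```python
from types import SimpleNamespace


def threshold_for(context_limit: int, ratio: float = 0.75) -> int:
    """Token count at which to compact, as a fraction of the context window."""
    if not 0 < ratio < 1:
        raise ValueError("ratio must be between 0 and 1")
    return int(context_limit * ratio)


def prompt_tokens(usage) -> int:
    """Total prompt size, counting cached tokens when those fields are present."""
    return (
        usage.input_tokens
        + (getattr(usage, "cache_creation_input_tokens", 0) or 0)
        + (getattr(usage, "cache_read_input_tokens", 0) or 0)
    )


# Stand-in for response.usage so the sketch runs without an API call.
usage = SimpleNamespace(input_tokens=20_000, cache_read_input_tokens=140_000)
needs_compact = prompt_tokens(usage) > threshold_for(200_000)
```

In this example the uncached `input_tokens` alone would stay under the threshold, while the full prompt size crosses it.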
**2. Turn count**
Compact every N turns. Simpler but less adaptive — misses sessions with a few very long turns.
```python
COMPACT_EVERY_N = 30

# Skip turn 0 so a fresh session does not compact an empty history.
if turn_count > 0 and turn_count % COMPACT_EVERY_N == 0:
    compact = compact_memory(history)
```
**3. Phase boundary**
Compact at natural task boundaries (after research, before implementation). Requires the agent to detect phases. Produces summaries that align with meaningful milestones, but harder to implement reliably.
**Recommended default**: token threshold at 70%, with turn-count fallback at N=40.
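The recommended default combines two of the strategies above. One possible shape is a small state object that counts turns since the last compaction and checks both conditions. `CompactTrigger` is a hypothetical name, and the 200,000-token window is an assumption to replace with your model's limit.

```python
from dataclasses import dataclass


@dataclass
class CompactTrigger:
    context_limit: int = 200_000  # assumed window size; check your model
    token_ratio: float = 0.70     # primary trigger
    max_turns: int = 40           # fallback trigger
    turns_since_compact: int = 0

    def record_turn(self, input_tokens: int) -> bool:
        """Count one turn and report whether compaction is due."""
        self.turns_since_compact += 1
        over_tokens = input_tokens > self.context_limit * self.token_ratio
        over_turns = self.turns_since_compact >= self.max_turns
        return over_tokens or over_turns

    def reset(self) -> None:
        """Call after a successful compaction."""
        self.turns_since_compact = 0
```

Call `record_turn(response.usage.input_tokens)` after each response, and call `reset()` once the compact has been produced.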
---
## Step 3 — Fork agent for compaction
The compactor is a **separate agent call** whose only job is to read the current state and return a structured summary. Fork it synchronously — the main agent waits for the result before continuing.
```python
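import json

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def compact_memory(history: list[dict]) -> dict:
    """Fork a one-shot compactor call and return the parsed summary."""
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",  # cheaper model is fine for compaction
        max_tokens=4096,
        system=COMPACTOR_SYSTEM_PROMPT,  # defined in Step 4
        messages=[
            {
                "role": "user",
                "content": format_history_for_compact(history),  # defined in Step 4
            }
        ],
    )
    # Raises json.JSONDecodeError if the reply is not pure JSON.
    return json.loads(response.content[0].text)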
```
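Calling `json.loads` directly on the reply fails if the model wraps the JSON in a Markdown fence or adds a sentence around it. A tolerant parser is one way to guard against that. This is a sketch, not part of the original guide, and `parse_compact_json` is a hypothetical helper name.

```python
import json


def parse_compact_json(text: str) -> dict:
    """Parse the compactor reply, tolerating a Markdown fence or stray prose."""
    try:
        result = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to the outermost {...} span in the reply.
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end <= start:
            raise ValueError("compactor reply contains no JSON object") from None
        result = json.loads(text[start : end + 1])
    if not isinstance(result, dict):
        raise ValueError("compactor reply is not a JSON object")
    return result
```

If you adopt it, the last line of `compact_memory` would become `return parse_compact_json(response.content[0].text)`.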
**Why fork instead of self-compact:**
- The main agent may have drifted in focus; the compactor starts fresh with the full picture
- Compaction is a different cognitive task — summarizing vs. executing
- A cheaper, smaller model (Haiku) can do compaction; save the expensive model for main work
- Clean separation makes the compact output easier to validate and test
---
## Step 4 — How to compact: format and prompt
### Compact output schema
```json
{
  "task": "What the agent is working on and why (the goal, not the steps)",
  "current_state": "Exact status at compaction point: what is done, what is not, what is in progress",
  "key_decisions": [
    { "decision": "...", "reason": "...", "constraint": "..." }
  ],
  "eliminated_approaches": [
    { "approach": "...", "reason_ruled_out": "..." }
  ],
  "open_questions": ["..."],
  "next_steps": ["..."],
  "relevant_tool_results": {
    "key": "Only results future steps will need: summarized, not raw dumps"
  },
  "compacted_at_turn": 42
}
```
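Because the compact is produced by a model, it is worth checking it against the schema before trusting it. The following is a minimal hand-rolled check, assuming every field in the schema above is required; `REQUIRED_FIELDS` and `validate_compact` are hypothetical names, and a schema library would work just as well.

```python
REQUIRED_FIELDS = {
    "task": str,
    "current_state": str,
    "key_decisions": list,
    "eliminated_approaches": list,
    "open_questions": list,
    "next_steps": list,
    "relevant_tool_results": dict,
    "compacted_at_turn": int,
}


def validate_compact(compact: dict) -> list[str]:
    """Return a list of problems; an empty list means the compact is usable."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in compact:
            problems.append(f"missing field: {field}")
        elif not isinstance(compact[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    decisions = compact.get("key_decisions")
    if isinstance(decisions, list):
        for i, d in enumerate(decisions):
            if not isinstance(d, dict) or not {"decision", "reason"} <= d.keys():
                problems.append(f"key_decisions[{i}] needs 'decision' and 'reason'")
    return problems
```

If the list is non-empty, one option is to keep the uncompacted history and retry the compactor call rather than resetting.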
### Compactor system prompt
```
You are a conversation compactor. Read the provided conversation and produce a JSON summary that captures everything a fresh agent needs to continue the work without asking what happened.
Include:
- Current task and goal (not the steps taken to get here)
- Exact current state — what is done and what is not
- Decisions made and WHY (reasoning, not just the choice)
- Approaches tried and ruled out with reasons (prevents re-exploration)
- Open questions and blockers
- Concrete next steps in priority order
- Tool results that future steps will need (summarize, don't dump raw output)
Omit:
- Intermediate reasoning that led nowhere
- Completed sub-tasks with no future relevance
- Raw tool output that has already been acted on
- Anything derivable by reading the code or running a command
Output valid JSON matching the schema provided. No prose outside the JSON.
```
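The prompt ends by asking for JSON "matching the schema provided", so the schema has to reach the compactor somewhere, and the compactor cannot know the turn number unless it is told. One way to handle both is to append them to the system prompt. This is a sketch under those assumptions; `BASE_PROMPT`, `COMPACT_SCHEMA_SKELETON`, and `build_compactor_system` are hypothetical names.

```python
import json

# Abbreviated stand-in for the compactor prompt; use the full text in practice.
BASE_PROMPT = "You are a conversation compactor. Output valid JSON only."

# Skeleton of the compact schema; empty values show only the expected shape.
COMPACT_SCHEMA_SKELETON = {
    "task": "",
    "current_state": "",
    "key_decisions": [{"decision": "", "reason": "", "constraint": ""}],
    "eliminated_approaches": [{"approach": "", "reason_ruled_out": ""}],
    "open_questions": [""],
    "next_steps": [""],
    "relevant_tool_results": {},
    "compacted_at_turn": 0,
}


def build_compactor_system(base_prompt: str, turn: int) -> str:
    """Append the schema and the current turn number to the compactor prompt."""
    schema = json.dumps(COMPACT_SCHEMA_SKELETON, indent=2)
    return (
        f"{base_prompt}\n\n"
        f"Schema:\n{schema}\n\n"
        f"Set compacted_at_turn to {turn}."
    )
```

The result would be passed as the `system` argument of the compactor call in Step 3.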
### Format history for compactor
```python
def format_history_for_compact(history: list[dict]) -> str:
    """Render the message history as plain text for the compactor."""
    lines = ["Conversation to compact:\n"]
    for msg in history:
        role = msg["role"].upper()
        # Non-string content (tool-use or tool-result blocks) becomes a placeholder.
        content = msg["content"] if isinstance(msg["content"], str) else "[tool use]"
        lines.append(f"[{role}]: {content[:2000]}")  # cap very long messages
    return "\n".join(lines)
```
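The formatter above replaces every non-string message with `[tool use]`, which hides tool names and results from the compactor. If those matter for the summary, the content blocks can be flattened to text instead. A sketch, assuming the history stores blocks as plain dicts with the Messages API block types `text`, `tool_use`, and `tool_result` (not SDK objects); `render_content` is a hypothetical helper.

```python
import json


def render_content(content) -> str:
    """Flatten message content (a string or a list of block dicts) to text."""
    if isinstance(content, str):
        return content
    parts = []
    for block in content:
        kind = block.get("type")
        if kind == "text":
            parts.append(block["text"])
        elif kind == "tool_use":
            parts.append(f"[tool_use {block['name']}] {json.dumps(block['input'])}")
        elif kind == "tool_result":
            # A tool result may itself hold a string or a list of blocks.
            parts.append(f"[tool_result] {render_content(block.get('content', ''))}")
        else:
            parts.append(f"[{kind}]")  # unknown block type: keep a marker only
    return "\n".join(parts)
```

In `format_history_for_compact`, the `content = ...` line would then become `content = render_content(msg["content"])`, with the 2000-character cap still applied afterwards.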
---
## Step 5 — How to use after compacting: memory restoration
The compact object becomes the "memory" for the next turn or session. Inject it into the system prompt so it's always visible to the agent.
### Pattern A — System prompt injection (recommended)
```python
MEMORY_BLOCK_TEMPLATE = """
## Restored memory (compacted at turn {turn})
**Task**: {task}
**Current state**: {current_state}
**Key decisions**:
{decisions}
**Ruled out approaches**:
{eliminated}
**Next steps**:
{next_steps}
Begin from current state above. Do not re-explore eliminated approaches.
"""


def build_system_with_memory(base_system: str, compact: dict | None) -> str:
    """Append the restored-memory block to the base system prompt."""
    if compact is None:
        return base_system
    memory = MEMORY_BLOCK_TEMPLATE.format(
        turn=compact["compacted_at_turn"],
        task=compact["task"],
        current_state=compact["current_state"],
        decisions="\n".join(f"- {d['decision']} (because {d['reason']})"
                            for d in compact["key_decisions"]),
        eliminated="\n".join(f"- {e['approach']}: {e['reason_ruled_out']}"
                             for e in compact["eliminated_approaches"]),
        next_steps="\n".join(f"- {s}" for s in compact["next_steps"]),
    )
    return base_system + "\n\n" + memory
```
### Pattern B — First message injection