Skill · 116 repo stars · updated 15d ago

compact-memory-implementation

This skill provides a developer guide for implementing memory compaction in agents built with the Claude Agent SDK or Anthropic API. It covers trigger strategies (token threshold, turn count, or phase boundaries), the fork-agent pattern where a dedicated compactor sub-agent summarizes conversation history, structured summary design, and restoration of compressed memory in subsequent sessions. Use when implementing context compression, memory persistence, or addressing token limits in multi-turn agent workflows.

Repository: book2skills

Install in Claude Code

git clone --depth 1 https://github.com/simbajigege/book2skills /tmp/compact-memory-implementation && cp -r /tmp/compact-memory-implementation/skills/compact-memory-implementation ~/.claude/skills/compact-memory-implementation

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# compact-memory-implementation

A developer guide for building compact memory into an Agent: detect when to compress, fork a compactor sub-agent, produce a structured summary, and restore it in the next session.

## Step 1 — Understand the setup

Before designing anything, clarify:

- **SDK / language**: Claude Agent SDK? Direct Anthropic API? Python or TypeScript?
- **Agent architecture**: single-agent loop, multi-agent, tool-calling?
- **Session model**: one long-running session or multiple short sessions?
- **What must survive compaction**: task state, decisions, tool results, conversation history?

This determines which pattern fits.

---

## Step 2 — When to trigger compact

There are three strategies; pick one based on your session model:

**1. Token threshold** (recommended)
Check `usage.input_tokens` from the previous response. When it exceeds ~70–80% of your model's context limit, trigger compact.

```python
COMPACT_THRESHOLD = 150_000  # adjust per model

if response.usage.input_tokens > COMPACT_THRESHOLD:
    compact = compact_memory(history)
    history = []  # reset — compact moves to system prompt
```
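The threshold can also be derived from the context window rather than hard-coded. The sketch below assumes a 200,000-token window (check your model's documented limit) and a usage object shaped like the Messages API's, where cached tokens are reported in `cache_read_input_tokens` and `cache_creation_input_tokens` separately from `input_tokens`. The helper names are hypothetical.

```python
CONTEXT_WINDOW = 200_000   # assumption: adjust to your model's documented limit
COMPACT_FRACTION = 0.75    # within the 70-80% range suggested above


def total_input_tokens(usage) -> int:
    """Sum uncached and cached input tokens; missing or None fields count as 0."""
    fields = ("input_tokens", "cache_read_input_tokens", "cache_creation_input_tokens")
    return sum(getattr(usage, name, 0) or 0 for name in fields)


def over_token_threshold(usage, window: int = CONTEXT_WINDOW,
                         fraction: float = COMPACT_FRACTION) -> bool:
    """True once the last request used more than `fraction` of the window."""
    return total_input_tokens(usage) > int(window * fraction)
```

If you do not use prompt caching, the two cache fields are zero or absent and the check reduces to the `input_tokens` comparison above.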

**2. Turn count**
Compact every N turns. Simpler but less adaptive — misses sessions with a few very long turns.

```python
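COMPACT_EVERY_N = 30

# Skip turn 0 so a fresh session does not compact an empty history
if turn_count > 0 and turn_count % COMPACT_EVERY_N == 0:
    compact = compact_memory(history)
    history = []  # reset, as in the token-threshold example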
```

**3. Phase boundary**
Compact at natural task boundaries (for example, after research and before implementation). This requires the agent to detect phases. It produces summaries that align with meaningful milestones, but it is harder to implement reliably.
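One way to make phase detection less fragile is to have the orchestration code track an explicit phase label and compact only on transitions. This is a minimal sketch; the `PhaseTracker` class and the phase names are hypothetical and not part of any SDK.

```python
from dataclasses import dataclass


@dataclass
class PhaseTracker:
    """Tracks the current task phase and reports transitions."""
    current: str = "research"  # hypothetical starting phase

    def advance(self, new_phase: str) -> bool:
        """Record the new phase; return True if it differs from the previous one."""
        changed = new_phase != self.current
        self.current = new_phase
        return changed


tracker = PhaseTracker()
# The agent (or your orchestration code) reports the phase after each turn.
for phase in ["research", "research", "implementation"]:
    if tracker.advance(phase):
        print(f"phase boundary reached, compact before {phase}")
```

How the phase label is obtained (a tool call, a structured field in the agent's reply, or a fixed plan) is left open here; that is the part that is hard to make reliable.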

**Recommended default**: token threshold at 70%, with turn-count fallback at N=40.
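The recommended default can be sketched as a single predicate. This assumes a 200,000-token context window and counts turns since the last compaction rather than total turns, so the fallback is not affected by earlier resets. The function name and parameters are illustrative.

```python
def should_compact(input_tokens: int, turns_since_compact: int,
                   context_window: int = 200_000,  # assumption: adjust per model
                   token_fraction: float = 0.70,
                   max_turns: int = 40) -> bool:
    """Token threshold first, turn count as a fallback."""
    if input_tokens > context_window * token_fraction:
        return True
    return turns_since_compact >= max_turns
```

Reset `turns_since_compact` to zero whenever compaction runs, alongside the history reset.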

---

## Step 3 — Fork agent for compaction

The compactor is a **separate agent call** whose only job is to read the current state and return a structured summary. Fork it synchronously — the main agent waits for the result before continuing.

```python
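import json

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def compact_memory(history: list[dict]) -> dict:
    """Ask a separate, cheaper model to summarize the history as JSON."""
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",  # cheaper model is fine for compaction
        max_tokens=4096,
        system=COMPACTOR_SYSTEM_PROMPT,
        messages=[
            {
                "role": "user",
                "content": format_history_for_compact(history),
            }
        ],
    )
    # Raises json.JSONDecodeError if the reply is not pure JSON
    return json.loads(response.content[0].text)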
```

**Why fork instead of self-compact:**
- The main agent may have drifted in focus; the compactor starts fresh with the full picture
- Compaction is a different cognitive task — summarizing vs. executing
- A cheaper, smaller model (Haiku) can do compaction; save the expensive model for main work
- Clean separation makes the compact output easier to validate and test
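Calling `json.loads` on the raw reply fails if the compactor wraps its JSON in a Markdown fence or adds a sentence around it, which can happen even when the prompt forbids it. Here is a hedged sketch of a more tolerant parser; `parse_compact_json` is a hypothetical helper, not an SDK function.

```python
import json


def parse_compact_json(text: str) -> dict:
    """Parse the compactor reply, tolerating code fences or prose around the JSON."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in compactor output")
    # Still raises json.JSONDecodeError if the extracted span is malformed.
    return json.loads(text[start:end + 1])
```

On failure, a reasonable policy is to retry the compactor call once and otherwise keep the uncompacted history, so a bad summary never replaces good context.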

---

## Step 4 — How to compact: format and prompt

### Compact output schema

```json
{
  "task": "What the agent is working on and why — the goal, not the steps",
  "current_state": "Exact status at compaction point: what is done, what is not, what is in progress",
  "key_decisions": [
    { "decision": "...", "reason": "...", "constraint": "..." }
  ],
  "eliminated_approaches": [
    { "approach": "...", "reason_ruled_out": "..." }
  ],
  "open_questions": ["..."],
  "next_steps": ["..."],
  "relevant_tool_results": {
    "key": "Only results future steps will need — summarized, not raw dumps"
  },
  "compacted_at_turn": 42
}
```
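Before replacing the history with a compact object, it is worth checking that the object has the shape shown above. This is a minimal sketch using only top-level keys and types; `validate_compact` and `REQUIRED_KEYS` are hypothetical names, and a JSON Schema validator could be swapped in for stricter checks.

```python
REQUIRED_KEYS = {
    "task": str,
    "current_state": str,
    "key_decisions": list,
    "eliminated_approaches": list,
    "open_questions": list,
    "next_steps": list,
    "relevant_tool_results": dict,
    "compacted_at_turn": int,
}


def validate_compact(compact: dict) -> list[str]:
    """Return a list of problems; an empty list means the summary is usable."""
    problems = []
    for key, expected in REQUIRED_KEYS.items():
        if key not in compact:
            problems.append(f"missing key: {key}")
        elif not isinstance(compact[key], expected):
            problems.append(f"{key} should be {expected.__name__}")
    return problems
```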

### Compactor system prompt

```
You are a conversation compactor. Read the provided conversation and produce a JSON summary that captures everything a fresh agent needs to continue the work without asking what happened.

Include:
- Current task and goal (not the steps taken to get here)
- Exact current state — what is done and what is not
- Decisions made and WHY (reasoning, not just the choice)
- Approaches tried and ruled out with reasons (prevents re-exploration)
- Open questions and blockers
- Concrete next steps in priority order
- Tool results that future steps will need (summarize, don't dump raw output)

Omit:
- Intermediate reasoning that led nowhere
- Completed sub-tasks with no future relevance
- Raw tool output that has already been acted on
- Anything derivable by reading the code or running a command

Output valid JSON matching the schema provided. No prose outside the JSON.
```
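The prompt refers to "the schema provided", but the Step 3 call passes only the system prompt and the formatted history, so the schema has to be included somewhere. One option, sketched below, is to append it to the prompt text. `with_schema` and `SCHEMA_EXAMPLE` are hypothetical names, and the placeholder values stand in for the descriptions in the schema above.

```python
import json

# Abbreviated copy of the output schema from this step; values are placeholders.
SCHEMA_EXAMPLE = {
    "task": "string",
    "current_state": "string",
    "key_decisions": [{"decision": "string", "reason": "string", "constraint": "string"}],
    "eliminated_approaches": [{"approach": "string", "reason_ruled_out": "string"}],
    "open_questions": ["string"],
    "next_steps": ["string"],
    "relevant_tool_results": {"key": "string"},
    "compacted_at_turn": 0,
}


def with_schema(instructions: str, schema_example: dict = SCHEMA_EXAMPLE) -> str:
    """Append the schema as pretty-printed JSON to the compactor instructions."""
    return f"{instructions.rstrip()}\n\nSchema:\n{json.dumps(schema_example, indent=2)}"
```

Pass the full prompt text as `instructions` and assign the result to `COMPACTOR_SYSTEM_PROMPT`.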

### Format history for compactor

```python
def format_history_for_compact(history: list[dict]) -> str:
    lines = ["Conversation to compact:\n"]
    for msg in history:
        role = msg["role"].upper()
        content = msg["content"] if isinstance(msg["content"], str) else "[tool use]"
        lines.append(f"[{role}]: {content[:2000]}")  # cap very long messages
    return "\n".join(lines)
```
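The formatter above replaces every non-string message with `[tool use]`, so the compactor never sees tool results it is asked to summarize. The sketch below flattens content blocks instead. It assumes the history stores blocks as plain dicts with the Messages API block types `text`, `tool_use`, and `tool_result`; `render_content` is a hypothetical helper.

```python
import json


def render_content(content, max_chars: int = 2000) -> str:
    """Flatten a message's content (string or list of block dicts) into text."""
    if isinstance(content, str):
        return content[:max_chars]
    parts = []
    for block in content:
        kind = block.get("type")
        if kind == "text":
            parts.append(block["text"])
        elif kind == "tool_use":
            parts.append(f"[tool call {block['name']}: {json.dumps(block['input'])}]")
        elif kind == "tool_result":
            # Result content may itself be a string or a list of blocks.
            inner = render_content(block.get("content", ""), max_chars)
            parts.append(f"[tool result: {inner}]")
        else:
            parts.append(f"[{kind} block omitted]")
    return "\n".join(parts)[:max_chars]
```

Inside `format_history_for_compact`, `render_content(msg["content"])` would take the place of the `"[tool use]"` placeholder.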

---

## Step 5 — How to use after compacting: memory restoration

The compact object becomes the "memory" for the next turn or session. Inject it into the system prompt so it's always visible to the agent.
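For the compact object to survive into a later session, it has to be written somewhere durable. The following is a minimal sketch using a JSON file; the file location and the `save_compact` / `load_compact` names are assumptions, and a database or key-value store would work equally well.

```python
import json
from pathlib import Path
from typing import Optional


def save_compact(compact: dict, path: Path) -> None:
    """Write the compact summary to disk, replacing any previous one."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(compact, indent=2), encoding="utf-8")


def load_compact(path: Path) -> Optional[dict]:
    """Return the stored summary, or None when no prior session left one."""
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```

At session start, the result of `load_compact` can be passed straight to the system prompt builder, which already handles the `None` case.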

### Pattern A — System prompt injection (recommended)

```python
MEMORY_BLOCK_TEMPLATE = """
## Restored memory (compacted at turn {turn})

**Task**: {task}

**Current state**: {current_state}

**Key decisions**:
{decisions}

**Ruled out approaches**:
{eliminated}

**Next steps**:
{next_steps}

Begin from current state above. Do not re-explore eliminated approaches.
"""

def build_system_with_memory(base_system: str, compact: dict | None) -> str:
    if compact is None:
        return base_system
    memory = MEMORY_BLOCK_TEMPLATE.format(
        turn=compact["compacted_at_turn"],
        task=compact["task"],
        current_state=compact["current_state"],
        decisions="\n".join(f"- {d['decision']} (because {d['reason']})"
                            for d in compact["key_decisions"]),
        eliminated="\n".join(f"- {e['approach']}: {e['reason_ruled_out']}"
                             for e in compact["eliminated_approaches"]),
        next_steps="\n".join(f"- {s}" for s in compact["next_steps"]),
    )
    return base_system + "\n\n" + memory
```
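The template above omits two schema fields, `open_questions` and `relevant_tool_results`. If the agent needs them, they can be rendered in the same style and appended to the memory block. `render_extra_memory` is a hypothetical helper shown as a sketch.

```python
def render_extra_memory(compact: dict) -> str:
    """Format the open questions and tool results the template above leaves out."""
    questions = "\n".join(f"- {q}" for q in compact.get("open_questions", []))
    results = "\n".join(
        f"- {name}: {summary}"
        for name, summary in compact.get("relevant_tool_results", {}).items()
    )
    return (f"**Open questions**:\n{questions or '- none'}\n\n"
            f"**Relevant tool results**:\n{results or '- none'}")
```

Using `.get` with defaults keeps the rendering from failing when the compactor omits a field.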

### Pattern B — First message inj