Flatten Claude Code sessions: keep every prompt and event verbatim, resume at a lower token count. An MCP server.
<p align="center">
<img src="https://raw.githubusercontent.com/shayaShav/flatten-mcp/main/assets/logo.png" alt="flatten-mcp logo" width="160">
</p>
# flatten-mcp
> Resume the **exact same conversation** at a lower token cost — without compacting it into a lossy summary.
<p align="left">
<a href="https://www.npmjs.com/package/flatten-mcp"><img alt="npm version" src="https://img.shields.io/npm/v/flatten-mcp.svg"></a>
<a href="https://www.npmjs.com/package/flatten-mcp"><img alt="npm downloads" src="https://img.shields.io/npm/dm/flatten-mcp.svg"></a>
<a href="https://github.com/shayaShav/flatten-mcp/blob/main/LICENSE"><img alt="License: MIT" src="https://img.shields.io/badge/License-MIT-green.svg"></a>
<a href="https://nodejs.org"><img alt="Node >= 18" src="https://img.shields.io/badge/node-%3E%3D18-339933?logo=nodedotjs&logoColor=white"></a>
<a href="https://modelcontextprotocol.io"><img alt="Model Context Protocol" src="https://img.shields.io/badge/MCP-server-6E56CF.svg"></a>
<a href="https://docs.claude.com/en/docs/claude-code"><img alt="Built for Claude Code" src="https://img.shields.io/badge/built%20for-Claude%20Code-D97757.svg"></a>
<a href="https://smithery.ai/server/@shaya-shaviv/flatten-mcp"><img alt="Smithery calls" src="https://smithery.ai/badge/@shaya-shaviv/flatten-mcp"></a>
</p>
**flatten-mcp** is a [Model Context Protocol](https://modelcontextprotocol.io) server for [Claude Code](https://docs.claude.com/en/docs/claude-code). It shrinks a session's token footprint by moving bulky tool output (large file reads, command logs, base64 screenshots) out of the conversation and into a sidecar file — leaving a tiny, retrievable reference in its place. Your prompts and the chronological flow of the session are preserved **verbatim** — those lines are never rewritten. You resume the same raw conversation; it just costs less to carry.
See how 317,236 tokens turned into 182,287:
https://github.com/user-attachments/assets/4672b3cd-f78f-4146-97ba-e0077b655381
---
## Why flatten instead of compact?
The standard answer to a full context window is **compaction**: the model reads the whole conversation and rewrites it into a shorter summary. That summary is lossy by construction — an *interpretation* of your history, and interpretations drift, smooth over the awkward parts, and quietly drop the detail you didn't know you'd need. But the history is exactly what's worth keeping verbatim: the words you typed at 2 a.m., the precise order of events, the dead ends and the decisions. A fuzzy, half-formed prompt carries more raw truth about your intent than any tidy paragraph written *about* it after the fact — and preserving it untouched is the foundation of trust in a coding agent.
**Flattening is the opposite move.** It changes *nothing* about what was said. In most sessions the model reads a lot — large files, long logs, multiple sources — and keeps every byte of it in context, even though it has nearly always already **written down the conclusion in plain prose**: the one line that mattered in a 2 MB log, the finding distilled from five files, the running tally of open tasks. The raw source has done its job. Flattening lifts those already-summarized blocks out and swaps each for a lightweight reference ID — so starting cold from a flattened session is usually smooth sailing, and on the rare occasion the raw bytes *are* needed, they're one `retrieve_flattened` call away.
```
What sits in the context window:
  USER         "fix the crash"
  ASSISTANT    reading the logs…
  TOOL_RESULT  ▓▓▓ 2 MB log dump ▓▓▓          ← bulk; already summarized in prose below
  ASSISTANT    "the OOM is at line 88,402; the fix is …"

After flatten (same words, only the bulk set aside):
  USER         "fix the crash"
  ASSISTANT    reading the logs…
  TOOL_RESULT  [FLATTENED id=… → sidecar]     ← one marker; fetch the full dump on demand
  ASSISTANT    "the OOM is at line 88,402; the fix is …"
```
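
The before/after picture above can be sketched in a few lines of TypeScript. This is a minimal illustration under stated assumptions, not flatten-mcp's actual implementation: the `Entry` shape, the function names, the `blk-N` id scheme, the exact marker text, and the 1,000-character threshold are all hypothetical.

```typescript
// Hypothetical, simplified transcript entry; real Claude Code session lines are richer.
type Entry = { role: "user" | "assistant" | "tool_result"; text: string };

// Hypothetical result shape: the slimmed transcript plus a sidecar keyed by marker id.
type Flattened = { entries: Entry[]; sidecar: Map<string, string> };

// Move bulky tool results into a sidecar, leaving a short marker behind.
// User and assistant entries are passed through untouched.
function flattenEntries(entries: Entry[], minSize = 1000): Flattened {
  const sidecar = new Map<string, string>();
  const out = entries.map((entry, index): Entry => {
    if (entry.role !== "tool_result" || entry.text.length < minSize) {
      return entry; // prose and small results stay verbatim
    }
    const id = `blk-${index}`; // assumed id scheme, for illustration only
    sidecar.set(id, entry.text);
    return { ...entry, text: `[FLATTENED id=${id}]` };
  });
  return { entries: out, sidecar };
}

// Reverse the operation: re-inline every flattened block from the sidecar.
function unflattenEntries(flat: Flattened): Entry[] {
  return flat.entries.map((entry): Entry => {
    if (entry.role !== "tool_result") return entry;
    const id = /^\[FLATTENED id=(.+)\]$/.exec(entry.text)?.[1];
    const original = id !== undefined ? flat.sidecar.get(id) : undefined;
    return original === undefined ? entry : { ...entry, text: original };
  });
}

const session: Entry[] = [
  { role: "user", text: "fix the crash" },
  { role: "assistant", text: "reading the logs…" },
  { role: "tool_result", text: "x".repeat(5000) }, // stands in for the 2 MB log dump
  { role: "assistant", text: "the OOM is at line 88,402" },
];
const flat = flattenEntries(session);
const restored = unflattenEntries(flat);
```

In this sketch only the third entry changes: it becomes a short marker, the 5,000 characters move into `flat.sidecar`, and `restored` matches `session` again after the round trip.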
## What you'll actually save
Token reduction depends entirely on what the session did:
- **Read-heavy sessions** (lots of large files, logs, or screenshots in context) — expect reductions **up to ~50%**.
- **Prose-heavy sessions** (little external data ingested) — savings are negligible. There's simply not much bulk to move.
- Results vary a lot from session to session: often a pleasant surprise, occasionally a touch underwhelming.
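
As a worked example, the demo figures quoted earlier (317,236 tokens before, 182,287 after) can be turned into a percentage. `reductionPercent` is a hypothetical helper written for this illustration, not part of flatten-mcp.

```typescript
// Percentage of the original token count removed by flattening.
function reductionPercent(before: number, after: number): number {
  if (before <= 0) throw new RangeError("before must be positive");
  return ((before - after) / before) * 100;
}

const tokensBefore = 317236;
const tokensAfter = 182287;
const saved = tokensBefore - tokensAfter; // 134949 tokens
const pct = reductionPercent(tokensBefore, tokensAfter); // roughly 42.5
```

That run saved 134,949 tokens, about 42.5% of the original context, which sits inside the "up to ~50%" range given for read-heavy sessions.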
**When to reach for it.** A common point to flatten is around **200k** tokens. For critical sessions where you want the model at its sharpest and most context-aware, flattening around **250k–300k** is where the most dramatic reductions tend to show up.
**Flatten smartly.** Just as you wouldn't compact midway through a large reading task, avoid flattening while the model is still working through material it has just read. That said, nothing is ever lost: flattening everything and then cherry-picking the few blocks you still need is a perfectly legitimate strategy.
---
## Quick start
> Requires **Node.js ≥ 18** and **Claude Code**.
One command — installs from [npm](https://www.npmjs.com/package/flatten-mcp) and registers it user-wide:
```bash
claude mcp add flatten -s user -- npx -y flatten-mcp@latest
```
Or register it manually (in `~/.claude.json`, or your project's `.mcp.json`):
```json
{
  "mcpServers": {
    "flatten": {
      "command": "npx",
      "args": ["-y", "flatten-mcp@latest"]
    }
  }
}
```
Recommended — install the `/flatten` slash command:
```bash
curl -fsSL https://raw.githubusercontent.com/shayaShav/flatten-mcp/main/commands/flatten.md -o ~/.claude/commands/flatten.md
```
<details>
<summary><b>From source</b> (for development)</summary>
```bash
git clone https://github.com/shayaShav/flatten-mcp.git
cd flatten-mcp
npm install # builds automatically via the "prepare" script
cp commands/flatten.md ~/.claude/commands/ # optional: installs the /flatten command
```
Register the local build instead:
```json
{
  "mcpServers": {
    "flatten": {
      "command": "node",
      "args": ["/absolute/path/to/flatten-mcp/dist/index.js"]
    }
  }
}
```
</details>
### Uninstall
```bash
claude mcp remove flatten -s user # unregister the server
rm -f ~/.claude/commands/flatten.md # remove the /flatten command, if installed
```
Flatten artifacts (`.flat.jsonl` sidecars, `.bak` backups) live next to your session files and are not deleted by uninstalling. To reclaim the disk, run `prune_flatten_artifacts` (with `include_sidecars: true`) **before** unregistering — or delete them manually from `~/.claude/projects/<encoded-project-dir>/`. Mind that flattened sessions need their sidecar for `retrieve_flattened` / `unflatten_session` — unflatten first if you want the bulk back inline.
### Configuration
By default the server operates on **the project the CLI runs in** (its current working directory). Pass `project_dir` explicitly on any call to target a different project.
| Env var | Required | Purpose |
| --- | --- | --- |
| `ANTHROPIC_API_KEY` | no | If set, token savings are counted **exactly** via Anthropic's free `count_tokens` endpoint instead of estimated locally. |
| `FLATTEN_COUNT_MODEL` | no | Model id used for the exact token count (default: `claude-haiku-4-5-20251001`). |
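
To make the exact-versus-estimated distinction concrete, here is a hedged sketch of what a token-counting request to Anthropic's `count_tokens` endpoint looks like, next to a crude local fallback. How flatten-mcp itself builds the request or estimates locally is not shown in this README: the helper names are hypothetical, and the 4-characters-per-token ratio is only an assumed rule of thumb.

```typescript
// Default model id, taken from the FLATTEN_COUNT_MODEL row above.
const DEFAULT_COUNT_MODEL = "claude-haiku-4-5-20251001";

type CountRequest = {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
};

// Build (but do not send) a token-counting request for a block of text.
function buildCountRequest(apiKey: string, text: string, model: string = DEFAULT_COUNT_MODEL): CountRequest {
  return {
    url: "https://api.anthropic.com/v1/messages/count_tokens",
    method: "POST",
    headers: {
      "x-api-key": apiKey,
      "anthropic-version": "2023-06-01",
      "content-type": "application/json",
    },
    body: JSON.stringify({ model, messages: [{ role: "user", content: text }] }),
  };
}

// Rough fallback when no key is available. The ratio is an assumption made for
// this sketch, not the estimator flatten-mcp uses.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Exact counting only when a key is set, as the table describes.
function countingMode(apiKey: string | undefined): "exact" | "estimated" {
  return apiKey ? "exact" : "estimated";
}

const request = buildCountRequest("sk-placeholder", "hello world");
```

Sending `request` with `fetch(request.url, request)` on Node 18 or later returns a JSON body whose `input_tokens` field holds the count.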
## Usage
> [!CAUTION]
> **Always exit the session you want to flatten with `Ctrl-C`, then flatten it from a *different* window.** Rewriting a live session's file out from under Claude Code corrupts its in-memory state and bricks the session.
1. **Exit the session you want to flatten** with `Ctrl-C`. This is mandatory — a 10-second live-write guard refuses to touch a recently-modified session unless you force it, but exiting is the safe path.
2. In a **new** Claude Code window, type `/flatten latest` or `/flatten <session-id>`, or ask:

   > "Flatten the latest session." · or · "Flatten session `<session-id>`."

   `/flatten latest` (or bare `/flatten`) flattens the **larger** of the two most recent sessions. The smaller, seconds-old one is almost always the window doing the flattening itself, and the session worth flattening is the big one. It never forces past the live-write guard.
3. **Resume** your original session and send a prompt. When Claude starts outputting text, you'll see the token count drop.
To preview without touching anything, ask for a **dry run** first. To undo, ask to **unflatten** the session — every original block is restored to its exact original value.
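
The `/flatten latest` selection rule and the 10-second live-write guard from the steps above can be sketched as two small functions. This is an illustration only: the `SessionFile` shape and function names are hypothetical, and measuring "larger" by file size in bytes is an assumption, since the README does not say how the server compares sessions.

```typescript
// Hypothetical description of a session file on disk.
type SessionFile = { id: string; sizeBytes: number; mtimeMs: number };

const LIVE_WRITE_GUARD_MS = 10000; // the 10-second guard described above

// True when the file was modified inside the guard window, i.e. it may still be live.
function isRecentlyModified(file: SessionFile, nowMs: number): boolean {
  return nowMs - file.mtimeMs < LIVE_WRITE_GUARD_MS;
}

// Of the two most recently modified sessions, pick the larger one.
function pickLatest(files: SessionFile[]): SessionFile | undefined {
  const twoNewest = [...files].sort((a, b) => b.mtimeMs - a.mtimeMs).slice(0, 2);
  return twoNewest.sort((a, b) => b.sizeBytes - a.sizeBytes)[0];
}

const now = 1700000000000;
const files: SessionFile[] = [
  { id: "work-session", sizeBytes: 4000000, mtimeMs: now - 60000 }, // exited a minute ago
  { id: "flatten-station", sizeBytes: 20000, mtimeMs: now - 2000 }, // the window doing the flattening
  { id: "last-week", sizeBytes: 9000000, mtimeMs: now - 3600000 }, // bigger, but not among the two newest
];
const target = pickLatest(files);
```

Here `target` is `work-session`: the tiny `flatten-station` file is newer but smaller, and `last-week` is larger but not among the two most recent. Because `work-session` was last written 60 seconds ago, it also falls outside the guard window.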
> [!TIP]
> Flattening needs no model intelligence — park a second window on a fast, inexpensive model (`/model haiku`) as a dedicated flattening station and just type `/flatten latest`.
### Validate the claims yourself
Every number flatten reports can be checked end to end in a couple of minutes:
1. Pick a meaty session — or make one: have Claude read a few large files, then exit with `Ctrl-C`.
2. In a new window, ask for a **dry run** — *"dry-run flatten the latest session"* — and read the report: `flattenedCount`, `contextTokensSaved` of `contextTokensTotal`, `diskBytesSaved`. Nothing has been written yet.
3. Run `/flatten latest` for real, `claude --resume` the original session, and send any prompt — the context indicator drops by roughly the reported amount (exactly, when `ANTHROPIC_API_KEY` is set).
4. Check reversibility: ask to **unflatten** the session, then diff the restored `.jsonl` against the `.jsonl.bak` backup created at flatten time — identical for Claude Code's canonical JSON.
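
The reversibility check in step 4 can also be done structurally instead of byte for byte. The sketch below compares two JSONL documents line by line after re-serializing each line, so whitespace differences are ignored while key order and values still count. `firstDifference` is a hypothetical helper, not a flatten-mcp tool.

```typescript
// Return the index of the first JSONL record that differs, or -1 if all match.
function firstDifference(restored: string, backup: string): number {
  // Parse and re-serialize each non-empty line to normalize whitespace.
  const normalize = (doc: string): string[] =>
    doc
      .split("\n")
      .filter((line) => line.trim() !== "")
      .map((line) => JSON.stringify(JSON.parse(line)));

  const a = normalize(restored);
  const b = normalize(backup);
  const length = Math.max(a.length, b.length);
  for (let i = 0; i < length; i++) {
    if (a[i] !== b[i]) return i; // also catches a missing trailing record
  }
  return -1;
}

const identical = firstDifference('{"a":1}\n{"b":2}\n', '{"a": 1}\n{"b":2}');
const changed = firstDifference('{"a":1}\n{"b":2}', '{"a":1}\n{"b":3}');
```

To use it on real files, read the restored `.jsonl` and the `.jsonl.bak` with `fs.readFileSync(path, "utf8")` and pass both strings in; a result of `-1` means every record matches.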
## Tools
| Tool | What it does |
| --- | --- |
| `flatten_session` | Move bulky tool results into a sidecar, leaving `[FLATTENED …]` markers. Crash-safe and reversible. Supports `dry_run`, `min_size`, `force`, and `include_tool_use_result`. |
| `retrieve_flattened` | Fetch one original block back by its id — returns the original text, or re-renders a flattened screenshot as a real image. |
| `unflatten_session` | Reverse a flatten completely: re-inline every block from the sidecar, restoring each flattened result to its exact original value. |
| `prune_flatten_artifacts` | Reclaim disk by deleting flatten artifacts (`.bak` backups and, with `include_sidecars: true`, `.flat.jsonl` sidecars). |
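
Claude Code issues these tool calls for you, but it can help to see what one looks like on the wire. The sketch below builds a standard MCP `tools/call` JSON-RPC request for a dry run. The tool and argument names come from this README; the argument value types (a boolean for `dry_run`, a path string for `project_dir`) and the `toolCall` helper are assumptions made for the illustration.

```typescript
// JSON-RPC 2.0 envelope for an MCP tool call.
type ToolCall = {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
};

// Hypothetical helper that wraps a tool name and its arguments in the envelope.
function toolCall(id: number, name: string, args: Record<string, unknown>): ToolCall {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Preview a flatten without writing anything.
const preview = toolCall(1, "flatten_session", {
  dry_run: true, // assumed to be a boolean
  project_dir: "/absolute/path/to/project", // placeholder path
});
const wire = JSON.stringify(preview);
```

The same envelope applies to the other tools in the table; only `params.name` and the arguments change.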