Permafrost: Freeze Claude Code prompt prefixes and cut API call costs

The system prompt prefix that Claude Code sends with each API call can consume several hundred tokens. Multiply that across hundreds or thousands of invocations daily, and your bill grows fast, especially when your backend isn't Anthropic's own API but an alternative provider that charges per token. Permafrost is an open-source utility released this week that tackles exactly this problem: it freezes that prefix so the model doesn't have to process the same context tokens over and over again.

The project appeared on Hacker News on June 10, 2026 with minimal initial traction, but deserves technical attention for what it proposes and the context in which using it makes sense.

What Permafrost does exactly

When Claude Code executes a tool or completes a step in an agent workflow, it constructs a prompt that includes a static prefix: system instructions, configuration context, definitions of active skills or subagents. That prefix is identical or nearly identical across most consecutive calls within the same session.

Permafrost intercepts those calls before they reach the model endpoint and applies a prompt caching technique at the prefix level: it serializes the static block, marks it as frozen, and reuses the cached representation in subsequent calls, preventing the model from tokenizing and processing it from scratch each time.

The repository includes the author's own benchmarks measuring 64% cost savings on DeepSeek bills. It's important to frame that number: the percentage is what appears in the project description and reflects the author's specific use case, not an independent audit. Real results will depend on prefix size, calls per session, and the specific provider.

Why the problem is real even if the number is optimistic

Cost optimization in agentic workflows isn't a minor issue. When using Claude Code with complex configurations, several active subagents, multiple MCP servers declared, custom skills, the context traveling in each call can be substantial. In projects with chained hooks or long-running pipelines, that overhead accumulates.

Anthropric already offers its own prompt caching functionality in the API for explicitly marked blocks, but that option requires developers to instrument the code manually and isn't always available or active on third-party backends. Permafrost proposes a proxy layer that works independently of the underlying provider, making it useful precisely in scenarios where you're using an alternative model behind an Anthropic API-compatible interface.

Who it makes sense for

This type of tool is mainly relevant for three profiles:

Engineering teams that have deployed Claude Code in CI/CD pipelines or internal automation workflows with high call frequency.
Developers using alternative backends (DeepSeek, local models via proxy, etc.) where each token is billed without native cache discounts.
Those experimenting with complex agentic configurations who want to reduce accumulated latency alongside cost, since processing fewer tokens also speeds up response time with some providers.

For casual Claude Code use with Anthropic's official API and short sessions, the benefit will be marginal compared to enabling native prompt caching.

Project status and caveats

Permafrost is currently in very early stages: the Hacker News thread had 2 points and 2 comments at time of publication. The code is available on GitHub but it's worth checking the maintenance status before integrating it into production environments. The technical proposal is sound in concept, but like any proxy that sits between your client and the model endpoint, it introduces an additional dependency and a potential failure point.

That said, the problem it solves is legitimate, and the approach, a proxy specialized in freezing static prefixes, is more surgical than generic full-response caching solutions.

---

We see this as a pragmatic bet for those already running agentic pipelines with Claude Code in production and starting to scrutinize their token bills more carefully. That it comes from the community, not Anthropic, says something about where real optimization work in the ecosystem currently lies.

Permafrost: Freeze Claude Code prompt prefixes and cut API call costs

What Permafrost does exactly

Why the problem is real even if the number is optimistic

Who it makes sense for

Project status and caveats

Sources

Read next

MCP is becoming the default standard for building agents

AI Toolbox touts support for a Claude Opus version not in the catalog

One Click in the Browser, Context for Any Agent