Permafrost: Freeze Claude Code prompt prefixes and cut API call costs
An open-source tool promises to reduce token spending by up to 64% by freezing Claude Code's prompt prefix before sending it to the model.
The system prompt prefix that Claude Code sends with each API call can consume several hundred tokens. Multiply that across hundreds or thousands of invocations daily, and your bill grows fast, especially when your backend isn't Anthropic's own API but an alternative provider that charges per token. Permafrost is an open-source utility released this week that tackles exactly this problem: it freezes that prefix so the model doesn't have to process the same context tokens over and over again.
The project appeared on Hacker News on June 10, 2026 with minimal initial traction, but deserves technical attention for what it proposes and the context in which using it makes sense.
What Permafrost does exactly
When Claude Code executes a tool or completes a step in an agent workflow, it constructs a prompt that includes a static prefix: system instructions, configuration context, definitions of active skills or subagents. That prefix is identical or nearly identical across most consecutive calls within the same session.
Permafrost intercepts those calls before they reach the model endpoint and applies a prompt caching technique at the prefix level: it serializes the static block, marks it as frozen, and reuses the cached representation in subsequent calls, preventing the model from tokenizing and processing it from scratch each time.
The repository includes the author's own benchmarks measuring 64% cost savings on DeepSeek bills. It's important to frame that number: the percentage is what appears in the project description and reflects the author's specific use case, not an independent audit. Real results will depend on prefix size, calls per session, and the specific provider.
Why the problem is real even if the number is optimistic
Cost optimization in agentic workflows isn't a minor issue. When using Claude Code with complex configurations, several active subagents, multiple MCP servers declared, custom skills, the context traveling in each call can be substantial. In projects with chained hooks or long-running pipelines, that overhead accumulates.
Anthropric already offers its own prompt caching functionality in the API for explicitly marked blocks, but that option requires developers to instrument the code manually and isn't always available or active on third-party backends. Permafrost proposes a proxy layer that works independently of the underlying provider, making it useful precisely in scenarios where you're using an alternative model behind an Anthropic API-compatible interface.
Who it makes sense for
This type of tool is mainly relevant for three profiles:
- Engineering teams that have deployed Claude Code in CI/CD pipelines or internal automation workflows with high call frequency.
- Developers using alternative backends (DeepSeek, local models via proxy, etc.) where each token is billed without native cache discounts.
- Those experimenting with complex agentic configurations who want to reduce accumulated latency alongside cost, since processing fewer tokens also speeds up response time with some providers.
Project status and caveats
Permafrost is currently in very early stages: the Hacker News thread had 2 points and 2 comments at time of publication. The code is available on GitHub but it's worth checking the maintenance status before integrating it into production environments. The technical proposal is sound in concept, but like any proxy that sits between your client and the model endpoint, it introduces an additional dependency and a potential failure point.
That said, the problem it solves is legitimate, and the approach, a proxy specialized in freezing static prefixes, is more surgical than generic full-response caching solutions.
---
We see this as a pragmatic bet for those already running agentic pipelines with Claude Code in production and starting to scrutinize their token bills more carefully. That it comes from the community, not Anthropic, says something about where real optimization work in the ecosystem currently lies.
Sources
Read next
COOCON joins AAIF to connect payments and MCP in AI agents
South Korean fintech COOCON is joining the global AAIF foundation to integrate payments and data business based on MCP within the AI agents ecosystem.
Webull lanza un servidor MCP para trading con IA
El bróker Webull integra el Model Context Protocol de Anthropic para que agentes de IA accedan a datos de mercado en tiempo real desde sus flujos de trabajo.
Vera: AI-Powered Smart Contract Audits Without Third Parties
Vera is an open-source tool that audits smart contracts using AI autonomously, eliminating the need for external audit firms or manual review processes.