context-manager
Manage the LLM's context window — token budgeting, prompt assembly, truncation strategies. Use when approaching context limits or optimizing prompt costs.
git clone --depth 1 https://github.com/Mark393295827/third-brain-v5-skills /tmp/context-manager && cp -r /tmp/context-manager/skills/context-manager ~/.claude/skills/context-managerSKILL.md
# Context Manager
Manage the scarcest resource in the LLM OS: **context window (RAM)**.
## Usage Template
**Prompt**
```text
Use context-manager for this task. Estimate token budget, choose what to keep, what to summarize, and what to drop.
```
**Use Case**
- Preparing a long task, large document set, or multi-step agent workflow without overrunning context.
**Expected Result**
- The agent produces a context budget, priority order, truncation plan, and cost-aware prompt assembly.
**Output Example**
- A table of keep/summarize/drop decisions with estimated tokens and retrieval priority.
**Verification Case**
- The final context plan names included sources, excluded sources, and the reason for each exclusion.
**Verified Effect**
- Long or messy context becomes a scoped prompt plan with lower token waste and clearer retrieval priorities.
## Success Metrics
- Output includes an estimated token budget and a keep/summarize/drop table.
- Each excluded source has a reason, and critical context has a retrieval path.
- Final prompt plan fits the target context window with margin.
## When to Use
- "Context is full", "too many tokens", "slow responses"
- Before complex multi-step tasks
- Designing prompt templates with variable-length content
- "Optimize my prompts" or "reduce token cost"
- Before ingesting large documents
---
## Core Principles
### 0. Concrete Ideas Buy Speed (Andrew Ng)
> "When you're vague, you're almost always right. When you're concrete, you may be right or wrong. Either way is fine — we can discover that much more fast."
| Vague | Concrete | Savings |
|-------|----------|:-------:|
| "Analyze this document" | "Extract 3 key metrics from earnings report, compare to last quarter" | ~60% |
| "Help with this code" | "Fix TypeScript error line 42: 'Property id does not exist on type User'" | ~70% |
| "Research this topic" | "Find 5 sources about Claude Code auto-mode security, focus on classifier" | ~50% |
### 1. Token Budgeting
```
Context Window = System Prompt + User Input + Retrieved Context + Tool Results + Agent Thoughts
Budget:
├─ System Prompt + Schema 10-15%
├─ User Request 5-10%
├─ Retrieved Context (RAG) 30-40%
├─ Tool Results 20-30%
└─ Agent Reasoning 10-15%
```
**Rule:** Exceeds 80% → must truncate/compress.
### 2. Prompt Assembly
```
Layer 1: Immutable Core (always injected)
├─ System prompt (persona + constraints)
├─ Schema/ontology
└─ Safety rules
Layer 2: Task Context (per-request)
├─ User request
├─ Relevant wiki pages (top-k)
└─ Recent session log
Layer 3: Ephemeral (auto-managed)
├─ Tool call history (trim old)
├─ Large outputs (summarize)
└─ Extended thinking (compact)
```
### 3. Truncation Strategies
| Strategy | When | How | Saved |
|----------|------|-----|:-----:|
| **Summarize** | Output >500 tokens | `summarize()` | ~70% |
| **Trim oldest** | Loop >10 turns | Remove earliest | ~40% |
| **Deduplicate** | Repeated content | Keep one copy | ~20% |
| **Drop results** | Action complete | Keep calls + errors only | ~50% |
| **Compact CoT** | Thinking used | 1-sentence conclusion | ~80% |
| **Cache prefix** | Repeated prompt | Identical prefix first | ~90% latency |
---
## Token Cost (Claude)
| Model | Input/MTok | Output/MTok | Window |
|-------|:----------:|:-----------:|:------:|
| Opus 4.6 | $15.00 | $75.00 | 200K |
| Sonnet 4.6 | $3.00 | $15.00 | 200K |
| Haiku 3.5 | $0.80 | $4.00 | 200K |
**Quick estimates:**
- 1 token ≈ 0.75 word
- 1 page ≈ 300-500 tokens
- 1 source ≈ 2000-8000 tokens
- 1 hour agentic ≈ 100K-500K tokens
- Weekly active user ≈ $9-15
---
## Decision Tree
```
Simple task (1-2 steps)?
├─ YES → Haiku, ~10K tokens
└─ NO → Analytical (research/compile)?
├─ YES → Opus, ~50-100K
└─ NO → Procedural (ingest/lint)?
├─ YES → Sonnet, ~20-50K
└─ NO → Sonnet default
```
---
## Long-Horizon Compaction Contract
For long tasks, compaction is not just shorter text. The summary must preserve the state needed to continue without drift:
```markdown
Goal:
Current definition of done:
User constraints and denied actions:
Key files, sources, or artifacts:
Completed steps:
Failed paths and why:
Verification evidence:
Open risks:
Next action:
```
Use compaction before context pressure becomes an emergency. After resuming from a compacted state, run one quick continuity check: confirm the goal, constraints, current step, and evidence before taking the next action.
---
## Tokenmaxxing vs Efficiency
> "Token maxing is actually the coolest thing you can do now." — Gary Tan
| Strategy | When | Approach |
|----------|------|----------|
| **Tokenmaxxing** | Research, deep analysis | 20 sources, cross-reference everything |
| **Efficiency** | Production, cost-sensitive | Concrete prompts, caching, truncation |
**Thin Harness, Fat Skills:**
- Simple tasks → thin prompt
- Complex tasks → fat prompt with examples
### When to Tokenmax
| Scenario | Approach | Quality Gain |
|----------|----------|:------------:|
| Research synthesis | 20 sources, cross-reference | 3-5x |
| Code review | Full repo analysis | 2-3x fewer bugs |
| Creative writing | Multiple drafts, self-critique | Significantly better |
### When to Be Efficient
| Scenario | Approach | Savings |
|----------|----------|:-------:|
| Routine linting | Haiku, minimal context | ~80% |
| Simple edits | Concrete prompt, no CoT | ~60% |
| Status checks | One-liner | ~90% |
---
## Quality Gates
- [ ] Token budget <80% before first call
- [ ] 3-layer prompt assembly
- [ ] Large outputs (>500 tokens) summarized
- [ ] Long-horizon compactions preserve goal, constraints, evidence, risks, and next action
- [ ] Cost estimated for >100K token tasks
- [ ] Cache-friendly ordering
- [ ] Context utilization loggedExecute a daily knowledge compound closed loop — 7 Key Results from input to feedback with scoring. Use when the user wants to do a daily review, plan their day, or run a knowledge workflow.
Extract reusable knowledge from a work session and save concepts, entities, corrections, patterns, ideas, decisions, and gaps to the wiki. Use when ending a session or when the user says to extract knowledge.
Estimate and track token usage and cost across the knowledge pipeline. Run before expensive tasks to budget, after tasks to log actuals.
Health-check the knowledge wiki — find orphans, broken links, missing frontmatter, contradictions, stale content, and statistical drift. Use when the user says "lint the wiki", "health check", or periodically for maintenance.
Command multi-agent work with bounded roles, ownership, integration gates, and verification loops. Use when the user needs Claude Code Agent Teams, parallel agents, delegation strategy, or multi-agent orchestration.
Design or refactor agent skills, workflows, and operating loops for model-native Agentic Engineering. Use when making skills more autonomous, concise, verifiable, long-horizon capable, token-efficient, and lower-friction for human-LLM collaboration.
Design an AI Six Sigma Black Belt operating model for property service, maintenance dispatch, environmental testing, quote generation, CRM follow-up, and workflow quality dashboards. Use when the user needs a Property Agent OS, AI + Ontology + DMAIC management system, CTQ metrics, agent-team roles, work-order states, or MVP roadmap for operations quality.
Improve a personal or team operating system with self-evolving loops, CASH allocation, 3B creativity, predictive coding, and diagnostics. Use when the user wants to redesign a work method, learning loop, or cognitive operating system.