context-assembly-scorer
Scores how well the current context represents the full conversation — detects information blind spots, stale summaries, and coverage gaps that cause the agent to forget critical details.
git clone --depth 1 https://github.com/ArchieIndian/openclaw-superpowers /tmp/context-assembly-scorer && cp -r /tmp/context-assembly-scorer/skills/openclaw-native/context-assembly-scorer ~/.claude/skills/context-assembly-scorer
SKILL.md
# Context Assembly Scorer

## What it does

When an agent compacts context, it loses information. But how much? And which information? Context Assembly Scorer answers these questions by measuring **coverage**: the ratio of important topics in the full conversation history that are represented in the current assembled context.

Inspired by [lossless-claw](https://github.com/Martian-Engineering/lossless-claw)'s context assembly system, which carefully selects which summaries to include in each turn's context to maximize information coverage.

## When to invoke

- Automatically every 4 hours (cron): silent coverage check
- Before starting a task that depends on prior context: verify nothing critical is missing
- After compaction: measure information loss
- When the agent says "I don't remember": diagnose why

## Coverage dimensions

| Dimension | What it measures | Weight |
|---|---|---|
| Topic coverage | % of conversation topics present in current context | 2x |
| Recency bias | Whether recent context is over-represented vs. older important context | 1.5x |
| Entity continuity | Named entities (files, people, APIs) mentioned in history that are missing from context | 2x |
| Decision retention | Architectural decisions and user preferences still accessible | 2x |
| Task continuity | Active/pending tasks that might be lost after compaction | 1.5x |

## How to use

```bash
python3 score.py --score            # Score current context assembly
python3 score.py --score --verbose  # Detailed per-dimension breakdown
python3 score.py --blind-spots      # List topics missing from context
python3 score.py --drift            # Compare current vs. previous scores
python3 score.py --status           # Last score summary
python3 score.py --format json      # Machine-readable output
```

## Procedure

**Step 1: Score context coverage**

```bash
python3 score.py --score
```

The scorer reads MEMORY.md (full history) and compares it against what's currently accessible. Outputs a coverage score from 0–100% with a letter grade.
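The document lists per-dimension weights and grade bands but does not say how `score.py` combines them. As a minimal sketch, assuming the overall score is a weighted mean of per-dimension scores on a 0–100 scale, the calculation could look like the following. The function names, dictionary keys, and example scores are hypothetical; only the weights and grade boundaries are taken from the Coverage dimensions and Grading tables.

```python
# Weights copied from the "Coverage dimensions" table.
WEIGHTS = {
    "topic_coverage": 2.0,
    "recency_bias": 1.5,
    "entity_continuity": 2.0,
    "decision_retention": 2.0,
    "task_continuity": 1.5,
}

# Lower bounds copied from the "Grading" table, highest first.
GRADE_FLOORS = [(90, "A"), (75, "B"), (60, "C"), (40, "D"), (0, "F")]


def weighted_coverage(dimension_scores):
    """Assumed combination rule: weighted mean of per-dimension scores (0-100)."""
    total_weight = sum(WEIGHTS.values())
    weighted_sum = sum(WEIGHTS[name] * dimension_scores[name] for name in WEIGHTS)
    return weighted_sum / total_weight


def letter_grade(score):
    """Return the first grade whose lower bound the score reaches."""
    for floor, grade in GRADE_FLOORS:
        if score >= floor:
            return grade
    return "F"


# Hypothetical per-dimension scores, for illustration only.
example = {
    "topic_coverage": 80,
    "recency_bias": 60,
    "entity_continuity": 90,
    "decision_retention": 100,
    "task_continuity": 70,
}

if __name__ == "__main__":
    score = weighted_coverage(example)
    print(f"{score:.1f}% -> grade {letter_grade(score)}")  # 81.7% -> grade B
```

Under this assumed rule, a weak score on a 2x dimension such as entity continuity pulls the total down more than the same weakness on a 1.5x dimension.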
**Step 2: Find blind spots**

```bash
python3 score.py --blind-spots
```

Lists specific topics, entities, and decisions that exist in full history but are missing from current context. These are what the agent has effectively "forgotten."

**Step 3: Track drift over time**

```bash
python3 score.py --drift
```

Shows how coverage has changed across the last 20 scores. Use it to identify whether compaction is progressively losing more information.

## Grading

| Grade | Coverage | Meaning |
|---|---|---|
| A | 90–100% | Excellent: minimal information loss |
| B | 75–89% | Good: minor gaps, unlikely to cause issues |
| C | 60–74% | Fair: some important context missing |
| D | 40–59% | Poor: significant blind spots |
| F | 0–39% | Critical: agent is operating with major gaps |

## State

Coverage scores and blind spot history are stored in `~/.openclaw/skill-state/context-assembly-scorer/state.yaml`. Fields: `last_score_at`, `current_score`, `blind_spots`, `score_history`.

## Notes

- Read-only: does not modify context or memory
- Topic extraction uses keyword clustering, not LLM calls
- Entity detection uses regex patterns for file paths, URLs, class names, API endpoints
- Decision detection looks for markers: "decided", "chose", "prefer", "always", "never"
- Recency bias is measured as the ratio of recent-vs-old entry representation
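The notes above describe regex-based entity detection and marker-based decision detection without showing the patterns. The sketch below illustrates one way such checks might work. The regular expressions, function names, and sample text are assumptions for illustration, not the patterns `score.py` actually uses; only the five decision markers come from the notes.

```python
import re

# Hypothetical entity patterns; URLs are tried first so a URL is not
# also reported as a file path.
ENTITY_RE = re.compile(
    r"https?://\S+"                    # URLs
    r"|(?:[\w.-]+/)+[\w.-]+\.\w+"      # file paths such as src/auth/token.py
    r"|\b(?:[A-Z][a-z0-9]+){2,}\b"     # CamelCase class names
)

# Decision markers listed in the notes, matched as whole words.
DECISION_RE = re.compile(r"\b(?:decided|chose|prefer|always|never)\b", re.IGNORECASE)


def extract_entities(text):
    """Return the set of entity strings found, minus trailing punctuation."""
    return {m.group(0).rstrip(".,;:)") for m in ENTITY_RE.finditer(text)}


def decision_lines(text):
    """Return the lines that contain at least one decision marker."""
    return [line.strip() for line in text.splitlines() if DECISION_RE.search(line)]


def entity_blind_spots(history, context):
    """Entities present in the full history but absent from the current context."""
    return sorted(extract_entities(history) - extract_entities(context))


# Made-up sample text, for illustration only.
history = (
    "We decided to keep TokenStore in src/auth/token.py\n"
    "Docs live at https://api.example.com/v1/refresh\n"
    "The user will always want tabs."
)
context = "Currently editing src/auth/token.py"

if __name__ == "__main__":
    print(entity_blind_spots(history, context))
    print(decision_lines(history))
```

With this sample, the file path appears in both texts, so only the class name and the URL are reported as blind spots. Whole-word matching keeps "nevertheless" from counting as a decision, at the cost of missing inflected forms such as "prefers".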
Syncs agent daily memory and MEMORY.md to an Obsidian vault so notes are human-browsable. Use nightly or on demand.
Structured ideation before any implementation. Use when starting any non-trivial task.
Scaffolds and validates new superpowers skills. Use when creating a new skill for this repository.
Executes plans task-by-task with verification. Use when implementing a plan.
Triggers a secondary verification pass for any agent output containing factual claims, numbers, dates, or named entities before the output is acted on.
Crawls a new codebase to infer stack, conventions, and key invariants, then generates a PROJECT.md context file for the agent.
Handles PR review feedback by fetching comments, grouping issues, fixing one group at a time, and verifying before replies.
Detects skill name shadowing and description-overlap conflicts that cause OpenClaw to trigger the wrong skill or silently ignore one when two skills compete for the same intent.