Skill, 70 repo stars, updated 2mo ago

context-assembly-scorer

Context Assembly Scorer measures how much information an agent loses when compacting conversation context by calculating coverage across five dimensions: topic presence, recency balance, entity continuity, decision retention, and task continuity. Use it to diagnose memory gaps before critical tasks, automatically monitor context health every four hours, or identify what details the agent has forgotten after compression.

Repository: openclaw-superpowers

Install in Claude Code

git clone --depth 1 https://github.com/ArchieIndian/openclaw-superpowers /tmp/context-assembly-scorer && cp -r /tmp/context-assembly-scorer/skills/openclaw-native/context-assembly-scorer ~/.claude/skills/context-assembly-scorer

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Context Assembly Scorer

## What it does

When an agent compacts context, it loses information. But how much? And which information? Context Assembly Scorer answers these questions by measuring **coverage**: the share of important topics in the full conversation history that are still represented in the current assembled context.

Inspired by [lossless-claw](https://github.com/Martian-Engineering/lossless-claw)'s context assembly system, which carefully selects which summaries to include in each turn's context to maximize information coverage.

## When to invoke

- Automatically every 4 hours (cron) — silent coverage check
- Before starting a task that depends on prior context — verify nothing critical is missing
- After compaction — measure information loss
- When the agent says "I don't remember" — diagnose why

## Coverage dimensions

| Dimension | What it measures | Weight |
|---|---|---|
| Topic coverage | % of conversation topics present in current context | 2x |
| Recency bias | Whether recent context is over-represented vs. older important context | 1.5x |
| Entity continuity | Named entities (files, people, APIs) mentioned in history that are missing from context | 2x |
| Decision retention | Architectural decisions and user preferences still accessible | 2x |
| Task continuity | Active/pending tasks that might be lost after compaction | 1.5x |
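The table lists weights but not how they are combined. One plausible reading is a weighted mean of the per-dimension scores. The sketch below assumes that, and assumes each dimension is already expressed as a fraction between 0 and 1. The dictionary keys and the `overall_coverage` name are illustrative and are not taken from `score.py`.

```python
# Weights copied from the table above; the key names are hypothetical.
WEIGHTS = {
    "topic_coverage": 2.0,
    "recency_bias": 1.5,
    "entity_continuity": 2.0,
    "decision_retention": 2.0,
    "task_continuity": 1.5,
}


def overall_coverage(scores: dict[str, float]) -> float:
    """Weighted mean of per-dimension scores, each in the range 0.0 to 1.0.

    Assumption: the real scorer may combine dimensions differently.
    """
    total_weight = sum(WEIGHTS.values())
    weighted_sum = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return weighted_sum / total_weight


if __name__ == "__main__":
    example = {
        "topic_coverage": 0.9,
        "recency_bias": 0.6,
        "entity_continuity": 0.8,
        "decision_retention": 1.0,
        "task_continuity": 0.5,
    }
    print(f"{overall_coverage(example):.1%}")
```

Under this reading, a weak score on a 2x dimension such as decision retention pulls the total down more than the same score on a 1.5x dimension.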

## How to use

```bash
python3 score.py --score                      # Score current context assembly
python3 score.py --score --verbose             # Detailed per-dimension breakdown
python3 score.py --blind-spots                 # List topics missing from context
python3 score.py --drift                       # Compare current vs. previous scores
python3 score.py --status                      # Last score summary
python3 score.py --format json                 # Machine-readable output
```

## Procedure

**Step 1 — Score context coverage**

```bash
python3 score.py --score
```

The scorer reads `MEMORY.md` (the full history) and compares it against the context that is currently accessible. It outputs a coverage score from 0 to 100% with a letter grade.

**Step 2 — Find blind spots**

```bash
python3 score.py --blind-spots
```

Lists specific topics, entities, and decisions that exist in full history but are missing from current context — these are what the agent has effectively "forgotten."
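Conceptually, a blind spot is anything found in the full history that does not appear in the current context. A minimal sketch of that idea follows, assuming topics, entities, and decisions have already been extracted into sets of strings. The function names are hypothetical and the actual script may work differently.

```python
def blind_spots(history_items: set[str], context_items: set[str]) -> list[str]:
    """Items present in the full history but absent from the current context."""
    return sorted(history_items - context_items)


def coverage(history_items: set[str], context_items: set[str]) -> float:
    """Fraction of history items that are still present in the context."""
    if not history_items:
        return 1.0  # nothing to remember, so nothing is missing
    return len(history_items & context_items) / len(history_items)


if __name__ == "__main__":
    history = {"auth.py", "redis", "rate limiting", "deploy"}
    context = {"auth.py", "deploy"}
    print(blind_spots(history, context))
    print(coverage(history, context))
```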

**Step 3 — Track drift over time**

```bash
python3 score.py --drift
```

Shows how coverage has changed across the last 20 scores. Use it to check whether compaction is progressively losing more information.
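As a rough illustration of drift tracking, the sketch below keeps a rolling window of 20 scores and reports the change between the oldest and newest retained value. The window size comes from the text above. The `record_score` and `drift` helpers, and the choice to compare first against last, are assumptions rather than the script's actual logic.

```python
from collections import deque

HISTORY_LIMIT = 20  # the procedure above looks at the last 20 scores


def record_score(history: deque, score: float) -> None:
    """Append a score; a deque with maxlen drops the oldest entry when full."""
    history.append(score)


def drift(history: deque) -> float:
    """Newest retained score minus oldest retained score (negative means loss)."""
    if len(history) < 2:
        return 0.0
    return history[-1] - history[0]


if __name__ == "__main__":
    scores = deque(maxlen=HISTORY_LIMIT)
    for value in (92.0, 88.0, 85.0, 79.0):
        record_score(scores, value)
    print(drift(scores))
```

A steadily negative drift across runs suggests that each compaction is dropping more than the previous one.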

## Grading

| Grade | Coverage | Meaning |
|---|---|---|
| A | 90–100% | Excellent — minimal information loss |
| B | 75–89% | Good — minor gaps, unlikely to cause issues |
| C | 60–74% | Fair — some important context missing |
| D | 40–59% | Poor — significant blind spots |
| F | 0–39% | Critical — agent is operating with major gaps |
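The grade bands translate directly into threshold checks. The sketch below assumes each band's lower bound is inclusive and that fractional scores fall into the band below the next threshold (so 89.5 is a B). The `letter_grade` name is illustrative.

```python
def letter_grade(coverage_pct: float) -> str:
    """Map a coverage percentage (0 to 100) to the letter grades in the table."""
    if not 0 <= coverage_pct <= 100:
        raise ValueError(f"coverage must be between 0 and 100, got {coverage_pct}")
    if coverage_pct >= 90:
        return "A"
    if coverage_pct >= 75:
        return "B"
    if coverage_pct >= 60:
        return "C"
    if coverage_pct >= 40:
        return "D"
    return "F"


if __name__ == "__main__":
    for pct in (95, 78, 61, 45, 12):
        print(pct, letter_grade(pct))
```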

## State

Coverage scores and blind spot history are stored in `~/.openclaw/skill-state/context-assembly-scorer/state.yaml`.

Fields: `last_score_at`, `current_score`, `blind_spots`, `score_history`.

## Notes

- Read-only — does not modify context or memory
- Topic extraction uses keyword clustering, not LLM calls
- Entity detection uses regex patterns for file paths, URLs, class names, API endpoints
- Decision detection looks for markers: "decided", "chose", "prefer", "always", "never"
- Recency bias is measured as the ratio of recent-vs-old entry representation
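To make the detection notes concrete, here is a small sketch of marker-based decision detection and regex-based entity detection. The marker list is the one given above. The regular expressions are illustrative guesses that cover URLs, relative file paths, and CamelCase class names only; they are not the patterns used in `score.py`.

```python
import re

# Markers listed in the notes above.
DECISION_MARKERS = ("decided", "chose", "prefer", "always", "never")
# Match a marker at the start of a word, so "prefers" and "chosen" also count.
# Known limitation of this sketch: "nevertheless" matches too.
DECISION_RE = re.compile(r"\b(?:" + "|".join(DECISION_MARKERS) + r")", re.IGNORECASE)

# Hypothetical entity patterns.
URL_RE = re.compile(r"https?://[^\s)>\]]+")
PATH_RE = re.compile(r"(?:[\w.-]+/)+[\w.-]+\.\w+")
CLASS_RE = re.compile(r"\b[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]+)+\b")


def decision_lines(text: str) -> list[str]:
    """Return the lines that contain at least one decision marker."""
    return [line.strip() for line in text.splitlines() if DECISION_RE.search(line)]


def find_entities(text: str) -> set[str]:
    """Collect URLs, relative file paths, and CamelCase names from the text."""
    urls = set(URL_RE.findall(text))
    # Blank out URLs first so their path segments are not counted as file paths.
    remainder = URL_RE.sub(" ", text)
    paths = set(PATH_RE.findall(remainder))
    classes = set(CLASS_RE.findall(remainder))
    return urls | paths | classes


if __name__ == "__main__":
    sample = (
        "We decided to keep retries in src/utils/retry.py.\n"
        "The RetryPolicy class wraps calls to https://api.example.com/v1/jobs\n"
        "Lunch was fine."
    )
    print(decision_lines(sample))
    print(sorted(find_entities(sample)))
```

Keyword and regex matching keeps the scorer cheap and deterministic, at the cost of missing decisions phrased without any of the markers.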