ClaudeWave
Skill · 1.2k repo stars · updated today

ingest_triage

The ingest_triage skill classifies and resolves conflicts when new artifacts risk colliding with existing ones during bundle ingestion. It applies a five-step decision tree to determine whether conflicts are duplicates (resolved silently or by canonical election), expression-only changes (replaced without flagging), semantic breaks (replaced with flagging), definitional contradictions (renamed and always flagged), or evictions (removed per eviction list). Use this skill before every write operation in Stage 3 WorkUnit agents and Stage 4 reconciliation agents to maintain artifact integrity and prevent unintended overwrites.

Install in Claude Code
git clone --depth 1 https://github.com/Kaelio/ktx /tmp/ingest_triage && cp -r /tmp/ingest_triage/packages/cli/src/skills/ingest_triage ~/.claude/skills/ingest_triage
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# Ingest Triage - conflict classification and resolution

This skill is loaded in two contexts:
- By a Stage 3 WorkUnit agent when `sl_discover`, deterministic projection
  output, existing project memory, or prior provenance overlaps with what the
  current WorkUnit is about to write.
- By the Stage 4 reconciliation agent for cross-WorkUnit sweeps, accepted patch
  overlap, and eviction decisions.

Apply the rules below before every write that could collide with an existing artifact.

## Decision tree

1. **Is this the same artifact I'm producing now, or a different one with the same name?**
   Read both. If names match and content matches (modulo whitespace): no conflict - skip the write, the prior one stands.

2. **If content differs, is it an expression-only change (e.g. a different `sql:` body for the same measure name, same grain, same columns)?**
   Re-ingest change (expression-only): silently replace via `sl_edit_source`. No flag.

3. **If the difference is structural - grain, columns, filter, join shape - is the current bundle the re-ingest of a previously-ingested bundle (i.e. `priorProvenance` has a row for this raw file and artifact)?**
   Re-ingest change (semantic break): replace + flag. Record in the IngestReport's `conflicts_resolved` list with `flagged_for_human: true`.

4. **If reconciliation sees accepted patches from this same job with no prior-sync row, check for same-ingest contradictions:**

   | Kind | Detection | Resolution |
   |---|---|---|
   | Structural duplicate | Same name, near-identical expression | Elect canonical by: (a) highest inbound-ref count from other sources; tiebreak: (b) lexicographically first unit key; (c) lexicographically first source name. Subsume losers into `<canonical>-variants.md` wiki page. Do NOT flag unless ambiguous. |
   | Near-duplicate cluster | Different names, overlapping shape (same table, similar formulas) | Same as structural; one canonical, others subsumed. Flag only if no canonical emerges. |
   | Definitional contradiction | Same name, substantively different formulas (different aggregation, different filters, different columns) | **Rename + capture**: disambiguate ALL variants with suffix derived from the domain (`churn_risk_engagement_based`, `churn_risk_billing_based`) and write a unified wiki page listing every variant with provenance. The contested name does NOT land in the SL. **Always flag.** |

5. **Eviction (Stage 4 only)**: for each entry in `eviction_list()`:
   - Remove the artifact (`sl_write_source` or `sl_edit_source` with `delete: true` for SL sources, `wiki_remove` for wiki pages).
   - Record the removal with `emit_eviction_decision` and `action: "removed"`.
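The canonical election rule in the step 4 table can be sketched as a single sort key. This is a minimal illustration, not the skill's implementation: the `Candidate` record and its field names are hypothetical, and "lexicographically first" is taken here to mean Python's default string ordering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    # Hypothetical record for one duplicate; field names are illustrative only.
    unit_key: str
    source_name: str
    inbound_refs: int  # count of references from other sources


def elect_canonical(candidates: list) -> Candidate:
    """Highest inbound-ref count wins; ties fall to the first unit key,
    then to the first source name."""
    if not candidates:
        raise ValueError("need at least one candidate")
    # Negating the ref count lets one ascending sort key express all three rules.
    return min(candidates, key=lambda c: (-c.inbound_refs, c.unit_key, c.source_name))


candidates = [
    Candidate("wu-b", "orders_v2", 3),
    Candidate("wu-a", "orders", 3),
    Candidate("wu-c", "orders_legacy", 1),
]
winner = elect_canonical(candidates)
# The losers are the ones that would be subsumed into the variants wiki page.
losers = [c for c in candidates if c != winner]
```

Here `wu-a` and `wu-b` tie on inbound refs, so the unit key decides. The sketch does not cover the "ambiguous" case that the table says should be flagged, because the source does not define what makes an election ambiguous.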

## Why same-ingest vs re-ingest differs

Within ONE bundle there is no user signal telling us which variant wins, so for definitional contradictions we capture all variants and flag (clear duplicates still get a canonical elected without a flag). Across bundles, re-uploading IS the signal that the new state is intended, so we replace silently for expression-only changes and replace with a flag for semantic breaks.
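One way to picture how steps 1 to 3 and the same-ingest/re-ingest split fit together is the sketch below. It rests on several assumptions: the `Artifact` shape is hypothetical, "modulo whitespace" is read as collapsing whitespace runs, and the silent expression-only replace is treated as a re-ingest case (which the step 2 label suggests but does not state outright). The returned strings are illustrative labels, not values from the IngestReport schema.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    # Hypothetical shape; the structural fields mirror the ones step 3 names.
    name: str
    expression: str
    grain: tuple = ()
    columns: tuple = ()
    filters: tuple = ()
    joins: tuple = ()


def normalize(text: str) -> str:
    # Assumed reading of "modulo whitespace": collapse runs, trim the ends.
    return re.sub(r"\s+", " ", text).strip()


def shape(a: Artifact) -> tuple:
    return (a.grain, a.columns, a.filters, a.joins)


def triage(existing: Artifact, incoming: Artifact, has_prior_provenance: bool) -> str:
    if existing.name != incoming.name:
        raise ValueError("triage compares artifacts that share a name")
    same_shape = shape(existing) == shape(incoming)
    same_expr = normalize(existing.expression) == normalize(incoming.expression)
    if same_shape and same_expr:
        return "skip"  # step 1: the prior artifact stands
    if not has_prior_provenance:
        return "same_ingest_check"  # step 4: no re-upload signal to rely on
    # Re-ingest: step 2 (expression-only) versus step 3 (semantic break).
    return "replace_silently" if same_shape else "replace_and_flag"
```

A `replace_and_flag` result corresponds to recording the conflict in `conflicts_resolved` with `flagged_for_human: true`.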

## Naming disambiguation hints

When you rename to disambiguate, prefer domain suffixes that match the containing view/table/collection name: `customers.churn_risk_score` → `customers.churn_risk_engagement_based` (if the `customer_churn` view computes it from engagement); `billing.churn_risk_score` → `billing.churn_risk_billing_based`. Avoid numeric suffixes (`churn_risk_1`, `churn_risk_2`) - they disclose nothing.
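As a rough sketch of the naming hint, the helper below assembles a disambiguated name from a scope, a concept stem, and a domain basis, and refuses suffixes that contain digits. The function name, its parameters, and the `_based` ending are assumptions drawn only from the two examples above; nothing in the skill prescribes this exact pattern.

```python
import re


def disambiguate(scope: str, concept: str, basis: str) -> str:
    """Build `<scope>.<concept>_<basis>_based` (pattern assumed from the examples)."""
    # Letters and underscores only, so numeric suffixes such as "1" are rejected.
    if not re.fullmatch(r"[a-z][a-z_]*", basis):
        raise ValueError(f"basis {basis!r} is not a descriptive domain suffix")
    return f"{scope}.{concept}_{basis}_based"


engagement_name = disambiguate("customers", "churn_risk", "engagement")
billing_name = disambiguate("billing", "churn_risk", "billing")
```

Choosing the basis still needs judgment about what the containing view or table actually computes; the helper only enforces the shape of the result.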

## Applying canonical pins

When the Stage 4 system prompt includes a `<canonical_pins>` block, treat each pin as a prior user decision for that `contestedKey`.

- If the pinned `canonicalArtifactKey` is present in the Stage Index or already exists in SL, keep it as the canonical artifact for that contested key.
- Disambiguate competing artifacts instead of using the contested name for them.
- Do not flag the pinned contested key solely because the variants disagree; the user has already chosen the canonical artifact.
- If the pinned artifact cannot be found and no current WU can recreate it, emit `emit_conflict_resolution` with `flaggedForHuman: true` and explain that the pin references a missing canonical artifact.

When a pin applies cleanly, call `emit_conflict_resolution` with `kind: "definitional_contradiction"`, `artifactKey` set to the pinned `canonicalArtifactKey`, `detail` describing the pinned election, and `flaggedForHuman: false`.
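The pin rules can be sketched as a function that builds the argument for `emit_conflict_resolution`. The payload fields come from the text above; everything else is an assumption, including the function name, the use of plain sets for lookups, the `kind` used for the missing-pin case, and returning `None` when a current WorkUnit can still recreate the pinned artifact (the source does not say what to emit then).

```python
from typing import Optional


def resolve_pin(pin: dict, stage_index_keys: set, sl_keys: set,
                recreatable_keys: set) -> Optional[dict]:
    """Return a payload for emit_conflict_resolution, or None if nothing to emit yet."""
    key = pin["canonicalArtifactKey"]
    contested = pin["contestedKey"]
    if key in stage_index_keys or key in sl_keys:
        # Pin applies cleanly: the user already chose, so do not flag.
        return {
            "kind": "definitional_contradiction",
            "artifactKey": key,
            "detail": f"Pinned election for contested key '{contested}'.",
            "flaggedForHuman": False,
        }
    if key in recreatable_keys:
        # Assumption: wait for the WorkUnit that can recreate the artifact.
        return None
    return {
        "kind": "definitional_contradiction",  # assumed; the source leaves this open
        "artifactKey": key,
        "detail": f"Pin for '{contested}' references a missing canonical artifact.",
        "flaggedForHuman": True,
    }
```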

## What to write in the unified wiki page

When you perform rename + capture, also write one page named `<canonical-concept>-definitions.md` under the wiki GLOBAL scope. Structure:

- One heading per variant, referencing the disambiguated SL name.
- One paragraph per variant: what it computes, where it came from (raw file + line range), when to use it.
- A closing "Choosing between these" paragraph if the variants are legitimately domain-specific.

Do not attempt to rank variants or pick a "best" - that's user-override territory.
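A minimal sketch of rendering such a page follows. The `Variant` fields, the heading level, and rendering the closing section as a heading plus paragraph are all assumptions; only the file-name pattern and the per-variant contents come from the structure listed above. Variants are emitted in the order given, with no ranking.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    # Hypothetical fields covering what each paragraph is meant to state.
    sl_name: str
    computes: str
    raw_file: str
    line_range: tuple  # (first_line, last_line) in the raw file
    when_to_use: str


def render_definitions_page(concept: str, variants: list,
                            choosing_note: Optional[str] = None) -> tuple:
    """Return (file name, markdown body) for the unified definitions page."""
    sections = []
    for v in variants:
        first, last = v.line_range
        sections.append(
            f"## `{v.sl_name}`\n\n"
            f"{v.computes} Source: `{v.raw_file}`, lines {first}-{last}. {v.when_to_use}"
        )
    if choosing_note:
        sections.append(f"## Choosing between these\n\n{choosing_note}")
    return f"{concept}-definitions.md", "\n\n".join(sections) + "\n"
```

Writing the returned body to the wiki GLOBAL scope is left to whatever wiki tool the agent has available.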

## Silence rules

Flag for human review when:
- You did rename + capture for a definitional contradiction (the third kind in the step 4 table).
- You performed an eviction retention (kind 5, second row).
- An override constraint (from a Stage 4 re-run) conflicts with current inbound refs.
- You replaced an artifact after a semantic-break re-ingest (step 3 of the decision tree).

Do NOT flag:
- Same-content duplicate skip (trivial).
- Structural duplicate with clear canonical election.
- Expression-only re-ingest replace.
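
The silence rules amount to a small lookup table, sketched below. The outcome labels are hypothetical names invented for this example; the `semantic_break_replace` entry is taken from step 3 of the decision tree, which sets `flagged_for_human: true`. Raising on an unknown outcome is a design choice of the sketch, so that a new outcome is never silently treated as either case.

```python
# Outcome labels are illustrative; only the flag/no-flag decisions come from the skill.
FLAG_FOR_HUMAN = {
    "rename_and_capture": True,
    "eviction_retention": True,
    "override_conflicts_with_inbound_refs": True,
    "semantic_break_replace": True,  # from step 3 of the decision tree
    "same_content_skip": False,
    "structural_duplicate_clear_canonical": False,
    "expression_only_replace": False,
}


def should_flag(outcome: str) -> bool:
    """Look up whether an outcome needs human review."""
    if outcome not in FLAG_FOR_HUMAN:
        raise ValueError(f"no silence rule covers outcome {outcome!r}")
    return FLAG_FOR_HUMAN[outcome]
```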

ktx-analytics (Skill)

Use when answering a question that needs data from a ktx-connected database - investigating, analyzing, "how many", "show me", "what's the breakdown of", finding records by value, exploring tables, comparing periods, explaining metrics, or any data-analysis request. Triggers even when the user does not say "analytics"; if the answer requires querying a configured ktx connection, this skill applies.

dbt_ingest (Skill)

Map dbt `schema.yml` / `properties.yml` models and sources into ktx semantic-layer overlays and column notes. Covers `sources:` vs `models:`, column `data_tests` (not_null, unique, accepted_values, relationships), and how bundle-time writes complement manifest backfill from git sync. Load when the WorkUnit's `skillNames` includes `dbt_ingest` or when raw files are dbt YAML under `models/` / `sources/`.

historic_sql_patterns (Skill)

Identify recurring cross-table historic-SQL analytical intents from a bounded pattern shard and emit typed pattern evidence for deterministic wiki projection.

historic_sql_table_digest (Skill)

Convert one changed historic-SQL table usage bucket into typed table usage evidence for deterministic _schema projection.

live_database_ingest (Skill)

Capture semantic-layer and knowledge updates from a live database schema snapshot.

looker_ingest (Skill)

Extract durable ktx knowledge and semantic-layer contribution proposals from staged Looker runtime dashboard, Look, and explore JSON. Load for WorkUnits whose raw files are under explores/, dashboards/, or looks/.

lookml_ingest (Skill)

Map a LookML view/model/explore into ktx semantic layer sources. Covers the LookML to ktx primitive table, provenance tagging, and three worked examples (overlay, standalone from derived_table, standalone with sql_always_where). Load when the turn contains `.lkml` content.

metabase_ingest (Skill)

Convert Metabase questions, models, and metrics into ktx Semantic Layer source definitions. Covers result-metadata to KSL column type mapping, FK/PK detection, near-duplicate deduplication, pre-aggregation decomposition, join-graph connectivity, and how to react to priorProvenance from earlier ingest syncs. Load when the WorkUnit contains `cards/<id>.json` files under a Metabase bundle.