Skip to main content
ClaudeWave
Skill1.2k repo starsupdated today

live_database_ingest

The live_database_ingest skill processes raw database schema snapshots from Kaelio/ktx repositories by reading table JSON files and connection metadata to create or update semantic-layer sources. Use this skill when ingesting work units containing raw files under `raw-sources/<connectionId>/live-database/<syncId>/`, following a verification protocol that confirms identifiers exist before writing wiki pages or semantic-layer sources, preserving database comments as descriptions, and validating sources before completion.

Install in Claude Code
Copy
git clone --depth 1 https://github.com/Kaelio/ktx /tmp/live_database_ingest && cp -r /tmp/live_database_ingest/packages/cli/src/skills/live_database_ingest ~/.claude/skills/live_database_ingest
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# Live Database Ingest

Use this skill when the ingest work unit contains raw files under
`raw-sources/<connectionId>/live-database/<syncId>/`.

## Workflow

1. Read the table JSON file listed in the work unit.
2. Read `connection.json` to understand the snapshot metadata.
3. Read `foreign-keys.json` when the table has a foreign key or when joins are
   needed for the semantic-layer source.
4. Create or update one semantic-layer source for the table with
   `sl_write_source`.
5. Use the physical table name from the raw JSON as the source `table` field.
6. Preserve database comments as `descriptions.db` on tables and columns.
7. Add joins only when the foreign key index names both sides.
8. Write wiki pages only for durable business meaning that is present in table
   or column comments.
9. Run `sl_validate` for the table source before the work unit completes.

Sample values come from the scan record; do not invent values not present in
relationship-profile.json.

## Identifier Verification Protocol

Before writing a wiki page or SL source on any topic:

1. `discover_data({query: "<topic>"})` - see what wikis, SL sources, and raw
   tables already exist. Prefer updating existing pages over creating new ones.

Before emitting any `schema.table` or `schema.table.column` into a wiki body,
SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:

2. `entity_details({connectionId, targets: [{display: "<identifier>"}]})` -
   confirm the identifier resolves; inspect native types, FK/PK, and
   sampleValues.
3. For literal values from the source, such as status codes or plan tiers,
   check whether they appear in `entity_details` sampleValues for the relevant
   column. If sampleValues is short or the sample may have missed real values,
   run a `sql_execution` probe with the same warehouse connection id:
   `sql_execution({connectionId, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
4. If the candidate identifier still does not resolve, do one of:
   - Use `sql_execution({connectionId, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
     If it errors, the identifier is fictional.
   - Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
     citing the exact raw path that mentioned it.
   - When recording `emit_unmapped_fallback` with `no_physical_table`, include
     the failing probe error in `clarification`.
5. Never copy `<schema>.<table>` placeholder strings from these instructions
   into output.

## Source shape

For a raw table with this shape:

```json
{
  "name": "orders",
  "db": "public",
  "columns": [
    { "name": "id", "type": "integer", "nullable": false, "primaryKey": true }
  ]
}
```

Write a semantic-layer source with this shape:

```yaml
name: orders
table: public.orders
grain: id
columns:
  - name: id
    type: number
```

Use `string`, `number`, `time`, or `boolean` for column types. When a database
type is ambiguous, use `string`.

## Boundaries

The raw snapshot is structural evidence. Do not invent measures, segments,
business definitions, or joins that are not present in the snapshot files.
ktx-analyticsSkill

Use when answering a question that needs data from a ktx-connected database - investigating, analyzing, "how many", "show me", "what's the breakdown of", finding records by value, exploring tables, comparing periods, explaining metrics, or any data-analysis request. Triggers even when the user does not say "analytics"; if the answer requires querying a configured ktx connection, this skill applies.

dbt_ingestSkill

Map dbt `schema.yml` / `properties.yml` models and sources into ktx semantic-layer overlays and column notes. Covers `sources:` vs `models:`, column `data_tests` (not_null, unique, accepted_values, relationships), and how bundle-time writes complement manifest backfill from git sync. Load when the WorkUnit's `skillNames` includes `dbt_ingest` or when raw files are dbt YAML under `models/` / `sources/`.

historic_sql_patternsSkill

Identify recurring cross-table historic-SQL analytical intents from a bounded pattern shard and emit typed pattern evidence for deterministic wiki projection.

historic_sql_table_digestSkill

Convert one changed historic-SQL table usage bucket into typed table usage evidence for deterministic _schema projection.

ingest_triageSkill

Classify and resolve conflicts detected during bundle ingest (structural duplicates, definitional contradictions, near-duplicate clusters, re-ingest changes, evictions).

looker_ingestSkill

Extract durable ktx knowledge and semantic-layer contribution proposals from staged Looker runtime dashboard, Look, and explore JSON. Load for WorkUnits whose raw files are under explores/, dashboards/, or looks/.

lookml_ingestSkill

Map a LookML view/model/explore into ktx semantic layer sources. Covers the LookML to ktx primitive table, provenance tagging, and three worked examples (overlay, standalone from derived_table, standalone with sql_always_where). Load when the turn contains `.lkml` content.

metabase_ingestSkill

Convert Metabase questions, models, and metrics into ktx Semantic Layer source definitions. Covers result-metadata to KSL column type mapping, FK/PK detection, near-duplicate deduplication, pre-aggregation decomposition, join-graph connectivity, and how to react to priorProvenance from earlier ingest syncs. Load when the WorkUnit contains `cards/<id>.json` files under a Metabase bundle.