Skip to main content
ClaudeWave
Skill1.2k estrellas del repoactualizado today

metabase_ingest

metabase_ingest converts Metabase questions, models, and metrics into ktx Semantic Layer source definitions by processing card JSON files that contain resolved SQL, result metadata, and field references. Use this skill when ingesting a Metabase bundle to map Metabase base types and semantic types to KSL column types, detect primary and foreign keys, identify time-grain and join candidates, decompose pre-aggregations, and handle deduplication and provenance tracking across sync cycles.

Instalar en Claude Code
Copiar
git clone --depth 1 https://github.com/Kaelio/ktx /tmp/metabase_ingest && cp -r /tmp/metabase_ingest/packages/cli/src/skills/metabase_ingest ~/.claude/skills/metabase_ingest
Después abre una sesión nueva de Claude Code; el skill carga automáticamente.

SKILL.md

# Metabase to ktx Semantic Layer

Each WorkUnit represents one Metabase collection's cards for one Metabase database (mapped to exactly one ktx connection). Every `cards/<id>.json` file carries the resolved SQL, result_metadata, card type, collection path, and referenced-card ids. The WU's `sync-config.json` tells you which sync mode is active and which selections apply. `databases/<id>.json` tells you the target ktx connection.

## Context format

Each card JSON looks like:
```json
{
  "metabaseId": 7,
  "name": "Daily orders",
  "description": "Orders by day",
  "type": "model",
  "databaseId": 42,
  "collectionId": 5,
  "resolvedSql": "SELECT ...",
  "templateTags": [{"name": "ref", "type": "card", "cardReference": 10}],
  "resultMetadata": [
    {"name": "day", "base_type": "type/DateTime", "semantic_type": "type/CreationTimestamp"},
    {"name": "order_count", "base_type": "type/Integer"}
  ],
  "collectionPath": ["Data", "Orders Team"],
  "referencedCardIds": [10]
}
```

Use `resultMetadata` to:
- Map `base_type` to KSL column type: `type/Integer`, `type/Float`, `type/Decimal`, `type/BigInteger` → `number`; `type/Text`, `type/TextLike` → `string`; `type/DateTime`, `type/Date`, `type/DateTimeWithTZ` → `time`; `type/Boolean` → `boolean`.
- Identify grain candidates: columns with `semantic_type: type/PK`.
- Identify join candidates: columns with `semantic_type: type/FK` plus `fk_target_field_id`.
- Identify time columns: `semantic_type: type/CreationTimestamp` or `type/UpdatedTimestamp` → set `role: time`.
- Use `display_name` for measure descriptions when available.

### Additional card metadata

- `parameters`: list of card-level parameters with widget types and defaults. When SQL resolution fell back to unresolved SQL, use this to drive Step A of the SQL-translation workflow (drop optional clauses): knowing each `{{ var }}` is `type: "date/range"` vs `type: "category"` tells you what kind of clause it is.
- `resultMetadata[i].field_ref`: Metabase's canonical reference to the source warehouse field. Shape `["field", <field_id>, <options>]`. When this is set, the column maps directly to a warehouse field, which is useful for declaring joins from FK metadata without re-parsing SQL.
- `lastRunAt`: ISO timestamp of the card's last execution. If null or very old, the card may be dead; prefer skipping over creating a source.
- `dashboardCount`: number of dashboards referencing the card. Cards with `dashboardCount: 0` and a stale `lastRunAt` are strong skip signals.

Before writing a wiki page derived from a Metabase question SQL, verify each
schema.table.column mentioned with entity_details.

## Identifier Verification Protocol

Before writing a wiki page or SL source on any topic:

1. `discover_data({query: "<topic>"})` - see what wikis, SL sources, and raw
   tables already exist. Prefer updating existing pages over creating new ones.

Before emitting any `schema.table` or `schema.table.column` into a wiki body,
SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:

2. `entity_details({connectionId, targets: [{display: "<identifier>"}]})` -
   confirm the identifier resolves; inspect native types, FK/PK, and
   sampleValues.
3. For literal values from the source, such as status codes or plan tiers,
   check whether they appear in `entity_details` sampleValues for the relevant
   column. If sampleValues is short or the sample may have missed real values,
   run a `sql_execution` probe with the same warehouse connection id:
   `sql_execution({connectionId, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
4. If the candidate identifier still does not resolve, do one of:
   - Use `sql_execution({connectionId, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
     If it errors, the identifier is fictional.
   - Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
     citing the exact raw path that mentioned it.
   - When recording `emit_unmapped_fallback` with `no_physical_table`, include
     the failing probe error in `clarification`.
5. Never copy `<schema>.<table>` placeholder strings from these instructions
   into output.

## Decision tree

For each card:
1. Analyze `resolvedSql` + `resultMetadata`: identify base tables, aggregations, joins, filters, column types.
2. **REQUIRED before any write**: call `sl_discover` for every candidate target source name. The response tells you whether the name is manifest-backed (`Type: table` or `Type: sql`). For manifest-backed names you MUST use the overlay shape (`name:` plus overlay fields such as `measures:`, `segments:`, `descriptions:`, `joins:`, `disable_joins:`, `column_overrides:`, and computed-only `columns:` entries with `expr` + `type`; no `sql:`, `table:`, `grain:`, or base-table `columns:`); the tool will reject a standalone write and you'll have wasted the call. If `sl_discover` returns nothing for the name, you can write a standalone source. Also call `sl_read_source` on existing sources you intend to extend so you don't duplicate measures.
3. Include `rawPaths: ["cards/<id>.json"]` on every `sl_write_source`, `sl_edit_source`, and `wiki_write` call. If one artifact generalizes multiple near-duplicate cards, include each contributing card path and no unrelated cards.
4. Decide:
   - Simple aggregation on a table that already has a source → `sl_edit_source` to add a measure.
   - Join between tables that should be linked in the SL graph → `sl_edit_source` to add a join.
   - Complex derived SQL (CTEs, multi-layer aggregation, scoring models) → `sl_write_source` with `source_type: sql`. When the SQL projects/filters from a single manifest-backed base table, set `inherits_columns_from: <manifest_key>` so columns inherit type and description from the manifest - see `sl_capture` skill for the slim form. Use `sl_discover` to discover the manifest key from the table reference in the SQL (it accepts `MARTS.CONSIGNMENTS`, `ANALYTICS.MARTS.CONSIGNMENTS`, or `CONSIGNMENTS`).
   - New base table not yet in the
ktx-analyticsSkill

Use when answering a question that needs data from a ktx-connected database - investigating, analyzing, "how many", "show me", "what's the breakdown of", finding records by value, exploring tables, comparing periods, explaining metrics, or any data-analysis request. Triggers even when the user does not say "analytics"; if the answer requires querying a configured ktx connection, this skill applies.

dbt_ingestSkill

Map dbt `schema.yml` / `properties.yml` models and sources into ktx semantic-layer overlays and column notes. Covers `sources:` vs `models:`, column `data_tests` (not_null, unique, accepted_values, relationships), and how bundle-time writes complement manifest backfill from git sync. Load when the WorkUnit's `skillNames` includes `dbt_ingest` or when raw files are dbt YAML under `models/` / `sources/`.

historic_sql_patternsSkill

Identify recurring cross-table historic-SQL analytical intents from a bounded pattern shard and emit typed pattern evidence for deterministic wiki projection.

historic_sql_table_digestSkill

Convert one changed historic-SQL table usage bucket into typed table usage evidence for deterministic _schema projection.

ingest_triageSkill

Classify and resolve conflicts detected during bundle ingest (structural duplicates, definitional contradictions, near-duplicate clusters, re-ingest changes, evictions).

live_database_ingestSkill

Capture semantic-layer and knowledge updates from a live database schema snapshot.

looker_ingestSkill

Extract durable ktx knowledge and semantic-layer contribution proposals from staged Looker runtime dashboard, Look, and explore JSON. Load for WorkUnits whose raw files are under explores/, dashboards/, or looks/.

lookml_ingestSkill

Map a LookML view/model/explore into ktx semantic layer sources. Covers the LookML to ktx primitive table, provenance tagging, and three worked examples (overlay, standalone from derived_table, standalone with sql_always_where). Load when the turn contains `.lkml` content.