Skill1.5k estrellas del repoactualizado 9d ago

metricflow_ingest

The metricflow_ingest skill translates MetricFlow semantic models and metrics into ktx semantic layer sources. Use it when processing dbt files containing top-level `semantic_models:` or `metrics:` blocks to map MetricFlow entities (measures, dimensions, entities) to ktx primitives, flatten inheritance via `extends:`, resolve model references, and handle metric types including simple, derived, ratio, cumulative, and conversion metrics.

Ver fuente Repositorio: ktx

Instalar en Claude Code

Copiar

git clone --depth 1 https://github.com/Kaelio/ktx /tmp/metricflow_ingest && cp -r /tmp/metricflow_ingest/packages/cli/src/skills/metricflow_ingest ~/.claude/skills/metricflow_ingest

Después abre una sesión nueva de Claude Code; el skill carga automáticamente.

Definición

SKILL.md

# MetricFlow to ktx Semantic Layer

A MetricFlow `semantic_model` maps to an SL source; MetricFlow `measures` map to ktx measures; MetricFlow `entities` map to ktx `joins`; MetricFlow `metrics` (top-level) map to ktx measures OR to cross-model derived measures. Files in one WorkUnit are ALWAYS part of the same logical entity (a connected component, possibly spanning `extends:` + cross-model metric refs). Flatten inheritance and cross-file references at write time.

## Mapping table

| MetricFlow | ktx form | Notes |
|---|---|---|
| `semantic_model: X { model: ref('t') }` with measures + dimensions | **Overlay** named `X` with `measures`, computed-only `columns`, `column_overrides`, `joins` | The `model:` ref resolves to a manifest table. |
| `semantic_model: X { model: source('s','t') }` | **Overlay** named `X` over table `t`. | Same shape; `source()` still resolves to a physical table. |
| `semantic_model: X { model: <literal> }` with no manifest entry | **Standalone** with explicit `sql:`, `grain:`, `columns:` | Happens when the dbt manifest isn't available. |
| `semantic_model: Y { extends: X }` | **Merge** Y's measures/dimensions/entities into X's overlay, or write a single overlay named for the most-derived child (Y) containing both X's and Y's primitives | Do not emit a second overlay for X - flatten. |
| `measures: [{ name, agg, expr }]` | `measures: [{ name, expr: "<agg>(<expr>)" }]` | Aggregation inlined. `agg: count_distinct` → `count(distinct ...)`. |
| `entities: [{ name, type: primary }]` | `grain: [<entity_name-or-expr>]` on the overlay/standalone | Primary/unique entities drive grain. |
| `entities: [{ name, type: foreign }]` | `joins:` entry joining to the primary-entity's semantic_model | Only when a matching primary is discoverable. |
| `metrics: [{ type: simple, type_params: { measure: X } }]` | If the base measure is labeled/described by the metric: in-place edit to the existing measure. Otherwise leave as-is. | Same-name metrics can absorb metadata. |
| `metrics: [{ type: simple, filter: <jinja> }]` | **New measure** on the same source, with the filter translated to SQL and attached via `filter:` | Translate Jinja `{{ Dimension('x__y') }}` to the column name `y`. |
| `metrics: [{ type: derived, type_params: { expr, metrics } }]` | **Derived measure** on whichever source owns the referenced measures, with `expr:` referencing measure names | If the metric spans models, still write it once on the source owning the "primary" measure (the one the agent judges most central). Mention the cross-model chain in the description. |
| `metrics: [{ type: ratio, type_params: { numerator, denominator } }]` | Same as derived; `expr: "numerator / NULLIF(denominator, 0)"` if no explicit expr | Safe-division by default. |
| `metrics: [{ type: cumulative, type_params: { window, grain_to_date } }]` | **Standalone** source with a window-function SQL; reference the resulting column as a normal measure | ktx SL has no first-class cumulative primitive (spec Non-goals). |
| `metrics: [{ type: conversion }]` | **Flag for human** - do NOT write. Emit a wiki note describing the intended semantics. | No ktx equivalent in v1. |
| Metric not mappable | Wiki page `<metric_name>-definition.md` with the full YAML body quoted | Capture the intent even if we can't emit SL. |

Type map: MetricFlow `time` to ktx `time`; `categorical` to `string`; `number` to `number`; `boolean` to `boolean`. Follow `expr` over `name` when both differ - `expr` is the physical column.

Verify each MetricFlow model source table with entity_details before producing
the corresponding sl_write_source.

## Identifier Verification Protocol

Before writing a wiki page or SL source on any topic:

1. `discover_data({query: "<topic>"})` - see what wikis, SL sources, and raw
   tables already exist. Prefer updating existing pages over creating new ones.

Before emitting any `schema.table` or `schema.table.column` into a wiki body,
SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:

2. `entity_details({connectionId, targets: [{display: "<identifier>"}]})` -
   confirm the identifier resolves; inspect native types, FK/PK, and
   sampleValues.
3. For literal values from the source, such as status codes or plan tiers,
   check whether they appear in `entity_details` sampleValues for the relevant
   column. If sampleValues is short or the sample may have missed real values,
   run a `sql_execution` probe with the same warehouse connection id:
   `sql_execution({connectionId, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
4. If the candidate identifier still does not resolve, do one of:
   - Use `sql_execution({connectionId, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
     If it errors, the identifier is fictional.
   - Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
     citing the exact raw path that mentioned it.
   - When recording `emit_unmapped_fallback` with `no_physical_table`, include
     the failing probe error in `clarification`.
5. Never copy `<schema>.<table>` placeholder strings from these instructions
   into output.

## Flattening `extends:`

Within one WorkUnit, multiple semantic_models linked by `extends:` are guaranteed to be present (the chunker groups them). Resolve inheritance **before** writing:

1. Start with the most-derived child (the one that no other semantic_model extends).
2. Walk the `extends:` chain upward, accumulating measures, dimensions, entities.
3. Write ONE overlay/standalone, named for the most-derived child's SL-appropriate name (not the base).
4. Parents that lack their own distinctive content should NOT get a separate overlay. If a parent has unique measures a child doesn't inherit, consider whether the base is used elsewhere - if yes, write both; if no, still one overlay.
5. Measure/dimension name collisions: child wins, but note the overridden parent in the overlay's description or in a sibling wiki page.

The spec's worked example has `orders`, `

Del mismo repositorio

ktx-analyticsSkill

Use when answering a question that needs data from a ktx-connected database - investigating, analyzing, "how many", "show me", "what's the breakdown of", finding records by value, exploring tables, comparing periods, explaining metrics, or any data-analysis request. Triggers even when the user does not say "analytics"; if the answer requires querying a configured ktx connection, this skill applies.

dbt_ingestSkill

Map dbt `schema.yml` / `properties.yml` models and sources into ktx semantic-layer overlays and column notes. Covers `sources:` vs `models:`, column `data_tests` (not_null, unique, accepted_values, relationships), and how bundle-time writes complement manifest backfill from git sync. Load when the WorkUnit's `skillNames` includes `dbt_ingest` or when raw files are dbt YAML under `models/` / `sources/`.

historic_sql_patternsSkill

Identify recurring cross-table historic-SQL analytical intents from a bounded pattern shard and emit typed pattern evidence for deterministic wiki projection.

historic_sql_table_digestSkill

Convert one changed historic-SQL table usage bucket into typed table usage evidence for deterministic _schema projection.

ingest_triageSkill

Classify and resolve conflicts detected during bundle ingest (structural duplicates, definitional contradictions, near-duplicate clusters, re-ingest changes, evictions).

live_database_ingestSkill

Capture semantic-layer and knowledge updates from a live database schema snapshot.

looker_ingestSkill

Extract durable ktx knowledge and semantic-layer contribution proposals from staged Looker runtime dashboard, Look, and explore JSON. Load for WorkUnits whose raw files are under explores/, dashboards/, or looks/.

lookml_ingestSkill

Map a LookML view/model/explore into ktx semantic layer sources. Covers the LookML to ktx primitive table, provenance tagging, and three worked examples (overlay, standalone from derived_table, standalone with sql_always_where). Load when the turn contains `.lkml` content.