looker_ingest
The looker_ingest skill extracts metric formulas, segment definitions, field semantics, and domain conventions from Looker dashboards, Looks, and explores, converting runtime API data into durable semantic-layer knowledge and wiki candidates. Use this when processing Looker WorkUnits to translate dashboard queries and explore definitions into generalized, reusable knowledge while verifying all identifiers against actual schema and merging findings with existing memory rather than duplicating content.
git clone --depth 1 https://github.com/Kaelio/ktx /tmp/looker_ingest && cp -r /tmp/looker_ingest/packages/cli/src/skills/looker_ingest ~/.claude/skills/looker_ingestSKILL.md
# Looker Runtime Ingest
Looker runtime ingest turns API-staged dashboards, Looks, and explores into durable ktx memory. Runtime entities are evidence. They are not themselves the final knowledge shape.
## Required Workflow
1. Read every `rawFiles` entry for the WorkUnit.
2. Read relevant `dependencyPaths` before making a decision. For dashboard and Look WUs this usually includes the referenced explore JSON, signal files, `folders/tree.json`, and `users/<id>.json`.
3. Treat `signals/*.json`, owners, folders, schedules, and favorites as prioritization or provenance context only.
4. Extract generalizable metric formulas, segment definitions, field semantics, and domain conventions.
5. Use `wiki_search`, `sl_discover`, and `sl_read_source` before writing so new content merges with existing memory instead of duplicating it.
6. Use `context_evidence_search` or `context_evidence_read` to obtain evidence chunk IDs for any wiki-bound knowledge candidate.
7. Use `context_candidate_write` for durable wiki-bound knowledge. Do not call `wiki_write` from a Looker WorkUnit; Stage 4 reconciliation promotes candidates and writes wiki pages.
8. Use `looker_query_to_sl` for each Look query or dashboard tile query that has a `query` object.
9. Write SL from Looker runtime evidence only through the staged warehouse target contract. For explores and inherited dashboard/Look queries, branch on `targetTable.ok`; when it is true, write on `targetWarehouseConnectionId` and use `targetTable.canonicalTable` as `source.table`. When it is false or missing, write wiki knowledge candidates and record `emit_unmapped_fallback` with the staged reason.
10. Run `sl_validate` after every SL write. If validation fails, fix the source or roll it back before the WorkUnit ends.
For every Looker field reference, call entity_details on the underlying
schema.table.column before promoting it to sl_refs or quoting it in wiki body.
## Identifier Verification Protocol
Before writing a wiki page or SL source on any topic:
1. `discover_data({query: "<topic>"})` - see what wikis, SL sources, and raw
tables already exist. Prefer updating existing pages over creating new ones.
Before emitting any `schema.table` or `schema.table.column` into a wiki body,
SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
2. `entity_details({connectionId, targets: [{display: "<identifier>"}]})` -
confirm the identifier resolves; inspect native types, FK/PK, and
sampleValues.
3. For literal values from the source, such as status codes or plan tiers,
check whether they appear in `entity_details` sampleValues for the relevant
column. If sampleValues is short or the sample may have missed real values,
run a `sql_execution` probe with the same warehouse connection id:
`sql_execution({connectionId, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
4. If the candidate identifier still does not resolve, do one of:
- Use `sql_execution({connectionId, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
If it errors, the identifier is fictional.
- Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
citing the exact raw path that mentioned it.
- When recording `emit_unmapped_fallback` with `no_physical_table`, include
the failing probe error in `clarification`.
5. Never copy `<schema>.<table>` placeholder strings from these instructions
into output.
## Explore WorkUnits
Explore WUs have raw files like `explores/<model>/<explore>.json` and usually depend on `lookml_models.json`.
Use the deterministic API-derived source key:
```text
looker__<model>__<explore>
```
For example, `modelName: "b2b"` and `exploreName: "sales_pipeline"` map to `looker__b2b__sales_pipeline`.
Mapped explore write shape:
```json
{
"connectionId": "22222222-2222-4222-8222-222222222222",
"sourceName": "looker__b2b__sales_pipeline",
"source": {
"name": "looker__b2b__sales_pipeline",
"table": "proj.dataset.opportunities",
"grain": ["opportunity_id"],
"columns": [
{
"name": "opportunity_id",
"type": "string"
},
{
"name": "arr",
"type": "number"
}
],
"measures": [
{
"name": "total_arr",
"expr": "sum(arr)"
}
]
}
}
```
Every concrete value in that example must be backed by raw Looker field SQL, `source_tables` preflight, `source_columns`, or existing SL when applied to a real WorkUnit. If the evidence is not present, write wiki candidates and emit `emit_unmapped_fallback`.
The staged explore file carries warehouse target fields populated before the WU starts:
- `connectionName`: the Looker runtime connection name.
- `targetWarehouseConnectionId`: the resolved warehouse connection id, or `null` when the Looker connection is unmapped.
- `rawSqlTableName`: Looker's verbatim `sql_table_name`. Keep it as provenance only.
- `targetTable`: the parsed target-table union. Use this as the sole branch condition.
When `targetTable.ok === true`, the explore has a complete ktx backing target. Before writing:
1. Use `targetTable.catalog`, `targetTable.schema`, and `targetTable.name` for `source_tables` preflight matching through `sl_discover` or `sl_read_source`.
2. Use Looker field `sql`, labels, descriptions, and type metadata to derive source columns, measures, segments, joins, and grain.
3. Call `sl_write_source` or `sl_edit_source` with `connectionId: targetWarehouseConnectionId` and `rawPaths` set to the staged explore path.
4. Set `source.name` to the deterministic API-derived source key, for example `looker__b2b__sales_pipeline`.
5. Set `source.table` to `targetTable.canonicalTable`.
6. Run `sl_validate` after every SL write.
The `table` field is `targetTable.canonicalTable`, not `rawSqlTableName`. Raw Looker values can contain aliases such as `schema.table AS x`, Looker templates such as `${TABLE}`, or derived-table SQL. Those raw forms do not compose safely with SL generation. `targetTable.cUse when answering a question that needs data from a ktx-connected database - investigating, analyzing, "how many", "show me", "what's the breakdown of", finding records by value, exploring tables, comparing periods, explaining metrics, or any data-analysis request. Triggers even when the user does not say "analytics"; if the answer requires querying a configured ktx connection, this skill applies.
Map dbt `schema.yml` / `properties.yml` models and sources into ktx semantic-layer overlays and column notes. Covers `sources:` vs `models:`, column `data_tests` (not_null, unique, accepted_values, relationships), and how bundle-time writes complement manifest backfill from git sync. Load when the WorkUnit's `skillNames` includes `dbt_ingest` or when raw files are dbt YAML under `models/` / `sources/`.
Identify recurring cross-table historic-SQL analytical intents from a bounded pattern shard and emit typed pattern evidence for deterministic wiki projection.
Convert one changed historic-SQL table usage bucket into typed table usage evidence for deterministic _schema projection.
Classify and resolve conflicts detected during bundle ingest (structural duplicates, definitional contradictions, near-duplicate clusters, re-ingest changes, evictions).
Capture semantic-layer and knowledge updates from a live database schema snapshot.
Map a LookML view/model/explore into ktx semantic layer sources. Covers the LookML to ktx primitive table, provenance tagging, and three worked examples (overlay, standalone from derived_table, standalone with sql_always_where). Load when the turn contains `.lkml` content.
Convert Metabase questions, models, and metrics into ktx Semantic Layer source definitions. Covers result-metadata to KSL column type mapping, FK/PK detection, near-duplicate deduplication, pre-aggregation decomposition, join-graph connectivity, and how to react to priorProvenance from earlier ingest syncs. Load when the WorkUnit contains `cards/<id>.json` files under a Metabase bundle.