Skill1.5k repo starsupdated 9d ago

looker_ingest

The looker_ingest skill extracts metric formulas, segment definitions, field semantics, and domain conventions from Looker dashboards, Looks, and explores, converting runtime API data into durable semantic-layer knowledge and wiki candidates. Use this when processing Looker WorkUnits to translate dashboard queries and explore definitions into generalized, reusable knowledge while verifying all identifiers against actual schema and merging findings with existing memory rather than duplicating content.

View source Repository: ktx

Install in Claude Code

Copy

git clone --depth 1 https://github.com/Kaelio/ktx /tmp/looker_ingest && cp -r /tmp/looker_ingest/packages/cli/src/skills/looker_ingest ~/.claude/skills/looker_ingest

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Looker Runtime Ingest

Looker runtime ingest turns API-staged dashboards, Looks, and explores into durable ktx memory. Runtime entities are evidence. They are not themselves the final knowledge shape.

## Required Workflow

1. Read every `rawFiles` entry for the WorkUnit.
2. Read relevant `dependencyPaths` before making a decision. For dashboard and Look WUs this usually includes the referenced explore JSON, signal files, `folders/tree.json`, and `users/<id>.json`.
3. Treat `signals/*.json`, owners, folders, schedules, and favorites as prioritization or provenance context only.
4. Extract generalizable metric formulas, segment definitions, field semantics, and domain conventions.
5. Use `wiki_search`, `sl_discover`, and `sl_read_source` before writing so new content merges with existing memory instead of duplicating it.
6. Use `context_evidence_search` or `context_evidence_read` to obtain evidence chunk IDs for any wiki-bound knowledge candidate.
7. Use `context_candidate_write` for durable wiki-bound knowledge. Do not call `wiki_write` from a Looker WorkUnit; Stage 4 reconciliation promotes candidates and writes wiki pages.
8. Use `looker_query_to_sl` for each Look query or dashboard tile query that has a `query` object.
9. Write SL from Looker runtime evidence only through the staged warehouse target contract. For explores and inherited dashboard/Look queries, branch on `targetTable.ok`; when it is true, write on `targetWarehouseConnectionId` and use `targetTable.canonicalTable` as `source.table`. When it is false or missing, write wiki knowledge candidates and record `emit_unmapped_fallback` with the staged reason.
10. Run `sl_validate` after every SL write. If validation fails, fix the source or roll it back before the WorkUnit ends.

For every Looker field reference, call entity_details on the underlying
schema.table.column before promoting it to sl_refs or quoting it in wiki body.

## Identifier Verification Protocol

Before writing a wiki page or SL source on any topic:

1. `discover_data({query: "<topic>"})` - see what wikis, SL sources, and raw
tables already exist. Prefer updating existing pages over creating new ones.

Before emitting any `schema.table` or `schema.table.column` into a wiki body,
SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:

2. `entity_details({connectionId, targets: [{display: "<identifier>"}]})` -
confirm the identifier resolves; inspect native types, FK/PK, and
sampleValues.
3. For literal values from the source, such as status codes or plan tiers,
check whether they appear in `entity_details` sampleValues for the relevant
column. If sampleValues is short or the sample may have missed real values,
run a `sql_execution` probe with the same warehouse connection id:
`sql_execution({connectionId, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
4. If the candidate identifier still does not resolve, do one of:
- Use `sql_execution({connectionId, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
If it errors, the identifier is fictional.
- Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
citing the exact raw path that mentioned it.
- When recording `emit_unmapped_fallback` with `no_physical_table`, include
the failing probe error in `clarification`.
5. Never copy `<schema>.<table>` placeholder strings from these instructions
into output.

## Explore WorkUnits

Explore WUs have raw files like `explores/<model>/<explore>.json` and usually depend on `lookml_models.json`.

Use the deterministic API-derived source key:

```text
looker__<model>__<explore>
```

For example, `modelName: "b2b"` and `exploreName: "sales_pipeline"` map to `looker__b2b__sales_pipeline`.

Mapped explore write shape:

```json
{
"connectionId": "22222222-2222-4222-8222-222222222222",
"sourceName": "looker__b2b__sales_pipeline",
"source": {
"name": "looker__b2b__sales_pipeline",
"table": "proj.dataset.opportunities",
"grain": ["opportunity_id"],
"columns": [
{
"name": "opportunity_id",
"type": "string"
},
{
"name": "arr",
"type": "number"
}
],
"measures": [
{
"name": "total_arr",
"expr": "sum(arr)"
}
]
}
}
```

Every concrete value in that example must be backed by raw Looker field SQL, `source_tables` preflight, `source_columns`, or existing SL when applied to a real WorkUnit. If the evidence is not present, write wiki candidates and emit `emit_unmapped_fallback`.

The staged explore file carries warehouse target fields populated before the WU starts:

- `connectionName`: the Looker runtime connection name.
- `targetWarehouseConnectionId`: the resolved warehouse connection id, or `null` when the Looker connection is unmapped.
- `rawSqlTableName`: Looker's verbatim `sql_table_name`. Keep it as provenance only.
- `targetTable`: the parsed target-table union. Use this as the sole branch condition.

When `targetTable.ok === true`, the explore has a complete ktx backing target. Before writing:

1. Use `targetTable.catalog`, `targetTable.schema`, and `targetTable.name` for `source_tables` preflight matching through `sl_discover` or `sl_read_source`.
2. Use Looker field `sql`, labels, descriptions, and type metadata to derive source columns, measures, segments, joins, and grain.
3. Call `sl_write_source` or `sl_edit_source` with `connectionId: targetWarehouseConnectionId` and `rawPaths` set to the staged explore path.
4. Set `source.name` to the deterministic API-derived source key, for example `looker__b2b__sales_pipeline`.
5. Set `source.table` to `targetTable.canonicalTable`.
6. Run `sl_validate` after every SL write.

The `table` field is `targetTable.canonicalTable`, not `rawSqlTableName`. Raw Looker values can contain aliases such as `schema.table AS x`, Looker templates such as `${TABLE}`, or derived-table SQL. Those raw forms do not compose safely with SL generation. `targetTable.c