Skill1.5k repo starsupdated 9d ago

sl

The sl skill provides access to ktx's semantic layer, a structured YAML catalog of warehouse sources (tables, views, overlays) enriched with business measures, computed columns, joins, and named segments. Use this skill to query pre-defined metrics like ARR, churn, retention, LTV, and MAU via `sl_query`, or to read SL source YAML files to understand the data catalog schema and available dimensions. The research agent reads and queries the semantic layer but does not modify it; capture operations are handled separately by the `sl_capture` skill.

View source Repository: ktx

Install in Claude Code

Copy

git clone --depth 1 https://github.com/Kaelio/ktx /tmp/sl && cp -r /tmp/sl/packages/cli/src/skills/sl ~/.claude/skills/sl

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Semantic Layer

ktx's semantic layer (SL) is a structured catalog. Each **source** represents a table, a SQL view, or an overlay that enriches a manifest-backed table with measures, computed columns, joins, and named segments. The catalog is the single source of truth for reusable business metrics.

This skill covers two parts:
- **Part 1** - Schema reference (what an SL source looks like).
- **Part 2** - Querying via `sl_query`.

Capture (when and how to add new patterns to the SL) is a separate concern handled by the memory-agent - see the `sl_capture` skill if you are running in capture mode. The research agent **reads** and **queries** the SL via the tools described here; it does not write to it.

For capture-time identifier verification, load `sl_capture`. Synthesis writer
skills must verify warehouse identifiers with `discover_data`,
`entity_details`, and `sql_execution` before emitting table or column names.

---

## Part 1 - Schema reference

An SL source is a YAML file under `semantic-layer/<connectionId>/`. The file's `name:` field is the source's identity — it mirrors the warehouse identifier verbatim (e.g. Snowflake's uppercase `SIGNED_UP`); the filename is only a derived label. Always address sources by name through the `sl_*` tools, never by file path. There are three flavors:

### Overlay sources

Enrich a manifest-backed table with measures, computed columns, joins, and segments. No `table` or `sql` field. The base table's columns and grain are inherited from the manifest.

```yaml
name: fct_orders           # must match an existing manifest table
descriptions:
  user: "Overlay adding business measures to the orders fact table."
measures:
  - name: total_revenue
    expr: sum(amount)
    description: Total order revenue - filter by status or region at query time
columns:                    # computed dimensions only
  - name: is_large_order
    type: boolean
    expr: "amount > 1000"
column_overrides:           # metadata patches for inherited columns
  - name: status
    descriptions:
      user: "Order lifecycle status."
segments:
  - name: paid_non_refunded
    expr: "is_paid = true AND is_refunded = false"
joins:
  - to: customers
    on: "customer_id = customers.id"
    relationship: many_to_one
```

Rules:
- Do **not** repeat base-table columns, grain, `table`, or `source_type` in an overlay - those are inherited.
- Overlay columns MUST be computed (`expr` + `type`).
- Use `column_overrides` to add descriptions or metadata to inherited manifest columns. Do not put `type` or `expr` in `column_overrides`.
- `exclude_columns` hides specific manifest columns; `disable_joins` suppresses specific auto-detected joins.

### Standalone table sources

Self-contained; own their schema. Has `source_type: table` and `table:`.

```yaml
name: account_health_scores
source_type: table
table: "analytics.account_health_scores"
grain: [account_id, snapshot_date]
columns:
  - name: account_id
    type: string
  - name: snapshot_date
    type: time
    role: time
  - name: health_score
    type: number
measures:
  - name: avg_health_score
    expr: avg(health_score)
```

### Standalone SQL sources

Self-contained; schema derived from a SQL query. Has `source_type: sql` and `sql:`.

```yaml
name: monthly_cancellations
source_type: sql
sql: |
  SELECT
    date_trunc('month', cancelled_at) AS month,
    customer_id,
    plan_name,
    mrr_amount
  FROM subscriptions
  WHERE status = 'cancelled'
grain: [customer_id, month]
columns:
  - name: month
    type: time
    role: time
  - name: customer_id
    type: string
  - name: plan_name
    type: string
  - name: mrr_amount
    type: number
measures:
  - name: cancellation_count
    expr: count(*)
```

An SQL source is a one-shot answer: the aggregation is frozen, callers cannot re-group or re-filter by columns the SQL has collapsed, and the source is disconnected from the join graph. Prefer overlays + measures over SQL sources when possible - the `sl_capture` skill covers when SQL is justified.

### Columns

Every standalone column requires `name` and `type`. Overlays have computed columns in `columns:` and manifest column metadata patches in `column_overrides:`.

- `type`: one of `string`, `number`, `boolean`, `time`. Map LookML `date`/`datetime`/`timestamp` → `time`. Map LookML `yesno` → `boolean`.
- `role` (optional): `time` enables time-granularity queries (month, week, day). `default` is the implicit fallback.
- `visibility` (optional): `public`, `internal`, or `hidden`.
- `expr` (optional for standalone, required for overlay columns): SQL expression that computes the value. Expanded by sqlglot before generating SQL, so you can reference other columns on the same source.

### Grain

`grain: [col_a, col_b]` - the set of columns that uniquely identify one row. The query engine uses grain to prevent fanout in joins. Overlays inherit grain from the manifest unless they override.

### Joins

```yaml
joins:
  - to: customers                                    # target source name
    on: "customer_id = customers.id"                 # local_col = TARGET.target_col
    relationship: many_to_one                        # or one_to_many, one_to_one
    alias: primary_customer                          # optional - lets you join the same target twice
```

- `on` format: `local_col = TARGET.target_col`. Always qualify the right side with the target source name.
- `relationship` is the cardinality **from this source to the target**. Most joins are `many_to_one` (FK → PK on the parent).

### Measures

```yaml
measures:
  - name: total_arr
    expr: sum(arr_amount)
    description: Sum of ARR - filter by plan_name at query time
    filter: "is_active = true"
    segments: [paid_non_refunded]
```

- `name` (required, snake_case).
- `expr` (required): any valid SQL aggregate - `sum(x)`, `count(*)`, `count(distinct user_id)`, `avg(score)`.
- `description` (required on capture): what the measure computes and how to use it.
- `filter` (optional): SQL predi