Skill1.5k repo starsupdated 9d ago

sl_capture

**sl_capture** teaches when and how to add reusable patterns to the semantic layer, including business metrics, derived calculations, join patterns, and computed dimensions. Use this skill after the post-turn memory agent identifies a pattern worth preserving, ensuring SQL expressions match the warehouse dialect (BigQuery, Postgres, Redshift, or Snowflake) and that the pattern solves a genuine analytical need rather than a one-off exploration.

View source Repository: ktx

Install in Claude Code

Copy

git clone --depth 1 https://github.com/Kaelio/ktx /tmp/sl_capture && cp -r /tmp/sl_capture/packages/cli/src/skills/sl_capture ~/.claude/skills/sl_capture

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Semantic Layer - Capture

This skill covers **when** and **how** to capture new patterns into the semantic layer. For schema reference and query grammar, load the `sl` skill first.

When the current turn produces a reusable pattern (business metric, derived view, join pattern, computed dimension), capture it so future queries can reach for it instead of rediscovering it.

## SQL dialect

The user-facing prompt includes a `Warehouse:` line under the SL Sources index
(e.g. `Warehouse: BIGQUERY`). All `expr` strings - measure expressions, segment
predicates, computed-column SQL - execute on that warehouse and must use its
syntax. Date arithmetic in particular varies by dialect:

- **BigQuery**: `transaction_date >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY)` (when the column is `TIMESTAMP`); `event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)` (when `DATE`).
- **Postgres / Redshift**: `transaction_date >= current_date - interval '90 days'`.
- **Snowflake**: `transaction_date >= dateadd(day, -90, current_timestamp())`.

Match the column's manifest type (`type: time` → TIMESTAMP/DATETIME on the
warehouse) - comparing TIMESTAMP to a DATE-arithmetic result fails on
BigQuery. After every `sl_edit_source`/`sl_write_source`, the inline validator runs a
`LIMIT 1` warehouse probe per measure and surfaces dialect mismatches; if
you see an error trailer, fix the expression and retry rather than leaving
the source for the post-squash gate to revert.

## What's worth capturing

- Business metric aggregations (ARR, MRR, revenue, churn, retention, conversion, LTV, CAC).
- Derived calculations combining multiple signals (risk scores, health scores, composite KPIs).
- Multi-table join patterns producing a reusable analytical view.
- Computed categories or flags useful as reusable dimensions (`case when num_protocols >= 3 then 'power' else 'regular' end`).
- Missing joins between two sources that both exist but aren't connected in the join graph.

Skip:
- Simple `SELECT * LIMIT 10` previews.
- Trivial `COUNT(*)` on one table with no business filtering.
- One-off ad-hoc explorations unlikely to repeat.
- Equivalent measures that already exist (cite the existing one as `source.measure_name`).

When in doubt, capture. Measures are easy to remove but impossible to recover from a lost conversation.

## Generalization rules

The SL must stay small and general over time. Before adding a measure, decide whether it belongs as a generic pattern or a specific constant.

**Prefer one generic measure with query-time filters over N hardcoded variants.**

Anti-pattern:
```yaml
- name: revenue_us_region
  expr: sum(case when region = 'US' then amount end)
- name: revenue_eu_region
  expr: sum(case when region = 'EU' then amount end)
```

Preferred:
```yaml
- name: total_revenue
  expr: sum(amount)
```
Callers filter `region = 'US'` at query time.

**Bake constants in only when the filter has named business meaning that won't change** (`enterprise_arr` for a contractually defined tier), cannot be expressed via the source's dimensions, or comes from a regulated/fixed list.

**Time anchors and value lists belong in callers' filters, not in measure expressions or source SQL.**
- Anti-pattern (date anchor inlined): `expr: count(distinct case when transaction_date >= '2026-04-12' then customer_id end)` - the date will need editing every time the question shifts, and every reader has to discover it.
- Anti-pattern (value list inlined in source SQL): `WHERE product_category_1 IN ('Testosterone', 'Weight Loss', …)` - locks the source to today's catalog and blocks callers from broadening or narrowing.
- Preferred: a generic measure (`count(distinct customer_id)`) plus either a named segment that captures the *meaning* of the anchor (`gh_new_products_since_launch`) or a query-time filter. Callers compose; the source stays small.
- A date is durable to bake in only when it represents a regulatory cutover, a contractually fixed boundary, or a one-time event that reshapes how the source itself is read.

**If you create a segment whose expr matches a measure's filter, the measure MUST reference the segment via `segments: [segment_name]` rather than re-inlining the predicate.** This is the canonical pattern even with a single measure - duplicating the predicate inline defeats the purpose of naming it.

Anti-pattern:
```yaml
segments:
  - name: engaged_subscriber
    expr: "is_paid = true AND <date-window-90-days-on-transaction_date>"
measures:
  - name: engaged_subscriber_count
    expr: "count(distinct case when is_paid = true and transaction_date >= current_date - interval '90 day' then admin_user_id end)"
```

Preferred:
```yaml
segments:
  - name: engaged_subscriber
    expr: "is_paid = true AND <date-window-90-days-on-transaction_date>"
measures:
  - name: engaged_subscriber_count
    expr: "count(distinct admin_user_id)"
    segments: [engaged_subscriber]
```

**Use computed dimensions for derived categories.** A flag like `is_power_user` belongs on `columns[]` with `expr`, not inlined into every measure.

**Extract repeated filter bundles into named segments.** If the same predicate appears on multiple measures of the same source, lift it to a `segments[]` entry and have each measure reference it. One edit updates every measure that depends on it.

**Never write a standalone file on a manifest-backed name.** If `sl_discover({ query: "<table-or-source-name>" })` finds an existing schema for that name, you MUST write an overlay. A standalone with `sql:` or `table:` on a manifest-backed name clobbers the inherited columns and joins; `sl_write_source` and `sl_validate` both reject this shape with a clear fix hint. Always run `sl_discover` before your first write on any existing name.

Overlay before/after examples:

```yaml
# Wrong: patches an inherited manifest column through columns:
name: fct_orders
columns:
  - name: status
    descriptions:
      user: "Order lifecycle status."
```

```yaml
# Right: patch inherited