notion_synthesize
The notion_synthesize skill transforms staged Notion pages, databases, and clustered evidence into durable wiki entries and semantic-layer sources. Load this skill when a WorkUnit contains raw Notion files or evidence chunks, using it to synthesize related Notion content into high-quality business knowledge pages that capture metrics, workflows, ownership rules, and data-source definitions while searching for existing overlaps and supporting evidence.
git clone --depth 1 https://github.com/Kaelio/ktx /tmp/notion_synthesize && cp -r /tmp/notion_synthesize/packages/cli/src/skills/notion_synthesize ~/.claude/skills/notion_synthesizeSKILL.md
# Notion Cluster Synthesis
Use this skill when a WorkUnit contains staged Notion content from `pages/**`, `databases/**`, `data-sources/**`, or clustered Notion evidence.
## Role
Each WorkUnit is either a single Notion page/span or a topical cluster of related Notion pages, pre-grouped by embedding similarity. Read the assigned raw files, then write a small set of durable wiki entries and, when applicable, semantic-layer sources that synthesize the WorkUnit's knowledge. Write final memory directly; do not write candidates.
## Required Workflow
1. Read the WorkUnit notes and rawFiles list. Page content lives in `page.md`; `metadata.json` holds title, path, object type, data-source ids, last edited metadata, and properties.
2. For each assigned page, call `read_raw_file`, or `read_raw_span` for oversized pages when the notes specify a span.
3. Search `wiki_search` for existing pages that overlap the WorkUnit topics. Prefer updating an existing page over creating a duplicate.
4. Use `context_evidence_search`, `context_evidence_read`, and `context_evidence_neighbors` to pull supporting chunks when indexed evidence is relevant. Pass `chunkId` and `documentId` values verbatim as returned by the evidence tools.
5. Write durable business knowledge with `wiki_write`. Aim for a small number of high-quality pages per WorkUnit or cluster. Include `rawPaths` with the exact Notion raw files that support each page.
6. When the Notion content defines a reusable dataset, metric, segment, join rule, source-of-truth mapping, or table with explicit columns, load `sl_capture`, discover existing sources first with `sl_discover` or `sl_read_source`, then use `sl_write_source` or `sl_edit_source` only for a confirmed mapped non-Notion target source. Include `rawPaths` with the exact Notion raw files that support the SL action. If no mapped target exists, call `emit_unmapped_fallback` and keep the content wiki-only.
7. For every deleted raw path in the Eviction Set, call `eviction_list`, decide retention, then `emit_eviction_decision`. Do this even when no wiki write is needed.
## What To Capture
Capture durable, reusable company knowledge:
- metric definitions, KPI formulas, named business concepts, and reusable filters
- workflows, policies, ownership rules, approval conventions, and source-of-truth mappings
- data-source row pages that describe tables, columns, semantic models, dashboards, or business entities
- cross-system aliases connecting Notion terms to warehouse, dbt, Looker, Metabase, or MetricFlow names
- caveats, conflicts, supersession notes, and customer/product assumptions affecting future analysis
Skip noisy or transient content:
- meeting notes with no reusable rule
- task lists, project status updates, and time-bounded snapshots
- duplicate docs with no new fact
- database metadata pages when row pages contain the actual business content
- transient announcements and long page summaries
## Quality
Prefer fewer, stronger entries. Every wiki entry must cite at least one Notion page or row using its path and last edited date when available. When evidence conflicts, write a conflict note inside the wiki page rather than choosing silently.
If a clustered WorkUnit includes several related pages, synthesize the shared rule or concept instead of writing one thin page per source. For oversized page spans, read only the assigned span unless the WorkUnit explicitly asks for neighboring context.
Search existing wiki pages for the same `tables:` or `sl_refs:` frontmatter and for source-of-truth aliases before creating a new page. If an existing page already documents the same warehouse object or business concept, update it instead of creating a differently named duplicate.
## Citation Style
```md
## Revenue Recognition
- Booked revenue excludes refunds and test accounts.
- Source: Notion - Company Handbook / Finance / Revenue Recognition, last edited 2026-04-12.
- Conflict note: An older Sales Ops page uses gross revenue before refunds; treat the Finance Handbook as current unless Finance says otherwise.
```
## Semantic-Layer Rules
- Load `sl_capture` before writing or editing SL sources.
- Discover existing sources first with `sl_discover`; read existing source YAML before editing.
- Prefer overlays on manifest-backed sources over standalone SQL.
- If Notion describes a dashboard or metric but does not define executable logic, write a wiki page and attach `sl_refs` only after confirming the referenced source exists.
- Notion `dataSourceCount` counts Notion databases/data sources only. It does not prove that a warehouse/dbt table has or lacks a mapped semantic-layer source.
- Do not create SL sources under the Notion connection just because a page mentions a warehouse, dbt, Looker, or Metabase object. Use the mapped warehouse/source connection after discovery, or emit an unmapped fallback and write wiki-only.
- Distinguish fallback reasons precisely: if a non-Notion warehouse/dbt connection exists but `sl_discover` cannot find the named table/source, use `no_physical_table`; reserve `no_connection_mapping` for cases where there is no plausible non-Notion target connection at all.
- If `sl_discover` resolves the table/source, do not call `emit_unmapped_fallback` for that table. Use the resolved source for `sl_refs`, overlay edits, or wiki-only documentation.
- When calling `emit_unmapped_fallback`, pass the table or source identifier as `tableRef` (e.g. `tableRef: "<schema>.<table>"`) - the tool generates the canonical detail string from the reason code and `tableRef`. Use the optional `clarification` field only to add context that does not contradict the reason. Do not restate the reason in `clarification`.
## Identifier Verification Protocol
Before writing a wiki page or SL source on any topic:
1. `discover_data({query: "<topic>"})` - see what wikis, SL sources, and raw
tables already exist. Prefer updating existing pages over creating new ones.
Before emitting any `schema.table`Use when answering a question that needs data from a ktx-connected database - investigating, analyzing, "how many", "show me", "what's the breakdown of", finding records by value, exploring tables, comparing periods, explaining metrics, or any data-analysis request. Triggers even when the user does not say "analytics"; if the answer requires querying a configured ktx connection, this skill applies.
Map dbt `schema.yml` / `properties.yml` models and sources into ktx semantic-layer overlays and column notes. Covers `sources:` vs `models:`, column `data_tests` (not_null, unique, accepted_values, relationships), and how bundle-time writes complement manifest backfill from git sync. Load when the WorkUnit's `skillNames` includes `dbt_ingest` or when raw files are dbt YAML under `models/` / `sources/`.
Identify recurring cross-table historic-SQL analytical intents from a bounded pattern shard and emit typed pattern evidence for deterministic wiki projection.
Convert one changed historic-SQL table usage bucket into typed table usage evidence for deterministic _schema projection.
Classify and resolve conflicts detected during bundle ingest (structural duplicates, definitional contradictions, near-duplicate clusters, re-ingest changes, evictions).
Capture semantic-layer and knowledge updates from a live database schema snapshot.
Extract durable ktx knowledge and semantic-layer contribution proposals from staged Looker runtime dashboard, Look, and explore JSON. Load for WorkUnits whose raw files are under explores/, dashboards/, or looks/.
Map a LookML view/model/explore into ktx semantic layer sources. Covers the LookML to ktx primitive table, provenance tagging, and three worked examples (overlay, standalone from derived_table, standalone with sql_always_where). Load when the turn contains `.lkml` content.