Skill16.7k repo starsupdated today

generate-mdl

The generate-mdl Claude Code skill automates the creation of a Wren MDL (Modeling Definition Language) project by discovering a database schema and converting it into YAML format. Use it when setting up a new MDL project, onboarding a fresh data source, or scaffolding a project structure from an existing database using available tools like SQLAlchemy, database drivers, or the wren CLI.

View source Repository: WrenAI

Install in Claude Code

Copy

git clone --depth 1 https://github.com/Canner/WrenAI /tmp/generate-mdl && cp -r /tmp/generate-mdl/core/wren/src/wren/skills_content/generate-mdl ~/.claude/skills/generate-mdl

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Generate Wren MDL — CLI Agent Workflow

Builds an MDL project by discovering database schema and converting it
into Wren's YAML project format. The agent uses whatever database tools
are available in its environment for introspection; the wren CLI handles
type normalization, validation, and build.

For memory and query workflows after setup, run `wren skills get usage`.

## Prerequisites

- `wren` CLI installed (`pip install "wrenai[<datasource>]"`)
- A working database connection (credentials available to the agent)
- A connection profile (set up via `wren profile add`) or connection info ready

## Phase 0 — Detect existing project

**Goal:** If the current directory is already inside a wren project, let the user decide how to proceed.

Check whether `wren_project.yml` exists in the current working directory
(or any parent up to the repository root). If found:

1. Tell the user that an existing wren project was detected and show its path.
2. Ask:
   - **Reset** — wipe the existing project (`models/`, `views/`,
     `relationships.yml`, `instructions.md`, and rebuild `wren_project.yml`)
     and regenerate from scratch in the same directory.
   - **New path** — keep the existing project untouched and choose a
     different directory for the new project. Ask the user for the new path,
     then `wren context init --path <new_path>` and continue from Phase 1
     using that path.

If no existing project is detected, proceed directly to Phase 1.

## Phase 1 — Establish connection and scope

**Goal:** Confirm the agent can reach the database and agree on scope with the user.

1. Verify connectivity using whichever tool is available:
   - If SQLAlchemy: `engine.connect()` test
   - If database driver: simple query like `SELECT 1`
   - If wren profile exists: `wren profile debug` to check config
   - If raw SQL via wren: `wren --sql "SELECT 1"` (requires profile or connection file)

2. Ask the user:
   - Which **schema(s)** or **dataset(s)** to include (skip if only one exists)
   - Whether to include **all tables** or a subset
   - The **datasource type** for wren (e.g., `postgres` (including Aurora), `mysql` (including Aurora), `bigquery`, `snowflake`) — needed for type normalization dialect

## Phase 2 — Discover schema

**Goal:** Collect table names, column names, column types, and constraints.

Use whatever introspection method is available. Here are common approaches
ranked by convenience:

### Option A: SQLAlchemy (recommended if available)

```python
from sqlalchemy import create_engine, inspect

engine = create_engine(connection_url)
inspector = inspect(engine)

tables = inspector.get_table_names(schema="public")

for table in tables:
    columns = inspector.get_columns(table, schema="public")
    # columns → [{"name": "id", "type": INTEGER(), "nullable": False, ...}]

    pk = inspector.get_pk_constraint(table, schema="public")
    # pk → {"constrained_columns": ["id"], "name": "orders_pkey"}

    fks = inspector.get_foreign_keys(table, schema="public")
    # fks → [{"constrained_columns": ["customer_id"],
    #          "referred_table": "customers",
    #          "referred_columns": ["id"]}]
```

### Option B: Database-specific driver

- **psycopg / asyncpg (Postgres):** Query `information_schema.columns` and `information_schema.table_constraints`
- **google-cloud-bigquery:** `client.list_tables()`, `client.get_table()` → `table.schema`
- **snowflake-connector-python:** `SHOW COLUMNS IN TABLE`, `SHOW PRIMARY KEYS IN TABLE`
- **clickhouse-driver:** `DESCRIBE TABLE`, `system.tables`

### Option C: Raw SQL via wren

If no driver is available but a wren profile is configured, query
`information_schema` through wren itself:

```bash
wren --sql "SELECT table_name FROM information_schema.tables WHERE table_schema = 'public'" -o json
wren --sql "SELECT column_name, data_type FROM information_schema.columns WHERE table_name = 'orders'" -o json
```

Note: this goes through the MDL layer, so it only works if you already
have a minimal MDL or if the database supports `information_schema` as
regular tables. For bootstrapping from zero, Option A or B is preferred.

## Phase 3 — Normalize types

**Goal:** Convert raw database types to wren-core-compatible types.

### Python import (recommended for batch processing)

```python
from wren.type_mapping import parse_type, parse_types

# Single type
normalized = parse_type("character varying(255)", "postgres")  # → "VARCHAR(255)"

# Batch — entire table at once
columns = [
    {"column": "id", "raw_type": "int8"},
    {"column": "name", "raw_type": "character varying"},
    {"column": "total", "raw_type": "numeric(10,2)"},
]
normalized_cols = parse_types(columns, dialect="postgres")
# Each dict now has a "type" key with the normalized value
```

### CLI (if Python import not available)

Single type:
```bash
wren utils parse-type --type "character varying(255)" --dialect postgres
# → VARCHAR(255)
```

Batch (stdin JSON):
```bash
echo '[{"column":"id","raw_type":"int8"},{"column":"name","raw_type":"character varying"}]' \
  | wren utils parse-types --dialect postgres
```

## Phase 4 — Scaffold and write MDL project

**Goal:** Create the YAML project structure.

### Step 1 — Initialize project

```bash
wren context init --path /path/to/project
```

This creates:
```text
project/
├── wren_project.yml
├── models/              # business-facing tables/models
├── views/               # named SQL statements
├── cubes/               # pre-aggregation cubes (measures + dimensions)
├── relationships.yml
└── instructions.md
```

> **When to define cubes:** If the user asks aggregation questions like
> "revenue by month" or "top customers", define cubes alongside models —
> they give agents a structured query API instead of forcing them to
> hand-write `GROUP BY` / `DATE_TRUNC` SQL. See the
> [Cube guide](https://github.com/Canner/WrenAI/blob/main/docs/core/guides/cubes.md).

> **IMPORTANT: `catalog` and `schema` in `wren_project.yml`**
>
> These are Wren Engi

More from this repository

dlt-connectorSkill

Connect SaaS data (HubSpot, Stripe, Salesforce, GitHub, Slack, etc.) to Wren Engine for SQL analysis. Guides the user through the full flow: install dlt, pick a SaaS source, set up credentials, run the data pipeline into DuckDB, then auto-generate a Wren semantic project from the loaded data. Use this skill whenever the user mentions: connecting SaaS data, importing data from an API, dlt pipelines, loading HubSpot/Stripe/Salesforce/GitHub/Slack data, querying SaaS data with SQL, or setting up a new data source from a REST API. Also trigger when the user already has a dlt-produced DuckDB file and wants to create a Wren project from it.

enrich-contextSkill

Augment a Wren project with business context that DB schema cannot carry — enum value meanings, units (USD vs cents, ms vs sec), NULL semantics, magic sentinels (-1 = unknown), soft-delete default filters, business synonyms, time-grain / TZ conventions, cross-system identifiers, currency rules, canonical-table preferences, AND named aggregation metrics (ARR, churn, DAU, WAU, NRR) proposed as cubes. Runs in one of two modes selected at session start: `grill` (one question at a time, user-driven) or `auto-pilot` (agent infers and applies, escalates only on conflicts and high-blast-radius additions like new cubes / views / relationships). Reads everything under <project>/raw/ (PDFs, glossaries, handbooks, code, data dictionaries) and optionally samples low-cardinality columns from the live DB (grill mode), compares against the current MDL / cubes / instructions.md / queries.yml / memory pairs, then fills gaps via the ten-category gap catalog and the cube proposal flow. Confirmed findings are written back to the right sink. Use when: user says 'enrich context', 'augment my project', 'grill me on this project', 'auto-fill my context', 'agent doesn't understand our docs / enum values / units / null meanings', 'business context is missing', 'what does status=A mean', 'is this amount in USD or cents', 'we keep getting wrong aggregations', 'add cubes for ARR / DAU / churn', 'we have a handbook / glossary / data dictionary the agent should know'; or after generating an MDL and noticing the agent lacks business semantics.

onboardingSkill

Onboard a user to Wren Engine end-to-end. Walks through environment checks, project scaffolding, connection configuration via .env, and first query. Use when: user wants to install Wren Engine, set up a new data source connection, or bootstrap a new project from scratch. Triggers: '/wren-onboarding', 'install wren', 'set up wren engine', 'wren onboarding', 'connect new database to wren'.

usageSkill

Wren Engine CLI workflow guide for AI agents. Answer data questions end-to-end using the wren CLI: gather schema context, recall past queries, write SQL through the MDL semantic layer, execute, and learn from confirmed results. Use when: user asks a data question, requests a report or analysis, asks about metrics, revenue, customers, orders, trends, or any business data; user says 'how many', 'show me', 'what is the', 'top N', 'compare', 'trend', 'growth', 'breakdown'; user wants to explore, analyze, filter, aggregate, or summarize data from a database; agent needs to query data, connect a data source, handle errors, or manage MDL changes via the wren CLI.

wrenSkill

Wren CLI for AI agents — a semantic SQL layer over 22+ databases (Postgres, MySQL, BigQuery, Snowflake, Spark, …). The actual workflow guides live inside the `wren` CLI itself; this is just a discovery stub. Use whenever the user asks a data question (how many, show me, top N, compare, trend, breakdown, metric, revenue, customers, orders), wants to install / set up Wren Engine, connect a new database, connect SaaS data via dlt (HubSpot, Stripe, Salesforce, GitHub, Slack), generate or regenerate an MDL project from a database schema, enrich a project with business context (enum meanings, units, cubes like ARR / DAU / churn), or turn a project's context layer into a shareable GenBI web app / dashboard and deploy it to Vercel or Cloudflare. Triggers: 'install wren', 'set up wren engine', 'connect database to wren', 'connect SaaS to wren', 'load hubspot / stripe / salesforce data', 'generate mdl', 'scaffold wren project', 'enrich wren context', 'augment my project', 'add cubes', 'build a dashboard', 'make a shareable analytics app', 'deploy my context layer as a web app', 'genbi app', 'wren onboarding', 'wren usage', 'wren generate mdl', 'wren dlt connector', 'wren enrich context', 'wren genbi'.

genbiSkill

Turn a Wren project's context layer into a shareable, browser-side GenBI web app and deploy it to the user's Vercel or Cloudflare account. Orchestrates the full flow: `wren genbi build` returns a project-hydrated build instruction, the agent authors the app from scratch into apps/<name>/, then register → verify → deploy produce a shareable URL. Use this skill whenever the user wants to: build a dashboard from their Wren project, make a shareable analytics app, deploy their context layer as a web app, host a GenBI app on Vercel or Cloudflare Pages, or asks for a 'genbi app'.