Skill16.7k repo starsupdated today

dlt-connector

The dlt-connector skill automates extraction and modeling of SaaS data into Wren Engine for SQL analysis. It guides users through installing dlt, selecting a SaaS source (HubSpot, Stripe, Salesforce, GitHub, Slack, etc.), configuring credentials, loading data into DuckDB, and auto-generating a semantic project. Use it when users want to connect APIs, set up data pipelines, or query SaaS platforms with SQL, including scenarios where they already have a dlt-produced DuckDB file.

View source Repository: WrenAI

Install in Claude Code

Copy

git clone --depth 1 https://github.com/Canner/WrenAI /tmp/dlt-connector && cp -r /tmp/dlt-connector/core/wren/src/wren/skills_content/dlt-connector ~/.claude/skills/dlt-connector

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# wren-dlt-connector

> Reference docs (`dlt_sources`) and the `introspect_dlt` script are bundled. Pull references with `wren skills get dlt-connector --full`; fetch a script with `wren skills get dlt-connector --script <name>`.

Connect SaaS data to Wren Engine for SQL analysis — from zero to a verified, queryable project in one conversation.

## Who this is for

Data analysts who know SQL and some Python, but may not have used dlt or Wren before. Explain concepts briefly when they first appear, but don't over-explain things a SQL-literate person would already know.

## Overview

This skill walks through a four-phase workflow:

1. **Extract** — Use dlt (data load tool) to pull data from a SaaS API into a local DuckDB file
2. **Model** — Introspect the DuckDB schema and auto-generate a Wren semantic project (YAML models, relationships, profile)
3. **Build & Verify** — Build the project and run actual SQL queries to confirm everything works end-to-end
4. **Handoff** — Show the user their data and next steps

The user might enter at any phase. Ask which phase they're starting from — they may already have a `.duckdb` file and just need phases 2–4.

**The goal is a project that actually queries successfully, not just files that look correct.** Always run the verification step before declaring success.

## Critical: DuckDB catalog naming

When wren engine connects to a DuckDB file, it ATTACHes it using the filename (without `.duckdb` extension) as the catalog alias:

```
ATTACH DATABASE 'stripe_data.duckdb' AS "stripe_data" (READ_ONLY)
```

This means **every model's `table_reference.catalog` must equal the DuckDB filename stem**. If the file is `hubspot.duckdb`, the catalog is `hubspot`. If it's `my_pipeline.duckdb`, the catalog is `my_pipeline`.

Getting this wrong causes "table not found" errors at query time. The `introspect_dlt.py` script handles this automatically.

## Critical: Type normalization

Column types must be normalized using wren SDK's `type_mapping.parse_type()` function, which uses sqlglot to convert database-specific types (like DuckDB's `HUGEINT`, `TIMESTAMP WITH TIME ZONE`) into canonical SQL types that wren-core understands. Do not hardcode type mappings — always delegate to `parse_type(raw_type, "duckdb")`.

The `introspect_dlt.py` script does this automatically when wren SDK is installed.

## Phase 1: Extract — dlt Pipeline Setup

### Step 1: Pick the SaaS source

Ask the user which SaaS service they want to connect. Read `dlt_sources` for a list of popular verified sources and their auth requirements. If the source isn't listed, check whether dlt has a verified source for it by searching `dlthub.com/docs/dlt-ecosystem/verified-sources`.

### Step 2: Install dlt

```bash
pip install "dlt[duckdb]" --break-system-packages
```

### Step 3: Write the pipeline script

Create a Python script that:
1. Imports the dlt source function for the chosen SaaS
2. Configures the pipeline with `destination='duckdb'` and a local file path
3. Runs the pipeline with `pipeline.run(source)`

Here's the general pattern — adapt it per source (check `dlt_sources` for source-specific templates):

```python
import dlt

pipeline = dlt.pipeline(
    pipeline_name="<source>_pipeline",
    destination="duckdb",
    dataset_name="<source>_data",
)

# Source-specific: check the dlt_sources reference for auth patterns
source = <source_function>(api_key=dlt.secrets.value)

info = pipeline.run(source)
print(info)
```

### Step 4: Set up credentials

dlt reads credentials from environment variables or `.dlt/secrets.toml`. The simplest approach for a one-time run:

```bash
# Set the credential as an environment variable
# The exact variable name depends on the source — check the dlt_sources reference
export SOURCES__<SOURCE>__API_KEY="the-actual-key"
```

Ask the user for their API key or token. Remind them:
- Never commit credentials to git
- Environment variables are the simplest way for a one-time run
- For repeated use, they can create `.dlt/secrets.toml`

### Step 5: Run the pipeline

```bash
python <pipeline_script>.py
```

After the run, confirm:
1. The pipeline completed without errors
2. A `.duckdb` file was created (usually at `<pipeline_name>.duckdb`)
3. Print discovered tables and their column counts

```python
import duckdb
con = duckdb.connect("<pipeline_name>.duckdb", read_only=True)
for row in con.execute("""
    SELECT table_schema, table_name,
           (SELECT COUNT(*) FROM information_schema.columns c
            WHERE c.table_schema = t.table_schema AND c.table_name = t.table_name) as col_count
    FROM information_schema.tables t
    WHERE table_schema NOT IN ('information_schema', 'pg_catalog')
      AND table_name NOT LIKE '_dlt_%'
    ORDER BY table_schema, table_name
""").fetchall():
    print(f"  {row[0]}.{row[1]} ({row[2]} columns)")
con.close()
```

## Phase 2: Model — Generate Wren Project

Run the introspection script to auto-generate a complete Wren project from the DuckDB file:

```bash
# first fetch the script: wren skills get dlt-connector --script introspect_dlt > introspect_dlt.py
python introspect_dlt.py \
    --duckdb-path <path-to-duckdb-file> \
    --output-dir <project-directory> \
    --project-name <name>
```

This script:
- Connects to the DuckDB file (read-only)
- **Sets `table_reference.catalog` to the DuckDB filename stem** (matching wren engine's ATTACH behavior)
- Discovers all tables and columns via `information_schema`
- Filters out dlt internal tables (`_dlt_loads`, `_dlt_pipeline_state`, etc.)
- Filters out dlt metadata columns (`_dlt_id`, `_dlt_load_id`, `_dlt_list_idx`) from model definitions
- Detects parent-child relationships from `_dlt_parent_id` columns and table naming conventions
- **Normalizes column types using `wren.type_mapping.parse_type()`** (sqlglot-based)
- Generates a complete v2 YAML project (wren_project.yml, models/, relationships.yml, instructions.md)

After running, show the user what was generated:

```bash
# Show project sum

More from this repository

enrich-contextSkill

Augment a Wren project with business context that DB schema cannot carry — enum value meanings, units (USD vs cents, ms vs sec), NULL semantics, magic sentinels (-1 = unknown), soft-delete default filters, business synonyms, time-grain / TZ conventions, cross-system identifiers, currency rules, canonical-table preferences, AND named aggregation metrics (ARR, churn, DAU, WAU, NRR) proposed as cubes. Runs in one of two modes selected at session start: `grill` (one question at a time, user-driven) or `auto-pilot` (agent infers and applies, escalates only on conflicts and high-blast-radius additions like new cubes / views / relationships). Reads everything under <project>/raw/ (PDFs, glossaries, handbooks, code, data dictionaries) and optionally samples low-cardinality columns from the live DB (grill mode), compares against the current MDL / cubes / instructions.md / queries.yml / memory pairs, then fills gaps via the ten-category gap catalog and the cube proposal flow. Confirmed findings are written back to the right sink. Use when: user says 'enrich context', 'augment my project', 'grill me on this project', 'auto-fill my context', 'agent doesn't understand our docs / enum values / units / null meanings', 'business context is missing', 'what does status=A mean', 'is this amount in USD or cents', 'we keep getting wrong aggregations', 'add cubes for ARR / DAU / churn', 'we have a handbook / glossary / data dictionary the agent should know'; or after generating an MDL and noticing the agent lacks business semantics.

generate-mdlSkill

Generate a Wren MDL project by exploring a database with available tools (SQLAlchemy, database drivers, MCP connectors, or raw SQL). Guides agents through schema discovery, type normalization, and MDL YAML generation using the wren CLI. Use when: user wants to create or set up a new MDL, onboard a new data source, or scaffold a project from an existing database.

onboardingSkill

Onboard a user to Wren Engine end-to-end. Walks through environment checks, project scaffolding, connection configuration via .env, and first query. Use when: user wants to install Wren Engine, set up a new data source connection, or bootstrap a new project from scratch. Triggers: '/wren-onboarding', 'install wren', 'set up wren engine', 'wren onboarding', 'connect new database to wren'.

usageSkill

Wren Engine CLI workflow guide for AI agents. Answer data questions end-to-end using the wren CLI: gather schema context, recall past queries, write SQL through the MDL semantic layer, execute, and learn from confirmed results. Use when: user asks a data question, requests a report or analysis, asks about metrics, revenue, customers, orders, trends, or any business data; user says 'how many', 'show me', 'what is the', 'top N', 'compare', 'trend', 'growth', 'breakdown'; user wants to explore, analyze, filter, aggregate, or summarize data from a database; agent needs to query data, connect a data source, handle errors, or manage MDL changes via the wren CLI.

wrenSkill

Wren CLI for AI agents — a semantic SQL layer over 22+ databases (Postgres, MySQL, BigQuery, Snowflake, Spark, …). The actual workflow guides live inside the `wren` CLI itself; this is just a discovery stub. Use whenever the user asks a data question (how many, show me, top N, compare, trend, breakdown, metric, revenue, customers, orders), wants to install / set up Wren Engine, connect a new database, connect SaaS data via dlt (HubSpot, Stripe, Salesforce, GitHub, Slack), generate or regenerate an MDL project from a database schema, enrich a project with business context (enum meanings, units, cubes like ARR / DAU / churn), or turn a project's context layer into a shareable GenBI web app / dashboard and deploy it to Vercel or Cloudflare. Triggers: 'install wren', 'set up wren engine', 'connect database to wren', 'connect SaaS to wren', 'load hubspot / stripe / salesforce data', 'generate mdl', 'scaffold wren project', 'enrich wren context', 'augment my project', 'add cubes', 'build a dashboard', 'make a shareable analytics app', 'deploy my context layer as a web app', 'genbi app', 'wren onboarding', 'wren usage', 'wren generate mdl', 'wren dlt connector', 'wren enrich context', 'wren genbi'.

genbiSkill

Turn a Wren project's context layer into a shareable, browser-side GenBI web app and deploy it to the user's Vercel or Cloudflare account. Orchestrates the full flow: `wren genbi build` returns a project-hydrated build instruction, the agent authors the app from scratch into apps/<name>/, then register → verify → deploy produce a shareable URL. Use this skill whenever the user wants to: build a dashboard from their Wren project, make a shareable analytics app, deploy their context layer as a web app, host a GenBI app on Vercel or Cloudflare Pages, or asks for a 'genbi app'.