Skip to main content
ClaudeWave
Skill711 repo starsupdated today

using-duckdb

This Claude Code skill enables querying Delta Lake tables and raw files stored in Microsoft Fabric's OneLake using DuckDB, supporting both local command-line analysis and in-notebook execution within Fabric Spark containers. Use it when developers need fast, read-only analytical queries on lakehouse data, data validation, quality checks, or exploration without writing data back to Fabric.

Install in Claude Code
Copy
git clone --depth 1 https://github.com/data-goblin/power-bi-agentic-development /tmp/using-duckdb && cp -r /tmp/using-duckdb/plugins/etl/skills/using-duckdb ~/.claude/skills/using-duckdb
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# Using DuckDB with Fabric

Query Delta Lake tables and raw files in OneLake using DuckDB. Works both locally (CLI/Python) and inside Fabric notebooks. Read-only; for writes, use the `executing-spark` skill.

## Two Modes

| Mode | Where it runs | Auth | Best for |
|------|--------------|------|----------|
| **Local** | Developer machine | Azure CLI (`az login`) | Exploration, validation, ad-hoc analysis |
| **In-notebook** | Fabric Spark container | `notebookutils.credentials.getToken('storage')` | Combining DuckDB speed with Spark write-back |

## Local: Prerequisites

- DuckDB installed (`brew install duckdb` on macOS)
- Azure CLI authenticated (`az login`)
- Extensions installed: `INSTALL delta; INSTALL azure;` (one-time)

## Local: Querying Delta Tables

```bash
WS_ID=$(fab get "Workspace.Workspace" -q "id" | tr -d '"')
LH_ID=$(fab get "Workspace.Workspace/LH.Lakehouse" -q "id" | tr -d '"')

duckdb -c "
LOAD delta; LOAD azure;
CREATE SECRET (TYPE azure, PROVIDER credential_chain, CHAIN 'cli');

SELECT * FROM delta_scan(
  'abfss://${WS_ID}@onelake.dfs.fabric.microsoft.com/${LH_ID}/Tables/schema/table'
) LIMIT 10;
"
```

The `CHAIN 'cli'` parameter uses Azure CLI credentials. Without it, DuckDB tries managed identity first (fails on local machines).

## Local: Querying Raw Files

```bash
BASE="abfss://${WS_ID}@onelake.dfs.fabric.microsoft.com/${LH_ID}/Files"

duckdb -c "
LOAD azure;
CREATE SECRET (TYPE azure, PROVIDER credential_chain, CHAIN 'cli');

SELECT * FROM read_csv('${BASE}/data.csv') LIMIT 10;
SELECT * FROM read_parquet('${BASE}/facts.parquet') LIMIT 10;
SELECT * FROM read_json('${BASE}/events/*.json');
"
```

Glob patterns (`*`, `**`) work for reading multiple files.

## In-Notebook: Attaching DuckDB to a Lakehouse

Inside a Fabric notebook, DuckDB can query lakehouse Delta tables directly using a storage token. This approach is faster than Spark SQL for analytical queries on single-node data.

```python
import duckdb
import time

# Get storage token from notebook context
token = notebookutils.credentials.getToken('storage')

# Create DuckDB connection
con = duckdb.connect(f'temp_{time.time_ns()}.duckdb')
con.sql('SET enable_object_cache=true')

# Register OneLake secret
con.sql(f"""
    CREATE OR REPLACE SECRET onelake (
        TYPE AZURE,
        PROVIDER ACCESS_TOKEN,
        ACCESS_TOKEN '{token}'
    )
""")

# Query Delta tables
workspace = "<workspace-id>"
lakehouse = "<lakehouse-name>"
path = f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/{lakehouse}.Lakehouse/Tables"

df = con.sql(f"""
    SELECT * FROM delta_scan('{path}/schema/table_name') LIMIT 100
""").df()
print(df)
```

### Auto-Discovering Tables

Dynamically find all Delta tables in a lakehouse:

```python
tables = con.sql(f"""
    SELECT DISTINCT split_part(file, '_delta_log', 1) as table_path
    FROM glob('{path}/*/*/*_delta_log/*.json')
""").df()['table_path'].tolist()

for t in tables:
    view_name = t.split('/')[-1]
    con.sql(f"CREATE OR REPLACE VIEW {view_name} AS SELECT * FROM delta_scan('{t}')")
    print(f"Created view: {view_name}")
```

## OneLake Path Format

```
abfss://<workspace-id>@onelake.dfs.fabric.microsoft.com/<item-id>/Tables/<schema>/<table>
abfss://<workspace-id>@onelake.dfs.fabric.microsoft.com/<item-id>/Files/<path>
```

| Item type | ID source |
|-----------|-----------|
| Lakehouse | `fab get "ws/LH.Lakehouse" -q "id"` |
| Warehouse | `fab get "ws/WH.Warehouse" -q "id"` |
| SQL Database | `fab get "ws/DB.SQLDatabase" -q "id"` |

Cross-item joins work in a single DuckDB query; use different `abfss://` paths.

## Common Patterns

For data freshness checks, quality validation, schema discovery, cross-table joins, and row count audits, see **`references/common-patterns.md`**.

## References

- **`references/common-patterns.md`** -- Data freshness, quality, schema discovery, cross-joins
- **`references/in-notebook-setup.md`** -- Full notebook setup with auto-discovery and write-back patterns
- [DuckDB Azure Extension](https://duckdb.org/docs/extensions/azure.html)
- [DuckDB Delta Extension](https://duckdb.org/docs/extensions/delta.html)
- [djouallah/Fabric_Notebooks_Demo](https://github.com/djouallah/Fabric_Notebooks_Demo/blob/main/Attach_LH/Attach_Lakehouse_v2.ipynb) -- Original notebook-attachment approach
audit-tenant-settingsSkill

Automatically invoke this skill whenever the user asks about Fabric tenant settings or Power BI tenant settings or auditing tenant settings. You can use this skill if the user mentions "Fabric administration".

fabric-cliSkill

Expert guidance for using the Fabric CLI (`fab`) to fully interact with Fabric workspaces, items, and configuration. Automatically invoke this skill whenever the user mentions "Fabric" or "Power BI Service" or a "Fabric/Power BI workspace".

connect-pbidSkill

TOM and ADOMD.NET guidance via PowerShell for connecting to Power BI Desktop's local Analysis Services instance. Covers model enumeration, DAX queries, metadata modification, annotations, calendar definitions, field parameters, query tracing, DAX library package management (daxlib.org), and the Desktop Bridge for reloading and screenshotting the report canvas. Automatically invoke when the user mentions "Power BI Desktop", "Analysis Services port", "TOM", "ADOMD", "daxlib", "DAX library", "DAX UDF package", or asks to "connect to PBI Desktop", "query PBI Desktop with DAX", "modify PBI Desktop model", "add a measure to PBI", "capture visual queries", "create a field parameter", "validate DAX", "intercept DAX queries", "install daxlib", "add DAX SVG", "add IBCS", "reload the report canvas", "screenshot a report page", "Desktop Bridge", or to work with the model and report in Power BI Desktop together.

pbipSkill

Expert guidance for the Power BI Project (PBIP) file format; project structure, cross-cutting operations (renames, forking), and PBIX extraction/conversion. Automatically invoke when the user mentions PBIP, PBIX, .pbip/.pbism/.platform files, or asks about "PBIP project structure", "PBIP vs PBIX", "thin report vs thick report", "rename a table", "cascade rename", "fork a PBIP project", "convert pbix to pbip", "extract pbix", "what files are in a PBIP", "PBIP encoding", "definition.pbir", or discusses project-level file structure and post-rename verification.

pbir-formatSkill

Format reference for Power BI Enhanced Report (PBIR) JSON schemas and patterns. Automatically invoke when the user asks about PBIR JSON structure, visual.json properties, PBIR expressions, objects vs visualContainerObjects, theme inheritance, conditional formatting patterns, extension measures, bookmarks, field references, filter formatting, query roles, PBIR page structure, report wallpaper, or any PBIR metadata format question.

tmdlSkill

Direct TMDL file authoring and BIM-to-TMDL conversion for semantic models in PBIP projects. Automatically invoke when the user asks to "edit TMDL", "add a measure in TMDL", "TMDL syntax", "fix formatString", "fix summarizeBy", "TMDL indentation", "convert BIM to TMDL", "add a column description", "create a calculated column in TMDL", or mentions .tmdl file editing or BIM-to-TMDL migration.

create-pbi-reportSkill

Step-by-step workflow for creating complete Power BI reports from scratch using pbir CLI. Covers model discovery, report creation, page layout, theme setup, visual placement, field binding, filtering, formatting, validation, and publishing. Automatically invoke when the user asks to "create a new report", "build a report from scratch", "make a dashboard", "set up a report with KPIs", "create an executive dashboard", "add pages and visuals to a new report".

deneb-visualsSkill

Deneb visual creation, Vega/Vega-Lite spec authoring, and Deneb best practices for PBIR reports. Automatically invoke whenever the user mentions "Deneb" in any context, or asks about Vega/Vega-Lite specs in Power BI, Deneb cross-filtering, Deneb interactivity, pbiColor theme integration, Deneb field name escaping, or Deneb rendering issues.