Skill · 826 repo stars · updated 4d ago

executing-spark

This Claude Code skill executes arbitrary PySpark or Python code on Microsoft Fabric's Spark compute through the Livy API without creating persistent notebook artifacts. Use it when you need to run ephemeral data transformations, ETL operations, or analytics queries directly against lakehouse Delta tables with full read/write access, particularly in automation scenarios where session cleanup is critical to avoid unnecessary capacity consumption.

Repository: power-bi-agentic-development

Install in Claude Code


git clone --depth 1 https://github.com/data-goblin/power-bi-agentic-development /tmp/executing-spark && cp -r /tmp/executing-spark/plugins/etl/skills/executing-spark ~/.claude/skills/executing-spark

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Executing Spark Code in Fabric (No Notebook)

Run arbitrary PySpark or Python code on Fabric Spark compute via the Livy API. No notebook artifact is created or persisted; sessions are ephemeral. Full read/write access to lakehouse Delta tables via Spark SQL.

## Prerequisites

- Azure CLI authenticated (`az login`)
- A lakehouse in the target workspace (the Livy session runs against it)
- Fabric capacity (F SKU or trial)

## Critical: Authentication

The Livy API requires a token from `az account get-access-token --resource https://api.fabric.microsoft.com`. Tokens from `fab auth` do **not** work for OneLake storage access inside the Spark session.

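```python
import json
import subprocess

# Request a token scoped to the Fabric API. check=True raises CalledProcessError
# if `az` fails (e.g. not logged in) instead of failing later on empty JSON.
result = subprocess.run(
    ["az", "account", "get-access-token", "--resource", "https://api.fabric.microsoft.com"],
    capture_output=True, text=True, check=True,
)
token = json.loads(result.stdout)["accessToken"]
```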

Do not output or log the token. Pass it directly to the API call.
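One way to "pass it directly" is to place the token only in the `Authorization: Bearer` header of each request. The sketch below assumes the standard-library `urllib.request` as the HTTP client; `livy_request` is a hypothetical helper that builds the request without sending it or printing the token.

```python
import json
import urllib.request
from typing import Optional


def livy_request(url: str, token: str, method: str = "GET",
                 payload: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a Livy API request authenticated with a bearer token.

    The token is written into the Authorization header only; it is never logged.
    """
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if payload is not None:
        # JSON bodies are used for session creation and statement submission.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)
```

To send it, open the request with `urllib.request.urlopen(req)` and decode the response with `json.load`.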

## Lifecycle

```
1. Create session   POST .../sessions              {"kind": "pyspark"}
2. Wait for idle    GET  .../sessions/{id}          poll until state: "idle" (~30-90s)
3. Submit code      POST .../sessions/{id}/statements   {"code": "...", "kind": "pyspark"}
4. Get result       GET  .../sessions/{id}/statements/{n}   poll until state: "available"
5. Delete session   DELETE .../sessions/{id}        ALWAYS do this
```
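Steps 2 and 4 are both "poll until a state is reached". A generic poller could look like the sketch below. `fetch_state` is a hypothetical callable that GETs the session or statement JSON and returns its `state` field. The failure states (`error`, `dead`, `killed`) are assumed from Apache Livy's session states; the timeout and interval defaults are arbitrary.

```python
import time
from typing import Callable, Iterable


def poll_until(
    fetch_state: Callable[[], str],
    targets: Iterable[str],
    failures: Iterable[str] = ("error", "dead", "killed"),
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
) -> str:
    """Call fetch_state() until it returns one of `targets`.

    Raises RuntimeError on a failure state, TimeoutError if the deadline passes.
    """
    targets, failures = set(targets), set(failures)
    deadline = time.monotonic() + timeout_s
    while True:
        state = fetch_state()
        if state in targets:
            return state
        if state in failures:
            raise RuntimeError(f"Livy object entered state {state!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still in state {state!r} after {timeout_s}s")
        time.sleep(interval_s)
```

For a session, wait on `{"idle"}`; for a statement, wait on `{"available"}` and pass failure states such as `{"error", "cancelled"}`.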

Base URL: `https://api.fabric.microsoft.com/v1/workspaces/{wsId}/lakehouses/{lhId}/livyapi/versions/2023-12-01`
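The endpoints in the lifecycle are all relative to this base URL. The hypothetical helpers below assemble them from the workspace and lakehouse IDs. Note that statement IDs can be `0`, so the check is `is None`, not truthiness.

```python
from urllib.parse import quote

LIVY_API_VERSION = "2023-12-01"


def livy_base_url(ws_id: str, lh_id: str, version: str = LIVY_API_VERSION) -> str:
    """Return the Livy base URL for a workspace/lakehouse pair."""
    return (
        "https://api.fabric.microsoft.com/v1"
        f"/workspaces/{quote(ws_id)}/lakehouses/{quote(lh_id)}"
        f"/livyapi/versions/{version}"
    )


def session_url(base: str, session_id=None) -> str:
    """.../sessions, or .../sessions/{id} when an ID is given."""
    return f"{base}/sessions" if session_id is None else f"{base}/sessions/{session_id}"


def statement_url(base: str, session_id, statement_id=None) -> str:
    """.../sessions/{id}/statements, or .../statements/{n} when n is given."""
    url = f"{session_url(base, session_id)}/statements"
    return url if statement_id is None else f"{url}/{statement_id}"
```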

**CRITICAL: Always delete sessions when done.** Idle sessions consume Fabric capacity units (CUs). A forgotten session burns compute until it times out (default: 20 minutes). In automation, wrap cleanup in a `finally` block.
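A context manager is one way to make the `finally` cleanup hard to forget. In this sketch, `create` and `delete` are hypothetical callables that wrap `POST .../sessions` and `DELETE .../sessions/{id}`. The delete runs whether the body finishes normally or raises.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def livy_session(create: Callable[[], str],
                 delete: Callable[[str], None]) -> Iterator[str]:
    """Create a Livy session and guarantee it is deleted afterwards."""
    session_id = create()
    try:
        yield session_id
    finally:
        # Runs on success, on exceptions, and on KeyboardInterrupt.
        delete(session_id)
```

Usage: `with livy_session(create, delete) as sid: ...`, with polling and statement submission inside the block.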

## Getting IDs

```bash
WS_ID=$(fab get "Workspace.Workspace" -q "id" | tr -d '"')
LH_ID=$(fab get "Workspace.Workspace/Lakehouse.Lakehouse" -q "id" | tr -d '"')
```

## Submitting Code

Submit PySpark or pure Python as statements. The `spark` object is available automatically.

```python
# Statement payload
{"code": "df = spark.sql('SELECT * FROM products LIMIT 10')\ndf.show()", "kind": "pyspark"}
```
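Multi-line code must arrive as one JSON string with escaped newlines. Building the payload with `json.dumps` handles that. In the sketch below, `statement_payload` is a hypothetical helper that also dedents code written as an indented triple-quoted string.

```python
import json
import textwrap


def statement_payload(code: str, kind: str = "pyspark") -> str:
    """Serialize a Livy statement body; json.dumps escapes the newlines."""
    return json.dumps({"code": textwrap.dedent(code).strip(), "kind": kind})
```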

Results are in `output.data["text/plain"]` when `state: "available"` and `output.status: "ok"`.
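A statement can reach `available` even when its code failed, so both fields need checking. The sketch below assumes that error outputs carry `ename` and `evalue`, as in Apache Livy. `statement_text` is a hypothetical helper that takes the parsed JSON of `GET .../statements/{n}`.

```python
def statement_text(statement: dict) -> str:
    """Return the text/plain output of a finished Livy statement.

    Raises RuntimeError if the statement is not finished or its code failed.
    """
    if statement.get("state") != "available":
        raise RuntimeError(f"Statement not finished: state={statement.get('state')!r}")
    output = statement.get("output") or {}
    if output.get("status") != "ok":
        # Assumed Livy error shape: ename / evalue (plus a traceback list).
        raise RuntimeError(f"{output.get('ename')}: {output.get('evalue')}")
    return output.get("data", {}).get("text/plain", "")
```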

## What Works

- `spark.sql("SELECT ...")`: full Spark SQL against lakehouse tables
- `spark.sql("SHOW TABLES")`: metastore access
- `df.write.mode("overwrite").saveAsTable(...)`: write Delta tables
- Pure Python (pandas, numpy, pyarrow): runs on the Spark driver
- In-memory Spark DataFrames and transformations
- Multiple sequential statements in one session
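A write-back from the list above can be generated as statement code on the client. In the sketch below, `overwrite_table_code` is a hypothetical helper, and the query and table names in its usage are illustrative. It uses `repr()` to quote both arguments so embedded quotes survive. The returned string is meant to be submitted as statement code, not run locally.

```python
def overwrite_table_code(query: str, table: str) -> str:
    """Build PySpark source that runs a Spark SQL query and overwrites a Delta table."""
    return (
        f"df = spark.sql({query!r})\n"
        f"df.write.mode('overwrite').saveAsTable({table!r})\n"
        # Print the row count so the statement returns a visible result.
        f"print(spark.table({table!r}).count())"
    )
```

For example, `overwrite_table_code("SELECT * FROM products", "products_copy")` yields three lines of PySpark ready to send as a single `"code"` value.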

## What Does Not Work

- `deltalake` (delta-rs) is not pre-installed; use Spark SQL instead
- `notebookutils` has limited functionality (no FUSE mount at `/lakehouse/default/`)
- Tokens from `fab auth`: use the `az` CLI token instead
- Tokens expire after ~60 minutes; long sessions need token refresh
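For client-side calls (polling, submitting, deleting) in a long-running script, a small cache can re-fetch the token before it expires. This is a sketch under assumptions. `fetch` is a hypothetical callable that wraps the `az account get-access-token` call and returns `(token, expires_at)` as a Unix timestamp; check which expiry field your Azure CLI version returns. The 5-minute margin is arbitrary.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Cache a bearer token and re-fetch it shortly before it expires."""

    def __init__(self, fetch: Callable[[], Tuple[str, float]],
                 margin_s: float = 300.0,
                 clock: Callable[[], float] = time.time):
        self._fetch = fetch          # returns (token, expires_at_unix_seconds)
        self._margin = margin_s      # refresh this many seconds before expiry
        self._clock = clock          # injectable for testing
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        if self._token is None or self._clock() >= self._expires_at - self._margin:
            self._token, self._expires_at = self._fetch()
        return self._token
```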

## When to Use This vs Alternatives

| Scenario | Approach |
|----------|----------|
| Quick read-only exploration | DuckDB locally (fastest; see `using-duckdb` skill) |
| Write data back to lakehouse | Livy session or notebook |
| Ephemeral transform; no artifact | Livy session (this skill) |
| Complex multi-cell workflow | Notebook (`nb exec` or portal) |
| Scheduled ETL | Notebook via `fab job run` |
| Agent-driven compute (Dagster, orchestrators) | Livy session |

## References

- **`references/livy-api.md`** -- Full API reference with endpoints, request/response formats, and error handling
- **`references/example-script.md`** -- Complete working script that creates a session, queries data, writes results, and cleans up

More from this repository

audit-tenant-settings (Skill)

Automatically invoke this skill whenever the user asks about Fabric tenant settings or Power BI tenant settings or auditing tenant settings. You can use this skill if the user mentions "Fabric administration".

fabric-cli (Skill)

Expert guidance for using the Fabric CLI (`fab`) to fully interact with Fabric workspaces, items, and configuration. Automatically invoke this skill whenever the user mentions "Fabric" or "Power BI Service" or a "Fabric/Power BI workspace".

connect-pbid (Skill)

TOM and ADOMD.NET guidance via PowerShell for connecting to Power BI Desktop's local Analysis Services instance. Covers model enumeration, DAX queries, metadata modification, annotations, calendar definitions, field parameters, query tracing, DAX library package management (daxlib.org), and the Desktop Bridge for reloading and screenshotting the report canvas. Automatically invoke when the user mentions "Power BI Desktop", "Analysis Services port", "TOM", "ADOMD", "daxlib", "DAX library", "DAX UDF package", or asks to "connect to PBI Desktop", "query PBI Desktop with DAX", "modify PBI Desktop model", "add a measure to PBI", "capture visual queries", "create a field parameter", "validate DAX", "intercept DAX queries", "install daxlib", "add DAX SVG", "add IBCS", "reload the report canvas", "screenshot a report page", "Desktop Bridge", or to work with the model and report in Power BI Desktop together.

pbip (Skill)

Expert guidance for the Power BI Project (PBIP) file format: project structure, cross-cutting operations (renames, forking), and PBIX extraction/conversion. Automatically invoke when the user mentions PBIP, PBIX, .pbip/.pbism/.platform files, or asks about "PBIP project structure", "PBIP vs PBIX", "thin report vs thick report", "rename a table", "cascade rename", "fork a PBIP project", "convert pbix to pbip", "extract pbix", "what files are in a PBIP", "PBIP encoding", "definition.pbir", or discusses project-level file structure and post-rename verification.

pbir-format (Skill)

Format reference for Power BI Enhanced Report (PBIR) JSON schemas and patterns. Automatically invoke when the user asks about PBIR JSON structure, visual.json properties, PBIR expressions, objects vs visualContainerObjects, theme inheritance, conditional formatting patterns, extension measures, bookmarks, field references, filter formatting, query roles, PBIR page structure, report wallpaper, or any PBIR metadata format question.

tmdl (Skill)

Direct TMDL file authoring and BIM-to-TMDL conversion for semantic models in PBIP projects. Automatically invoke when the user asks to "edit TMDL", "add a measure in TMDL", "TMDL syntax", "fix formatString", "fix summarizeBy", "TMDL indentation", "convert BIM to TMDL", "add a column description", "create a calculated column in TMDL", or mentions .tmdl file editing or BIM-to-TMDL migration.

create-pbi-report (Skill)

Step-by-step workflow for creating complete Power BI reports from scratch using pbir CLI. Covers model discovery, report creation, page layout, theme setup, visual placement, field binding, filtering, formatting, validation, and publishing. Automatically invoke when the user asks to "create a new report", "build a report from scratch", "make a dashboard", "set up a report with KPIs", "create an executive dashboard", "add pages and visuals to a new report".

deneb-visuals (Skill)

Deneb visual creation, Vega/Vega-Lite spec authoring, and Deneb best practices for PBIR reports. Automatically invoke whenever the user mentions "Deneb" in any context, or asks about Vega/Vega-Lite specs in Power BI, Deneb cross-filtering, Deneb interactivity, pbiColor theme integration, Deneb field name escaping, or Deneb rendering issues.