Skill113 estrellas del repoactualizado 12d ago

optimizing-query-text

The optimizing-query-text skill analyzes SQL queries for Snowflake and removes functions applied to filter and join columns that prevent query optimization techniques like partition pruning and hash joins. Use this skill when refactoring SQL queries to improve performance by converting function-wrapped conditions into equivalent range-based or direct comparisons, while guaranteeing semantically identical results through strict preservation rules for columns, row counts, ordering, and limits.

Ver fuente Repositorio: data-engineering-skills

Instalar en Claude Code

Copiar

git clone --depth 1 https://github.com/AltimateAI/data-engineering-skills /tmp/optimizing-query-text && cp -r /tmp/optimizing-query-text/skills/snowflake/optimizing-query-text ~/.claude/skills/optimizing-query-text

Después abre una sesión nueva de Claude Code; el skill carga automáticamente.

Definición

SKILL.md

# Optimize Query from SQL Text

## OUTPUT FORMAT

Return ONLY the optimized SQL query. No markdown formatting, no explanations, no bullet points - just pure SQL that can be executed directly in Snowflake.

## CRITICAL: Semantic Preservation Rules

**The optimized query MUST return IDENTICAL results to the original.**

Before returning ANY optimization, verify:
- **Same columns**: Exact same columns in exact same order with exact same aliases
- **Same rows**: Filter conditions must be semantically equivalent
- **Same ordering**: Preserve `ORDER BY` exactly as written
- **Same limits**: If original has `LIMIT N`, keep `LIMIT N`. If no LIMIT, do NOT add one.

**If you cannot guarantee identical results, return the original query unchanged.**

---

## Pattern 1: Function on Filter Column

**Problem**: Functions on columns in WHERE clause prevent partition pruning and index usage.

### CAN Fix

| Original | Optimized | Why Safe |
|----------|-----------|----------|
| `WHERE DATE(ts) = '2024-01-01'` | `WHERE ts >= '2024-01-01' AND ts < '2024-01-02'` | Equivalent range |
| `WHERE YEAR(dt) = 2024` | `WHERE dt >= '2024-01-01' AND dt < '2025-01-01'` | Equivalent range |
| `WHERE MONTH(dt) = 3 AND YEAR(dt) = 2024` | `WHERE dt >= '2024-03-01' AND dt < '2024-04-01'` | Equivalent range |
| `WHERE DATE(ts) >= '2024-01-01' AND DATE(ts) < '2024-02-01'` | `WHERE ts >= '2024-01-01' AND ts < '2024-02-01'` | Same boundaries |
| `WHERE YEAR(dt) BETWEEN 1995 AND 1996` | `WHERE dt >= '1995-01-01' AND dt < '1997-01-01'` | Equivalent range |

### CANNOT Fix

| Pattern | Why Not |
|---------|---------|
| `WHERE YEAR(dt) IN (SELECT year FROM ...)` | Dynamic values, cannot precompute range |
| `WHERE DATE(ts) = DATE(other_col)` | Comparing two columns, both need function |
| `WHERE EXTRACT(DOW FROM dt) = 1` | Day-of-week has no contiguous range |
| `WHERE DATE_TRUNC('month', dt) = '2024-01-01'` in GROUP BY | Needed for grouping logic |
| `SELECT YEAR(dt) AS yr ... GROUP BY YEAR(dt)` | Function in SELECT/GROUP BY is fine, only filter matters |

---

## Pattern 2: Function on JOIN Column

**Problem**: Functions on JOIN columns prevent hash joins, forcing slower nested loop joins.

### CAN Fix

| Original | Optimized | Why Safe |
|----------|-----------|----------|
| `ON CAST(a.id AS VARCHAR) = CAST(b.id AS VARCHAR)` | `ON a.id = b.id` | If both are same type (e.g., INTEGER) |
| `ON UPPER(a.code) = UPPER(b.code)` | `ON a.code = b.code` | If data is already consistently cased |
| `ON TRIM(a.name) = TRIM(b.name)` | `ON a.name = b.name` | If data has no leading/trailing spaces |

### CANNOT Fix

| Pattern | Why Not |
|---------|---------|
| `ON CAST(a.id AS VARCHAR) = b.string_id` | Types genuinely differ, CAST required |
| `ON DATE(a.timestamp) = b.date_col` | Different granularity, DATE() required |
| `ON UPPER(a.code) = b.code` | If b.code might have different case |
| `ON a.id = b.id + 1` | Arithmetic transformation, cannot remove |

---

## Pattern 3: NOT IN Subquery

**Problem**: `NOT IN` has poor performance and unexpected NULL behavior.

### CAN Fix

| Original | Optimized | Why Safe |
|----------|-----------|----------|
| `WHERE id NOT IN (SELECT id FROM t WHERE ...)` | `WHERE NOT EXISTS (SELECT 1 FROM t WHERE t.id = main.id AND ...)` | Equivalent when subquery column is NOT NULL |
| `WHERE id NOT IN (SELECT id FROM t)` where id has NOT NULL constraint | `WHERE NOT EXISTS (SELECT 1 FROM t WHERE t.id = main.id)` | NOT NULL guarantees equivalence |

### CANNOT Fix

| Pattern | Why Not |
|---------|---------|
| `WHERE id NOT IN (SELECT nullable_col FROM t)` | If subquery returns NULL, NOT IN returns no rows; NOT EXISTS doesn't |
| `WHERE (a, b) NOT IN (SELECT x, y FROM t)` | Multi-column NOT IN has complex NULL semantics |

**Key Rule**: Only convert NOT IN to NOT EXISTS if you can verify the subquery column cannot be NULL.

---

## Pattern 4: Repeated Subquery

**Problem**: Same subquery executed multiple times causes redundant scans.

### CAN Fix

| Original | Optimized |
|----------|-----------|
| Subquery appears 2+ times identically | Extract to CTE, reference CTE multiple times |
| Same aggregation used in multiple places | Compute once in CTE |

### CANNOT Fix

| Pattern | Why Not |
|---------|---------|
| Correlated subquery (references outer table) | Each execution is different, cannot cache |
| Subqueries with different filters | Not actually the same subquery |
| Subquery in SELECT that depends on current row | Correlation prevents extraction |

---

## Pattern 5: Implicit Comma Joins

**Problem**: Comma-separated tables in FROM clause are harder to read and optimize.

### CAN Fix - Always

Convert `FROM a, b, c WHERE a.id = b.id AND b.id = c.id` to explicit JOIN syntax.

This is always safe - just restructuring, no semantic change.

---

## UNSAFE Optimizations (NEVER apply)

- **UNION to UNION ALL**: UNION deduplicates rows, UNION ALL does not - different results
- **Changing window functions**: Do not modify `SUM(SUM(x)) OVER(...)` or similar nested aggregates
- **Adding redundant filters**: Do not add filters in JOIN ON if same filter exists in WHERE
- **Changing column names**: Copy column names EXACTLY from original - do not "simplify" or rename
- **Changing column aliases**: Keep all aliases exactly as original
- **Adding early filtering in JOINs**: If a filter is in WHERE, do not duplicate it in JOIN ON clause

---

## Principles

1. **Minimal changes**: Make the fewest changes necessary. Simpler optimizations are more reliable.
2. **Preserve structure**: Keep subqueries, CTEs, and overall query structure unless there's a clear benefit.
3. **When in doubt, don't**: If unsure whether a change preserves semantics, skip it.
4. **Copy exactly**: Column names, table aliases, and expressions should be copied character-for-character.

---

## Priority Order

1. **Date/time functions on filter columns** - Highest impact
2. **Implicit joins to explicit JOIN** - Always safe, improves readability
3. **NOT I

Del mismo repositorio

New Skill ProposalSkill

altimate-codeSkill

Delegates dbt and warehouse work to altimate-code, a specialized CLI agent with 100+ purpose-built data tools. USE THIS SKILL FIRST whenever the task mentions or implies: warehouse access (Snowflake, BigQuery, Redshift, Databricks, Postgres, MySQL, DuckDB), column-level lineage, downstream-impact analysis, dbt builds against a real warehouse, cross-database migration or validation, query cost attribution, schema diff, data parity checking, PII detection from sampled rows, incremental/SCD2/backfill verification, FinOps reporting, model documentation generation, query optimization, anomaly detection, dev-vs-prod diffing, or when the user says "altimate", "altimate-code", or "the data agent". Strongly preferred over Read/Write/Edit whenever the task touches live data, lineage, or warehouse state — consult before doing the work yourself.

creating-dbt-modelsSkill

debugging-dbt-errorsSkill

developing-incremental-modelsSkill

documenting-dbt-modelsSkill

migrating-sql-to-dbtSkill

refactoring-dbt-modelsSkill