write-query
The write-query skill generates optimized SQL queries from natural language descriptions and handles multi-CTE structures with joins and aggregations. Use it to translate data requirements into SQL, to optimize queries for large partitioned tables, or to get dialect-specific syntax for Snowflake, BigQuery, PostgreSQL, Redshift, Databricks, MySQL, SQL Server, DuckDB, or SQLite. When a warehouse connection is available, it can also discover the relevant schema.
git clone --depth 1 https://github.com/openyak/openyak /tmp/write-query && cp -r /tmp/write-query/backend/app/data/plugins/data/skills/write-query ~/.claude/skills/write-query

SKILL.md
# /write-query - Write Optimized SQL

> If you see unfamiliar placeholders or need to check which tools are connected, see [CONNECTORS.md](../../CONNECTORS.md).

Write a SQL query from a natural language description, optimized for your specific SQL dialect and following best practices.

## Usage

```
/write-query <description of what data you need>
```

## Workflow

### 1. Understand the Request

Parse the user's description to identify:

- **Output columns**: What fields should the result include?
- **Filters**: What conditions limit the data (time ranges, segments, statuses)?
- **Aggregations**: Are there GROUP BY operations, counts, sums, averages?
- **Joins**: Does this require combining multiple tables?
- **Ordering**: How should results be sorted?
- **Limits**: Is there a top-N or sample requirement?

### 2. Determine SQL Dialect

If the user's SQL dialect is not already known, ask which they use:

- **PostgreSQL** (including Aurora, RDS, Supabase, Neon)
- **Snowflake**
- **BigQuery** (Google Cloud)
- **Redshift** (Amazon)
- **Databricks SQL**
- **MySQL** (including Aurora MySQL, PlanetScale)
- **SQL Server** (Microsoft)
- **DuckDB**
- **SQLite**
- **Other** (ask for specifics)

Remember the dialect for future queries in the same session.

### 3. Discover Schema (If Warehouse Connected)

If a data warehouse MCP server is connected:

1. Search for relevant tables based on the user's description
2. Inspect column names, types, and relationships
3. Check for partitioning or clustering keys that affect performance
4. Look for pre-built views or materialized views that might simplify the query

### 4. Write the Query

Follow these best practices:

**Structure:**

- Use CTEs (WITH clauses) for readability when queries have multiple logical steps
- One CTE per logical transformation or data source
- Name CTEs descriptively (e.g., `daily_signups`, `active_users`, `revenue_by_product`)

**Performance:**

- Never use `SELECT *` in production queries; specify only the columns you need
- Filter early (push WHERE clauses as close to the base tables as possible)
- Use partition filters when available (especially date partitions)
- Prefer `EXISTS` over `IN` for subqueries with large result sets
- Use appropriate JOIN types (don't use LEFT JOIN when INNER JOIN is correct)
- Avoid correlated subqueries when a JOIN or window function works
- Watch for exploding joins (many-to-many relationships that multiply rows)

**Readability:**

- Add comments explaining the "why" for non-obvious logic
- Use consistent indentation and formatting
- Alias tables with meaningful short names (not just `a`, `b`, `c`)
- Put each major clause on its own line

**Dialect-specific optimizations:**

- Apply dialect-specific syntax and functions (see the `sql-queries` skill for details)
- Use dialect-appropriate date functions, string functions, and window syntax
- Note any dialect-specific performance features (e.g., Snowflake clustering, BigQuery partitioning)

### 5. Present the Query

Provide:

1. **The complete query** in a SQL code block with syntax highlighting
2. **Brief explanation** of what each CTE or section does
3. **Performance notes** if relevant (expected cost, partition usage, potential bottlenecks)
4. **Modification suggestions**: how to adjust for common variations (different time range, different granularity, additional filters)

### 6. Offer to Execute

If a data warehouse is connected, offer to run the query and analyze the results. If the user wants to run it themselves, the query is ready to copy and paste.
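As a rough illustration of the Step 4 practices (one CTE per logical step, descriptive CTE names, explicit column lists, filters applied before aggregation, meaningful aliases, and `EXISTS` rather than `IN`), here is a minimal sketch that runs such a query against an in-memory SQLite database through Python's built-in `sqlite3` module. The `users`, `events`, and `orders` tables, their columns, the sample rows, and the `revenue_from_active_users` helper are all hypothetical and exist only for this example. The skill's actual output depends on your schema and dialect.

```python
import sqlite3

# Hypothetical schema and sample data, for illustration only.
SCHEMA = """
CREATE TABLE users  (user_id INTEGER PRIMARY KEY, signup_date TEXT, plan TEXT);
CREATE TABLE events (event_id INTEGER PRIMARY KEY, user_id INTEGER, event_date TEXT, event_type TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, order_date TEXT, product TEXT, amount REAL);
"""

ROWS = {
    "users": [(1, "2024-01-05", "pro"), (2, "2024-01-20", "free"), (3, "2024-02-02", "pro")],
    "events": [(1, 1, "2024-03-01", "login"), (2, 3, "2024-03-02", "click"), (3, 2, "2024-01-25", "login")],
    "orders": [(1, 1, "2024-03-03", "widget", 20.0), (2, 1, "2024-03-04", "gadget", 5.0),
               (3, 2, "2024-03-05", "widget", 7.5), (4, 3, "2024-03-06", "widget", 10.0)],
}

# One CTE per logical step, explicit columns, and filters inside each CTE.
QUERY = """
WITH active_users AS (
    -- Users with at least one event on or after the cutoff date.
    -- EXISTS only needs to find one matching event per user.
    SELECT u.user_id
    FROM users AS u
    WHERE EXISTS (
        SELECT 1
        FROM events AS e
        WHERE e.user_id = u.user_id
          AND e.event_date >= :cutoff
    )
),
revenue_by_product AS (
    -- Revenue from active users only, filtered before aggregation.
    SELECT o.product, SUM(o.amount) AS revenue
    FROM orders AS o
    INNER JOIN active_users AS au ON au.user_id = o.user_id
    WHERE o.order_date >= :cutoff
    GROUP BY o.product
)
SELECT product, revenue
FROM revenue_by_product
ORDER BY revenue DESC, product;
"""


def revenue_from_active_users(cutoff: str) -> list[tuple[str, float]]:
    """Load the sample data into a fresh in-memory DB and return (product, revenue) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        for table, rows in ROWS.items():
            placeholders = ", ".join("?" * len(rows[0]))  # e.g. "?, ?, ?"
            conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
        # Named parameters keep the cutoff out of the SQL text.
        return conn.execute(QUERY, {"cutoff": cutoff}).fetchall()
    finally:
        conn.close()


print(revenue_from_active_users("2024-03-01"))  # [('widget', 30.0), ('gadget', 5.0)]
```

User 2's only event falls before the cutoff, so that user's order is excluded by the `INNER JOIN` to `active_users`. The date columns are stored as ISO-8601 text, which SQLite compares correctly as strings.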
## Examples

**Simple aggregation:**

```
/write-query Count of orders by status for the last 30 days
```

**Complex analysis:**

```
/write-query Cohort retention analysis -- group users by their signup month, then show what percentage are still active (had at least one event) at 1, 3, 6, and 12 months after signup
```

**Performance-critical:**

```
/write-query We have a 500M row events table partitioned by date. Find the top 100 users by event count in the last 7 days with their most recent event type.
```

## Tips

- Mention your SQL dialect upfront to get the right syntax immediately
- If you know the table names, include them; otherwise Claude will help you find them
- Specify whether the query needs to be idempotent (safe to re-run) or is a one-time query
- For recurring queries, mention whether it should be parameterized for date ranges
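The tips about idempotence and parameterized date ranges can be illustrated with the simple-aggregation example. The sketch below uses Python's built-in `sqlite3` module and binds the date range as query parameters. It does not hard-code dates or read the current date, so re-running it with the same inputs returns the same counts. The `orders` table, its columns, the sample rows, the half-open `[start, end)` convention, and the `last_n_days` and `orders_by_status` helpers are all hypothetical choices made for illustration.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical orders table with sample rows; real names will differ.
SETUP = """
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT, order_date TEXT);
INSERT INTO orders (order_id, status, order_date) VALUES
    (1, 'shipped',   '2024-03-30'),
    (2, 'shipped',   '2024-03-15'),
    (3, 'pending',   '2024-03-20'),
    (4, 'cancelled', '2024-02-10'),
    (5, 'pending',   '2024-03-01');
"""

# The range is passed in as parameters, so the same inputs always give the same result.
ORDERS_BY_STATUS = """
SELECT o.status, COUNT(*) AS order_count
FROM orders AS o
WHERE o.order_date >= :start_date   -- inclusive lower bound
  AND o.order_date <  :end_date     -- exclusive upper bound
GROUP BY o.status
ORDER BY order_count DESC, o.status;
"""


def last_n_days(as_of: date, n: int) -> tuple[str, str]:
    """Half-open [start, end) ISO-date range covering the n days ending on as_of."""
    start = as_of - timedelta(days=n - 1)
    end = as_of + timedelta(days=1)
    return start.isoformat(), end.isoformat()


def orders_by_status(conn: sqlite3.Connection, start_date: str, end_date: str) -> list[tuple[str, int]]:
    """Count orders per status with order_date in [start_date, end_date)."""
    params = {"start_date": start_date, "end_date": end_date}
    return conn.execute(ORDERS_BY_STATUS, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
start, end = last_n_days(date(2024, 3, 31), 30)
print(start, end)                          # 2024-03-02 2024-04-01
print(orders_by_status(conn, start, end))  # [('shipped', 2), ('pending', 1)]
conn.close()
```

A half-open range makes consecutive windows line up without double-counting boundary days. The string comparisons work only because the dates are stored as ISO-8601 `YYYY-MM-DD` text. In a warehouse with a true date type or date partitions, you would apply the same two bounds to the partition column.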