generate
The /generate slash command creates production-ready data analysis code in a specified programming language (Python, R, SQL, or JavaScript) for a designated analysis type (data-cleaning, statistical, visualization, machine-learning, or custom). Use this command when you need scaffolded, documented code with proper error handling and best practices implemented for your specific analytical task.
mkdir -p ~/.claude/commands && curl -fsSL https://raw.githubusercontent.com/liangdabiao/claude-data-analysis/HEAD/.claude/commands/generate.md -o ~/.claude/commands/generate.md
# Code Generation Command
Generate data analysis code in `$1` language for `$2` analysis type using the code-generator subagent.
## Context
- Programming language: $1 (python, r, sql, javascript)
- Analysis type: $2 (data-cleaning, statistical, visualization, machine-learning, custom)
- Current working directory: !`pwd`
- Output directory: ./generated_code/
- Available libraries and frameworks: determined by the chosen language (see Language Support)
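The two positional arguments listed above could be checked before any code is generated. The following is a minimal sketch of that check, not part of the command itself: the helper name `parse_generate_args` is hypothetical, and the allowed values are copied from the Context list.

```python
from typing import Tuple

# Allowed values, copied from the Context list above.
LANGUAGES = ("python", "r", "sql", "javascript")
ANALYSIS_TYPES = (
    "data-cleaning",
    "statistical",
    "visualization",
    "machine-learning",
    "custom",
)


def parse_generate_args(language: str, analysis_type: str) -> Tuple[str, str]:
    """Normalize and validate ($1, $2). Hypothetical helper for illustration."""
    lang = language.strip().lower()
    kind = analysis_type.strip().lower()
    if lang not in LANGUAGES:
        raise ValueError(
            f"Unsupported language {language!r}; expected one of {LANGUAGES}"
        )
    if kind not in ANALYSIS_TYPES:
        raise ValueError(
            f"Unsupported analysis type {analysis_type!r}; "
            f"expected one of {ANALYSIS_TYPES}"
        )
    return lang, kind
```

For example, `parse_generate_args("Python", "statistical")` returns the normalized pair, while an unlisted language raises `ValueError`.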
## Your Task
Use the code-generator subagent to create high-quality, production-ready analysis code:
### 1. Requirements Analysis
- Understand the specific analysis requirements
- Identify appropriate libraries and frameworks
- Consider data types and volumes
- Plan for scalability and performance
### 2. Code Architecture
- Design modular, reusable code structure
- Implement proper error handling
- Include comprehensive documentation
- Add unit tests where appropriate
### 3. Implementation
- Write clean, efficient, and maintainable code
- Include proper data validation
- Implement best practices for the language
- Add logging and debugging capabilities
### 4. Documentation
- Create comprehensive code documentation
- Include usage examples and tutorials
- Provide troubleshooting guidance
- Document dependencies and requirements
## Language Support
### Python
- **Libraries**: pandas, numpy, matplotlib, seaborn, scikit-learn, plotly
- **Use Cases**: Data cleaning, statistical analysis, machine learning, visualization
- **Output**: Jupyter notebooks, Python scripts, modules
### R
- **Libraries**: tidyverse, ggplot2, dplyr, caret, shiny
- **Use Cases**: Statistical analysis, data visualization, bioinformatics
- **Output**: R scripts, R Markdown documents, Shiny apps
### SQL
- **Dialects**: PostgreSQL, MySQL, SQLite, BigQuery, Redshift
- **Use Cases**: Data extraction, aggregation, reporting, ETL
- **Output**: SQL queries, stored procedures, views
### JavaScript
- **Libraries**: D3.js, Plotly.js, Chart.js, TensorFlow.js
- **Use Cases**: Web visualizations, interactive dashboards, client-side ML
- **Output**: HTML/JS files, Node.js scripts, web applications
## Analysis Types
### Data Cleaning
- Missing value handling
- Outlier detection and treatment
- Data type conversion
- Normalization and standardization
- Feature engineering
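
To make three of the items above concrete (missing value handling, outlier detection, and standardization), here is a minimal sketch. It deliberately uses only the Python standard library so it runs anywhere; generated code would more likely rely on pandas. The function names are illustrative assumptions, and median imputation and the 1.5 × IQR rule are just one common choice each.

```python
import statistics
from typing import List, Optional


def fill_missing_with_median(values: List[Optional[float]]) -> List[float]:
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def iqr_outlier_mask(values: List[float], k: float = 1.5) -> List[bool]:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v < low or v > high for v in values]


def standardize(values: List[float]) -> List[float]:
    """Rescale to zero mean and unit sample standard deviation (z-scores)."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    if stdev == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mean) / stdev for v in values]
```

Whether flagged outliers are dropped, capped, or kept is a decision for the analysis at hand; the mask only identifies them.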
### Statistical Analysis
- Descriptive statistics
- Hypothesis testing
- Correlation and regression
- Time series analysis
- ANOVA and t-tests
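
As a rough illustration of the correlation and t-test items above, the sketch below computes a Pearson correlation coefficient and a Welch t statistic using only the standard library. The function names are assumptions made for this example. It stops at the test statistic: turning it into a p-value needs a t-distribution CDF, which generated code would typically take from a statistics library such as SciPy.

```python
import math
import statistics
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def welch_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic for two samples with possibly unequal variances."""
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    if standard_error == 0:
        raise ValueError("t statistic is undefined when both samples are constant")
    return (statistics.mean(a) - statistics.mean(b)) / standard_error
```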
### Visualization
- Chart creation code
- Dashboard implementation
- Interactive visualizations
- Custom plot types
- Animation and transitions
### Machine Learning
- Data preprocessing
- Model training and evaluation
- Feature selection
- Hyperparameter tuning
- Model deployment
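
The training and evaluation item above follows a pattern that can be sketched without any ML library: hold out part of the data, fit on the rest, and score on the held-out rows. The example below is a simplified illustration under that assumption; the helper names and the choice of a one-variable least-squares line are hypothetical, and generated code would normally use a library such as scikit-learn instead.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (feature, target)


def split_rows(
    rows: Sequence[Point], test_fraction: float = 0.25, seed: int = 0
) -> Tuple[List[Point], List[Point]]:
    """Shuffle reproducibly, then split into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(train: Sequence[Point]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    if sxx == 0:
        raise ValueError("cannot fit a line when all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(model: Tuple[float, float], rows: Sequence[Point]) -> float:
    """Average squared prediction error of the fitted line on the given rows."""
    slope, intercept = model
    return sum((y - (slope * x + intercept)) ** 2 for x, y in rows) / len(rows)
```

Scoring on rows the model never saw during fitting is what makes the error estimate meaningful; the fixed seed keeps the split reproducible between runs.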
### Custom
- User-specific requirements
- Domain-specific analysis
- Integration with existing systems
- Performance optimization
- Custom algorithms
## Expected Output
### Code Files
- `generated_code/$1_$2_analysis.py` - Main analysis script
- `generated_code/$1_$2_utils.py` - Utility functions
- `generated_code/$1_$2_config.py` - Configuration settings
- `generated_code/$1_$2_test.py` - Unit tests
- `generated_code/requirements_$1.txt` - Dependencies
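
The naming pattern above substitutes `$1` (language) and `$2` (analysis type) into each file name. As a small sketch of how the paths expand, the hypothetical helper below builds the list for a given pair of arguments. It follows the pattern exactly as written, including the `.py` suffix, which is an assumption carried over from the list rather than a statement about non-Python languages.

```python
from pathlib import PurePosixPath
from typing import List


def expected_code_files(
    language: str, analysis_type: str, output_dir: str = "generated_code"
) -> List[str]:
    """Expand the $1/$2 file-name pattern from the Code Files list."""
    stem = f"{language}_{analysis_type}"
    names = [
        f"{stem}_analysis.py",           # main analysis script
        f"{stem}_utils.py",              # utility functions
        f"{stem}_config.py",             # configuration settings
        f"{stem}_test.py",               # unit tests
        f"requirements_{language}.txt",  # dependencies
    ]
    return [str(PurePosixPath(output_dir) / name) for name in names]
```

Calling `expected_code_files("python", "statistical")` yields paths such as `generated_code/python_statistical_analysis.py`.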
### Documentation
- **README.md**: Usage instructions and examples
- **API Documentation**: Function and class documentation
- **Tutorials**: Step-by-step guides
- **Troubleshooting**: Common issues and solutions
## Code Quality Standards
### Python Code Standards
```python
"""
High-quality Python code template for data analysis
"""
import logging
from pathlib import Path
from typing import Dict, List, Optional

import numpy as np
import pandas as pd


class DataAnalyzer:
    """
    Data analysis class with comprehensive functionality

    Args:
        data_path (str): Path to input data file
        config (Dict): Configuration parameters

    Attributes:
        data (pd.DataFrame): Loaded dataset
        config (Dict): Configuration settings
        logger (logging.Logger): Logger instance
    """

    def __init__(self, data_path: str, config: Optional[Dict] = None):
        self.data_path = Path(data_path)
        self.config = config or {}
        self.data: Optional[pd.DataFrame] = None
        self.logger = self._setup_logger()

    def _setup_logger(self) -> logging.Logger:
        """Set up logging configuration"""
        logger = logging.getLogger(__name__)
        logger.setLevel(logging.INFO)
        return logger

    def load_data(self) -> pd.DataFrame:
        """
        Load data from file with error handling

        Returns:
            pd.DataFrame: Loaded dataset

        Raises:
            FileNotFoundError: If data file doesn't exist
            ValueError: If data format is invalid
        """
        try:
            if not self.data_path.exists():
                raise FileNotFoundError(f"Data file not found: {self.data_path}")
            # Assumes CSV input for illustration; pandas parse errors
            # (ParserError, EmptyDataError) are ValueError subclasses
            self.data = pd.read_csv(self.data_path)
            return self.data
        except Exception as e:
            self.logger.error(f"Error loading data: {e}")
            raise
```
### SQL Code Standards
```sql
-- High-quality SQL template for data analysis
-- Include proper comments and documentation

-- Analysis: Customer Segmentation
-- Purpose: Identify customer segments based on purchase behavior
-- Dependencies: customers, orders, order_items tables

WITH customer_summary AS (
    -- Calculate customer-level metrics
    SELECT
        c.customer_id,
        c.customer_name,
        c.signup_date,
        COUNT(DISTINCT o.order_id) AS total_orders,
        SUM(oi.quantity * oi.unit_price) AS total_revenue,
        -- Revenue per order (AVG over line items would average item totals instead);
        -- NULLIF avoids division by zero for customers with no orders
        SUM(oi.quantity * oi.unit_price)
            / NULLIF(COUNT(DISTINCT o.order_id), 0) AS avg_order_value,
        MAX(o.order_date) AS last_order_date
    FROM customers c
    LEFT JOIN orders o ON c.customer_id = o.customer_id
    LEFT JOIN order_items oi ON o.order_id = oi.order_id
    GROUP BY c.customer_id, c.customer_name, c.signup_date
),

segment_calculation AS (
    -- Calculate RFM metrics and segments
    SELECT
        customer_id,
        customer_name,
        total_orders,
        total_revenue,
        avg_order_value,
        -- Recency: days since last order
        DATEDIFF(CURRENT_DATE, last
```