ClaudeWave
Slash Command · 413 repo stars · updated 5mo ago

quality

The quality command runs data quality operations on a specified dataset through five actions: check performs a basic assessment of completeness and consistency; clean removes duplicates and handles missing values; validate applies statistical and business-rule checks; monitor sets up continuous tracking with alerts; and profile generates a comprehensive statistical analysis. Use this command to evaluate dataset integrity, identify data issues, apply cleaning procedures, or establish ongoing quality monitoring for datasets stored under data_storage/.

Install in Claude Code
mkdir -p ~/.claude/commands && curl -fsSL https://raw.githubusercontent.com/liangdabiao/claude-data-analysis/HEAD/.claude/commands/quality.md -o ~/.claude/commands/quality.md
Then open a new Claude Code session; the slash command loads automatically.

quality.md

# Data Quality Command

Execute data quality operations on dataset `$1` with action `$2` using the quality-assurance subagent.

## Context
- Dataset location: @data_storage/$1
- Quality action: $2 (check, clean, validate, monitor, profile)
- Current working directory: !`pwd`
- Output directory: ./quality_reports/
- Quality rules and validation thresholds
- Available quality metrics and KPIs

## Your Task

Use the quality-assurance subagent to perform comprehensive data quality operations:

### 1. Quality Assessment
- Analyze data completeness and accuracy
- Check data consistency and validity
- Assess data uniqueness and timeliness
- Evaluate overall data integrity

### 2. Issue Identification
- Detect missing values and data gaps
- Identify outliers and anomalies
- Find duplicate records and inconsistencies
- Discover format violations and data type issues

### 3. Quality Improvement
- Implement data cleaning procedures
- Apply data validation rules
- Execute data transformation operations
- Perform data standardization

### 4. Monitoring and Reporting
- Generate quality metrics and KPIs
- Create quality assessment reports
- Set up ongoing quality monitoring
- Provide quality improvement recommendations

## Quality Actions

### Check
Perform basic data quality assessment:
- Completeness analysis
- Basic accuracy validation
- Simple consistency checks
- Summary quality metrics
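
As a rough illustration of what a `check` pass might compute, here is a minimal sketch using pandas. The function name `basic_check` and the keys of the returned dictionary are assumptions made for this example, not part of the command.

```python
import pandas as pd

def basic_check(df: pd.DataFrame) -> dict:
    """Hypothetical helper: summarize completeness and simple consistency signals."""
    total_cells = df.shape[0] * df.shape[1]
    missing_cells = int(df.isna().sum().sum())
    return {
        "rows": len(df),
        "columns": df.shape[1],
        # Share of non-missing cells, as a percentage
        "completeness_pct": 100.0 if total_cells == 0
        else round(100 * (1 - missing_cells / total_cells), 2),
        # Missing-value count for each column
        "missing_by_column": {col: int(n) for col, n in df.isna().sum().items()},
        # Fully duplicated rows are a simple consistency signal
        "duplicate_rows": int(df.duplicated().sum()),
    }

# Illustrative data only
sample = pd.DataFrame({"id": [1, 2, 2, 4], "amount": [10.0, 5.0, 5.0, None]})
summary = basic_check(sample)
```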

### Clean
Execute data cleaning operations:
- Remove duplicate records
- Handle missing values
- Correct format violations
- Standardize data formats
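
The cleaning steps listed above could be sketched as follows. The helper `basic_clean`, its `text_columns` parameter, and the choice of median imputation are assumptions for illustration; the right missing-value strategy depends on the dataset.

```python
import pandas as pd

def basic_clean(df: pd.DataFrame, text_columns: list) -> pd.DataFrame:
    """Hypothetical helper: standardize text, drop duplicates, fill numeric gaps."""
    cleaned = df.copy()
    # Standardize text formats: trim whitespace and lower-case
    for col in text_columns:
        cleaned[col] = cleaned[col].str.strip().str.lower()
    # Drop duplicates after standardizing so near-identical rows collapse
    cleaned = cleaned.drop_duplicates().reset_index(drop=True)
    # Assumed strategy: fill missing numeric values with the column median
    for col in cleaned.select_dtypes(include="number").columns:
        cleaned[col] = cleaned[col].fillna(cleaned[col].median())
    return cleaned

# Illustrative data only
raw = pd.DataFrame({"city": [" Paris", "paris", "Rome"], "amount": [10.0, 10.0, None]})
cleaned = basic_clean(raw, text_columns=["city"])
```

Working on a copy keeps the raw dataset intact, so a cleaned version can be written alongside it rather than over it.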

### Validate
Comprehensive data validation:
- Statistical validation
- Business rule validation
- Cross-field validation
- Referential integrity checks
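
A sketch of three of these validation types (business rule, cross-field, referential integrity) on a hypothetical orders table. The column names, the rule that amounts must be positive, and the function `validate_orders` are all assumptions; statistical validation is not shown here.

```python
import pandas as pd

def validate_orders(orders: pd.DataFrame, customers: pd.DataFrame) -> dict:
    """Hypothetical helper: return rule name -> list of offending row indices."""
    violations = {}
    # Business rule (assumed): order amounts must be positive
    violations["amount_positive"] = orders.index[orders["amount"] <= 0].tolist()
    # Cross-field rule: an order cannot ship before it was placed
    violations["ship_after_order"] = orders.index[
        orders["ship_date"] < orders["order_date"]
    ].tolist()
    # Referential integrity: every order must reference a known customer
    violations["customer_exists"] = orders.index[
        ~orders["customer_id"].isin(customers["customer_id"])
    ].tolist()
    return violations

# Illustrative data only
orders = pd.DataFrame({
    "customer_id": [1, 2, 9],
    "amount": [50.0, -5.0, 20.0],
    "order_date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-05"]),
    "ship_date": pd.to_datetime(["2024-01-03", "2024-01-04", "2024-01-04"]),
})
customers = pd.DataFrame({"customer_id": [1, 2, 3]})
violations = validate_orders(orders, customers)
```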

### Monitor
Set up quality monitoring:
- Continuous quality tracking
- Alert threshold configuration
- Quality trend analysis
- Performance metrics monitoring
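
Alert thresholds might be evaluated along these lines. The metric names, the "minimum acceptable value" semantics, and the function `evaluate_thresholds` are assumptions for this sketch; the command does not define a monitoring config schema.

```python
def evaluate_thresholds(metrics: dict, thresholds: dict) -> list:
    """Hypothetical helper: compare metric values against minimum thresholds."""
    alerts = []
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is None:
            # A metric that was never computed is itself worth flagging
            alerts.append(f"{name}: metric missing")
        elif value < minimum:
            alerts.append(f"{name}: {value} below threshold {minimum}")
    return alerts

# Illustrative values only
alerts = evaluate_thresholds(
    metrics={"completeness": 92.0, "uniqueness": 99.5},
    thresholds={"completeness": 95.0, "uniqueness": 99.0, "validity": 90.0},
)
```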

### Profile
Generate comprehensive data profile:
- Detailed data statistics
- Distribution analysis
- Relationship analysis
- Data lineage documentation

## Quality Dimensions

### Completeness
- **Missing Value Analysis**: Identify and quantify missing data
- **Required Field Validation**: Check presence of mandatory fields
- **Record Completeness**: Assess completeness of individual records
- **Data Coverage**: Evaluate coverage of expected data range

### Accuracy
- **Statistical Validation**: Verify statistical properties
- **Business Rule Validation**: Check against business constraints
- **Range Validation**: Ensure values within expected ranges
- **Format Validation**: Verify correct data formats
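
Range validation is the most mechanical of these checks. A minimal sketch, assuming the expected ranges are supplied as a mapping of column name to inclusive `(min, max)` bounds; `range_violations` is a hypothetical name.

```python
import pandas as pd

def range_violations(df: pd.DataFrame, ranges: dict) -> dict:
    """Hypothetical helper: count out-of-range values per column."""
    result = {}
    for col, (low, high) in ranges.items():
        # Missing values are a completeness issue, so they are excluded here
        values = df[col].dropna()
        result[col] = int((~values.between(low, high)).sum())
    return result

# Illustrative data and bounds only
people = pd.DataFrame({"age": [25, -3, 140, None]})
out_of_range = range_violations(people, {"age": (0, 120)})
```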

### Consistency
- **Cross-Field Validation**: Check logical consistency between fields
- **Temporal Consistency**: Validate time-based consistency
- **Referential Integrity**: Check relationship consistency
- **Format Consistency**: Ensure consistent formatting

### Timeliness
- **Data Currency**: Assess how current the data is
- **Update Frequency**: Evaluate data refresh rates
- **Latency Analysis**: Measure data processing delays
- **Freshness Metrics**: Track data age and relevance
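
Data currency and freshness could be measured against a reference time, as in this sketch. The function `freshness`, the timestamp column name, and the 7-day staleness limit are assumptions chosen for illustration.

```python
import pandas as pd

def freshness(df: pd.DataFrame, timestamp_col: str,
              as_of: pd.Timestamp, max_age_days: int) -> dict:
    """Hypothetical helper: summarize record age relative to a reference time."""
    timestamps = pd.to_datetime(df[timestamp_col])
    age_days = (as_of - timestamps).dt.days
    return {
        "newest_age_days": int(age_days.min()),
        "oldest_age_days": int(age_days.max()),
        # Fraction of records older than the allowed age
        "stale_share": float((age_days > max_age_days).mean()),
    }

# Illustrative data only
records = pd.DataFrame({"updated_at": ["2024-03-01", "2024-03-09", "2024-03-10"]})
fresh = freshness(records, "updated_at", pd.Timestamp("2024-03-10"), max_age_days=7)
```

Passing `as_of` explicitly, rather than reading the clock inside the function, keeps the result reproducible between runs.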

### Uniqueness
- **Duplicate Detection**: Identify and eliminate duplicate records
- **Primary Key Validation**: Verify unique identifiers
- **Record Uniqueness**: Assess overall uniqueness
- **Relationship Uniqueness**: Check unique relationships
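
Primary key validation can be sketched as a check that the key columns contain no nulls and no repeated combinations. The helper name `primary_key_report` and the sample columns are assumptions.

```python
import pandas as pd

def primary_key_report(df: pd.DataFrame, key_columns: list) -> dict:
    """Hypothetical helper: test whether key_columns form a valid primary key."""
    key = df[key_columns]
    # keep=False marks every row involved in a duplicate, not just the repeats
    dup_mask = key.duplicated(keep=False)
    has_nulls = bool(key.isna().any().any())
    return {
        "null_keys": int(key.isna().any(axis=1).sum()),
        "duplicate_key_rows": int(dup_mask.sum()),
        "is_valid_key": bool(not dup_mask.any() and not has_nulls),
    }

# Illustrative data only
lines = pd.DataFrame({"order_id": [1, 2, 2, 3], "line": [1, 1, 1, 1]})
pk = primary_key_report(lines, ["order_id", "line"])
```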

### Validity
- **Data Type Validation**: Verify correct data types
- **Domain Validation**: Check against allowed value domains
- **Pattern Validation**: Validate against expected patterns
- **Constraint Validation**: Check database and business constraints
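
Pattern and domain validation might look like the following. The function `validity_violations`, the simplified email pattern, and the allowed status values are assumptions for the example; a production email check would need a stricter rule.

```python
import pandas as pd

def validity_violations(df: pd.DataFrame, patterns=None, domains=None) -> dict:
    """Hypothetical helper: count pattern and domain violations per column."""
    result = {}
    for col, pattern in (patterns or {}).items():
        # The whole value must match the regular expression
        values = df[col].dropna().astype(str)
        result[f"{col}_pattern"] = int((~values.str.fullmatch(pattern)).sum())
    for col, allowed in (domains or {}).items():
        # Values must belong to the allowed set
        values = df[col].dropna()
        result[f"{col}_domain"] = int((~values.isin(allowed)).sum())
    return result

# Illustrative data and rules only
users = pd.DataFrame({
    "email": ["a@example.com", "bad-email", None],
    "status": ["active", "unknown", "inactive"],
})
invalid = validity_violations(
    users,
    patterns={"email": r"[^@\s]+@[^@\s]+\.[^@\s]+"},
    domains={"status": ["active", "inactive"]},
)
```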

## Expected Output

### Quality Reports
- `quality_reports/$1_quality_check.json` - Quality assessment results
- `quality_reports/$1_data_profile.json` - Comprehensive data profile
- `quality_reports/$1_validation_report.md` - Detailed validation report
- `quality_reports/$1_monitoring_config.json` - Monitoring configuration
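
Writing one of these reports could be sketched as below. The function `write_quality_report` is hypothetical; only the `quality_reports/` directory and the `_quality_check.json` suffix come from the list above.

```python
import json
from pathlib import Path

def write_quality_report(dataset_name: str, results: dict,
                         output_dir: str = "quality_reports") -> Path:
    """Hypothetical helper: save assessment results as <dataset>_quality_check.json."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{dataset_name}_quality_check.json"
    # default=str falls back to str() for values json cannot encode,
    # such as pandas dtypes or timestamps
    path.write_text(json.dumps(results, indent=2, default=str), encoding="utf-8")
    return path
```

For example, `write_quality_report("sales", {"overall_score": 87.5})` would create `quality_reports/sales_quality_check.json` relative to the current working directory.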

### Quality Metrics
- **Overall Quality Score**: Composite quality metric (0-100)
- **Dimension Scores**: Individual quality dimension scores
- **Issue Counts**: Number and severity of quality issues
- **Improvement Metrics**: Quality improvement tracking

### Data Outputs
- **Cleaned Data**: Quality-improved dataset versions
- **Validation Logs**: Detailed validation results
- **Error Reports**: Specific error descriptions and locations
- **Recommendations**: Actionable improvement suggestions

## Working Process

### 1. Data Loading and Profiling
```python
import pandas as pd
import numpy as np
from scipy import stats

def load_and_profile_data(dataset_path):
    """Load a CSV dataset and create an initial profile."""
    data = pd.read_csv(dataset_path)

    # calculate_completeness, calculate_uniqueness, and calculate_consistency
    # are not defined in this file; they must be provided elsewhere.
    profile = {
        'basic_info': {
            'shape': data.shape,
            'columns': list(data.columns),
            # Cast dtypes to strings so the profile can be serialized to JSON
            'data_types': data.dtypes.astype(str).to_dict(),
            'memory_usage': int(data.memory_usage(deep=True).sum())
        },
        'quality_metrics': {
            'completeness': calculate_completeness(data),
            'uniqueness': calculate_uniqueness(data),
            'consistency': calculate_consistency(data)
        }
    }

    return data, profile
```
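
The profiling step calls `calculate_completeness`, `calculate_uniqueness`, and `calculate_consistency` without defining them. One possible set of definitions is sketched here; the 0-100 scale and, in particular, the type-homogeneity interpretation of "consistency" are assumptions, not the command's specification.

```python
import pandas as pd

def calculate_completeness(data: pd.DataFrame) -> float:
    """Percentage of cells that are not missing."""
    if data.size == 0:
        return 100.0
    return float(data.notna().to_numpy().mean() * 100)

def calculate_uniqueness(data: pd.DataFrame) -> float:
    """Percentage of rows that are not exact duplicates of an earlier row."""
    if len(data) == 0:
        return 100.0
    return float((~data.duplicated()).mean() * 100)

def calculate_consistency(data: pd.DataFrame) -> float:
    """Assumed definition: percentage of columns whose non-null values
    all share a single Python type."""
    if data.shape[1] == 0:
        return 100.0
    consistent = sum(
        data[col].dropna().map(type).nunique() <= 1 for col in data.columns
    )
    return float(100 * consistent / data.shape[1])

# Illustrative data only: "code" mixes strings and an integer
sample = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "code": ["a", "b", "b", 7],
    "score": [1.0, 2.0, 2.0, None],
})
```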

### 2. Quality Assessment
```python
def comprehensive_quality_assessment(data):
    """Perform comprehensive data quality assessment"""
    assessment = {
        'completeness': assess_completeness(data),
        'accuracy': assess_accuracy(data),
        'consistency': assess_consistency(data),
        'timeliness': assess_timeliness(data),
        'uniqueness': assess_uniqueness(data),
        'validity': assess_validity(data)
    }

    # Calculate overall quality score
    dimension_scores = [assessment[dim]['score'] for dim in assessment]
    overall_score = np.mean(dimension_scores)

    assessment['overall_score'] = overall_score
    assessment['quality_grade'] = assign_quality_grade(overall_score)

    return assessment
```
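
The assessment above relies on `assign_quality_grade`, which is not defined in this file. A minimal sketch follows, assuming conventional letter-grade cut-offs on the 0-100 score; the thresholds are illustrative and not taken from the command.

```python
def assign_quality_grade(score: float) -> str:
    """Hypothetical mapping from a 0-100 quality score to a letter grade."""
    # Cut-offs are assumptions; adjust them to the project's quality policy
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"

grades = [assign_quality_grade(s) for s in (95, 80, 72, 10)]
```

Note that `np.mean` in the assessment weights all six dimensions equally; a weighted average may be more appropriate when some dimensions matter more for a given dataset.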

### 3. Data Cleaning
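```python
def clean_data(data, quality_issues):
    """Clean data based on identified quality issues."""
    # The remainder of this function is truncated in the source.
    raise NotImplementedError
```
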
code-generator (Subagent)

Expert code generation specialist for creating high-quality, production-ready analysis code in multiple programming languages. Use proactively for any code generation task requiring clean, efficient, and maintainable code for data analysis, machine learning, and visualization.

data-explorer (Subagent)

Advanced data exploration and analysis specialist for statistical analysis, pattern discovery, machine learning insights, and actionable business intelligence. Use proactively for any data analysis task requiring deep insights and comprehensive understanding.

hypothesis-generator (Subagent)

Research hypothesis generation specialist for creating testable hypotheses, experimental designs, and research methodologies. Use proactively when data analysis suggests deeper investigation or when planning new research initiatives.

quality-assurance (Subagent)

Data quality and validation specialist ensuring data integrity, analysis accuracy, and result reliability. Use proactively for any data validation, quality checks, or result verification tasks.

report-writer (Subagent)

Expert report writer specializing in comprehensive data analysis documentation, executive summaries, and technical documentation. Use proactively to create polished, professional reports.

visualization-specialist (Subagent)

Expert data visualization specialist for creating interactive, insightful, and publication-quality visualizations with advanced analytics integration and storytelling capabilities. Use proactively when data analysis would benefit from visual representation or when communicating complex insights to stakeholders.

analyze (Slash Command)

Perform comprehensive data analysis on a specified dataset

do-all (Slash Command)

Automates the entire data analysis workflow, from data quality checks to final report generation