ClaudeWave
Skill · 145 repo stars · updated yesterday

AI Test Orchestration

AI-powered test orchestration skill covering intelligent test selection, risk-based test prioritization, flaky test management, test impact analysis, parallel execution optimization, and predictive test failure detection using machine learning.

Install in Claude Code
git clone --depth 1 https://github.com/PramodDutta/qaskills /tmp/ai-test-orchestration && cp -r /tmp/ai-test-orchestration/seed-skills/ai-test-orchestration ~/.claude/skills/ai-test-orchestration
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# AI Test Orchestration Skill

You are an expert software engineer specializing in AI-powered test orchestration and intelligent test management. When the user asks you to implement, optimize, or debug test selection, prioritization, or parallel execution strategies, follow these detailed instructions.

## Core Principles

1. **Test the changed code first** -- Prioritize tests that cover recently modified files and functions.
2. **Learn from history** -- Use historical pass/fail data to predict which tests are likely to fail.
3. **Quarantine, don't ignore** -- Flaky tests should be isolated and tracked, not deleted or skipped.
4. **Optimize for feedback speed** -- Run the most likely-to-fail tests first so developers get fast signals.
5. **Distribute intelligently** -- Split test suites across parallel workers based on historical duration, not file count.
6. **Measure and iterate** -- Track metrics like time-to-first-failure, false positive rate, and test suite efficiency.
7. **Fail fast, verify thoroughly** -- Fast feedback on PR checks, comprehensive verification on merge.
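
Principle 5 can be illustrated with a greedy "longest test first" split. The block below is a minimal sketch, not the skill's own `parallel-splitter.ts`: the names `TimedTest`, `WorkerPlan`, and `splitByDuration` are hypothetical, and durations are assumed to be historical averages in milliseconds.

```typescript
// Hypothetical shapes; the skill itself does not define these names.
interface TimedTest {
  testId: string;
  averageDuration: number; // milliseconds, taken from historical runs
}

interface WorkerPlan {
  tests: string[];
  totalDuration: number;
}

// Greedy longest-processing-time-first split: sort tests by duration
// (descending), then give each test to the currently least-loaded worker.
function splitByDuration(tests: TimedTest[], workerCount: number): WorkerPlan[] {
  if (!Number.isInteger(workerCount) || workerCount < 1) {
    throw new Error('workerCount must be a positive integer');
  }
  const plans: WorkerPlan[] = Array.from({ length: workerCount }, () => ({
    tests: [] as string[],
    totalDuration: 0,
  }));
  const sorted = [...tests].sort((a, b) => b.averageDuration - a.averageDuration);
  for (const test of sorted) {
    // Ties keep the earliest worker, so the result is deterministic.
    const lightest = plans.reduce((min, p) =>
      p.totalDuration < min.totalDuration ? p : min
    );
    lightest.tests.push(test.testId);
    lightest.totalDuration += test.averageDuration;
  }
  return plans;
}

const examplePlans = splitByDuration(
  [
    { testId: 'checkout.e2e', averageDuration: 90000 },
    { testId: 'login.e2e', averageDuration: 40000 },
    { testId: 'cart.unit', averageDuration: 30000 },
    { testId: 'search.unit', averageDuration: 20000 },
  ],
  2
);
console.log(examplePlans.map((p) => `${p.tests.join(',')} (${p.totalDuration} ms)`));
```

With these sample durations, one worker gets the single 90-second test and the other gets the three shorter ones, so both finish at 90 seconds. Splitting the same four tests two-and-two by count, in the order listed, would leave one worker with 130 seconds of work and the other with 50.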

## Project Structure

```
project/
  src/
    orchestrator/
      test-selector.ts
      risk-scorer.ts
      impact-analyzer.ts
      parallel-splitter.ts
      flaky-detector.ts
      prediction-model.ts
    data/
      test-history.ts
      git-analysis.ts
      coverage-map.ts
    reporters/
      orchestration-report.ts
      metrics-collector.ts
    config/
      orchestration.config.ts
  scripts/
    collect-test-data.ts
    train-model.py
    analyze-flakiness.ts
  tests/
    orchestrator/
      test-selector.test.ts
      risk-scorer.test.ts
      impact-analyzer.test.ts
```
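
The tree lists a `flaky-detector.ts`, and the principles call for quarantining flaky tests. One possible scoring rule is sketched below as an assumption, not as the skill's actual detector: the score is the fraction of consecutive executed runs whose outcome flipped between pass and fail. The names `flakinessScore` and `shouldQuarantine`, and the `0.3` threshold, are hypothetical. A flip rate cannot tell a flaky test from one whose code changed between runs, so a real detector would likely also compare reruns on the same commit.

```typescript
type Outcome = 'pass' | 'fail' | 'skip';

// Fraction of adjacent executed runs whose outcome differs (0 = stable,
// 1 = alternates every run). Skipped runs carry no signal and are ignored.
function flakinessScore(recentResults: Outcome[]): number {
  const executed = recentResults.filter((r) => r !== 'skip');
  if (executed.length < 2) return 0;
  let flips = 0;
  for (let i = 1; i < executed.length; i++) {
    if (executed[i] !== executed[i - 1]) flips++;
  }
  return flips / (executed.length - 1);
}

// Hypothetical policy: quarantine (isolate and keep tracking) any test whose
// score reaches the threshold, rather than deleting or skipping it.
function shouldQuarantine(recentResults: Outcome[], threshold = 0.3): boolean {
  return flakinessScore(recentResults) >= threshold;
}

console.log(flakinessScore(['pass', 'fail', 'pass', 'pass', 'fail'])); // 0.75
console.log(flakinessScore(['fail', 'fail', 'fail'])); // 0
```

A test that fails on every run scores 0 under this rule, which is intended: it is broken rather than flaky and should stay in the blocking suite.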

## Intelligent Test Selection Based on Code Changes

```typescript
// src/orchestrator/test-selector.ts
import { execSync } from 'child_process';
import { CoverageMap } from '../data/coverage-map';

interface TestSelection {
  mustRun: string[];      // Tests directly covering changed code
  shouldRun: string[];    // Tests with transitive dependencies on changed code
  canSkip: string[];      // Tests with no relation to changes
  confidence: number;     // 0-1, how confident we are in the selection
}

interface ChangedFile {
  path: string;
  additions: number;
  deletions: number;
  changedFunctions: string[];
}

export class TestSelector {
  private coverageMap: CoverageMap;

  constructor(coverageMap: CoverageMap) {
    this.coverageMap = coverageMap;
  }

  async selectTests(baseBranch: string = 'main'): Promise<TestSelection> {
    const changedFiles = this.getChangedFiles(baseBranch);
    const allTests = this.coverageMap.getAllTests();
    const mustRun = new Set<string>();
    const shouldRun = new Set<string>();

    for (const file of changedFiles) {
      // Direct coverage: tests that execute lines in this file
      const directTests = this.coverageMap.getTestsCoveringFile(file.path);

      // Function-level precision: collect tests covering the changed functions
      const fnTests = new Set<string>();
      for (const fn of file.changedFunctions) {
        this.coverageMap
          .getTestsCoveringFunction(file.path, fn)
          .forEach((t) => fnTests.add(t));
      }

      if (fnTests.size > 0) {
        // Only tests hitting changed functions must run; other tests that
        // touch the file are demoted to shouldRun.
        fnTests.forEach((t) => mustRun.add(t));
        directTests.forEach((t) => shouldRun.add(t));
      } else {
        // No function-level data: fall back to file-level selection
        directTests.forEach((t) => mustRun.add(t));
      }

      // Transitive dependencies: tests covering files that import changed file
      const dependents = this.coverageMap.getDependentsOf(file.path);
      for (const dep of dependents) {
        this.coverageMap
          .getTestsCoveringFile(dep)
          .forEach((t) => shouldRun.add(t));
      }
    }

    // A test queued as shouldRun for one file may be mustRun for another;
    // resolve the overlap once all files are processed.
    mustRun.forEach((t) => shouldRun.delete(t));

    const canSkip = allTests.filter(
      (t) => !mustRun.has(t) && !shouldRun.has(t)
    );

    const confidence = this.coverageMap.isComplete()
      ? 0.95
      : 0.7; // Lower confidence if coverage data is incomplete

    return {
      mustRun: [...mustRun],
      shouldRun: [...shouldRun],
      canSkip,
      confidence,
    };
  }

  private getChangedFiles(baseBranch: string): ChangedFile[] {
    const diffOutput = execSync(
      `git diff --name-only --diff-filter=ACMR ${baseBranch}...HEAD`,
      { encoding: 'utf-8' }
    ).trim();

    if (!diffOutput) return [];

    return diffOutput.split('\n').map((filePath) => {
      const stat = execSync(
        `git diff --numstat ${baseBranch}...HEAD -- "${filePath}"`,
        { encoding: 'utf-8' }
      ).trim();

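      // numstat prints "-" for binary files and the path as a third column;
      // keep the first two columns and treat non-numeric counts as 0
      const [additions = 0, deletions = 0] = stat
        .split('\t')
        .slice(0, 2)
        .map((n) => Number(n) || 0);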

      // Extract changed function names from diff
      const diffContent = execSync(
        `git diff -U0 ${baseBranch}...HEAD -- "${filePath}"`,
        { encoding: 'utf-8' }
      );
      const changedFunctions = this.extractChangedFunctions(diffContent);

      return { path: filePath, additions, deletions, changedFunctions };
    });
  }

  private extractChangedFunctions(diffContent: string): string[] {
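    // Heuristic: git prints the enclosing declaration after the hunk range.
    // Skip leading modifiers so `export async function foo` yields `foo`
    // rather than `export`.
    const functionPattern =
      /^@@.*@@[ \t]+(?:(?:export|default|public|private|protected|static|async)\s+)*(?:function\s+)?(\w+)/gm;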
    const functions: string[] = [];
    let match: RegExpExecArray | null;

    while ((match = functionPattern.exec(diffContent)) !== null) {
      functions.push(match[1]);
    }

    return [...new Set(functions)];
  }
}
```
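
`TestSelector` depends on a `CoverageMap` that the listing imports but does not show. The block below is a minimal in-memory sketch of what such a class might look like. Only the five query method names (`getAllTests`, `getTestsCoveringFile`, `getTestsCoveringFunction`, `getDependentsOf`, `isComplete`) come from the selector above; the class name `InMemoryCoverageMap`, the `record`, `addImport`, and `markComplete` helpers, and the storage layout are all assumptions. `getDependentsOf` is assumed to return direct importers only.

```typescript
// Hypothetical in-memory coverage map; not the skill's actual implementation.
class InMemoryCoverageMap {
  // test id -> source file -> functions executed in that file
  private coverage = new Map<string, Map<string, Set<string>>>();
  // source file -> files that import it directly
  private dependents = new Map<string, Set<string>>();
  private complete = false;

  // Assumed helper: record that a test executed some functions in a file.
  record(testId: string, file: string, functions: string[]): void {
    const files = this.coverage.get(testId) ?? new Map<string, Set<string>>();
    const fns = files.get(file) ?? new Set<string>();
    functions.forEach((fn) => fns.add(fn));
    files.set(file, fns);
    this.coverage.set(testId, files);
  }

  // Assumed helper: record that `importer` imports `imported`.
  addImport(importer: string, imported: string): void {
    const set = this.dependents.get(imported) ?? new Set<string>();
    set.add(importer);
    this.dependents.set(imported, set);
  }

  // Assumed helper: flag the data as covering the whole suite.
  markComplete(): void {
    this.complete = true;
  }

  getAllTests(): string[] {
    return Array.from(this.coverage.keys());
  }

  getTestsCoveringFile(file: string): string[] {
    const tests: string[] = [];
    this.coverage.forEach((files, testId) => {
      if (files.has(file)) tests.push(testId);
    });
    return tests;
  }

  getTestsCoveringFunction(file: string, fn: string): string[] {
    const tests: string[] = [];
    this.coverage.forEach((files, testId) => {
      if (files.get(file)?.has(fn)) tests.push(testId);
    });
    return tests;
  }

  getDependentsOf(file: string): string[] {
    return Array.from(this.dependents.get(file) ?? []);
  }

  isComplete(): boolean {
    return this.complete;
  }
}

const exampleMap = new InMemoryCoverageMap();
exampleMap.record('cart.test.ts', 'src/cart.ts', ['addItem', 'total']);
exampleMap.record('checkout.test.ts', 'src/checkout.ts', ['pay']);
exampleMap.addImport('src/checkout.ts', 'src/cart.ts');
console.log(exampleMap.getTestsCoveringFunction('src/cart.ts', 'addItem'));
console.log(exampleMap.getDependentsOf('src/cart.ts'));
```

In this sample data, a change to `src/cart.ts` would put `cart.test.ts` in `mustRun` through direct coverage and `checkout.test.ts` in `shouldRun` through the import edge.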

## Risk-Based Test Prioritization

```typescript
// src/orchestrator/risk-scorer.ts
interface TestRiskScore {
  testId: string;
  score: number;          // 0-100, higher = run first
  factors: RiskFactor[];
}

interface RiskFactor {
  name: string;
  weight: number;
  value: number;
  contribution: number;
}

interface TestHistory {
  testId: string;
  recentResults: ('pass' | 'fail' | 'skip')[];
  averageDuration: number;
  lastFailedAt: Date | null;
  failureRate: number;     // 0-1
  flakinessScore: number;  // 0-1
}

export class RiskScorer {
  private weights = {
    recentFailureRate: 30,
    codeChangeProximity: 25,
    historicalFlakiness: 15,
    timeSinceLastRun: 10,
    testAge: 5,
    complexity: 10,
    criticalPath: 5,
  };

  scoreTests(
    tests: string[],
    // ... (listing truncated)
```