ClaudeWave
Skill · 390 repo stars · updated today

ci-failure-investigator

The ci-failure-investigator skill triages CI pipeline failures by examining GitHub Actions logs and SHAFT Allure test artifacts to distinguish genuine code defects from infrastructure, provider, or configuration issues. Use it when investigating failed test runs or jobs to isolate root causes, classify failure types, and recommend targeted fixes backed by log evidence and local reproduction validation.

Install in Claude Code
git clone --depth 1 https://github.com/ShaftHQ/SHAFT_ENGINE /tmp/ci-failure-investigator && cp -r /tmp/ci-failure-investigator/.github/skills/ci-failure-investigator ~/.claude/skills/ci-failure-investigator
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# CI Failure Investigator

Use this playbook only for CI failure triage.

## Workflow

1. Identify the exact run, failed jobs, changed commit, and failed steps. Use
   local logs supplied by the user before making GitHub calls.
2. If remote data is required, normalize authentication with
   `scripts/ci/github-auth-env.sh`, then fetch failed-job logs and relevant
   artifact metadata only. Do not download successful-job logs by default.
3. Search logs for the first actionable exception and its surrounding setup or
   teardown lines. Separate primary failure from reporting and cleanup noise.
4. For SHAFT artifacts, follow
   [the CI investigation runbook](../../../docs/CI_FAILURE_INVESTIGATION.md).
   Parse self-contained Allure HTML or result JSON instead of opening large
   reports manually.
5. Count result files before trusting statuses. Inspect failed, broken, and
   retried/skipped attempts when the final summary hides an intermittent
   failure.
6. Classify the cause as code, test isolation, configuration, dependency,
   runner/infrastructure, credentials, or external provider. Do not weaken
   assertions to hide provider or credential failures.
7. Inspect only the source and recent changes connected to the failure
   signature. Propose or implement the smallest root-cause fix.
8. Validate with the narrowest local reproduction. Request a remote rerun only
   when local evidence cannot prove the environment-specific behavior.
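
Step 5 can be sketched as a small script. This is a minimal sketch, not the repository's tooling: it assumes the artifact unpacks to a standard Allure 2 results directory of `*-result.json` files carrying `status`, `historyId`, and `start` fields, and the function name `summarize_allure_results` is hypothetical. SHAFT's actual artifact layout may differ, so treat the runbook as authoritative.

```python
import json
from collections import Counter, defaultdict
from pathlib import Path


def summarize_allure_results(results_dir):
    """Count result files and flag tests whose final pass hides earlier failures.

    Assumes Allure 2 style `*-result.json` files (an assumption, not
    confirmed by the playbook).
    """
    files = sorted(Path(results_dir).glob("*-result.json"))
    status_counts = Counter()
    attempts = defaultdict(list)  # test identity -> [(start, status, name), ...]

    for path in files:
        try:
            result = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            # A truncated upload is itself evidence; count it instead of skipping silently.
            status_counts["unreadable"] += 1
            continue
        status = result.get("status") or "unknown"
        status_counts[status] += 1
        # Retries of one test are assumed to share a historyId.
        key = result.get("historyId") or result.get("fullName") or path.name
        attempts[key].append(
            (result.get("start") or 0, status, result.get("name") or path.name)
        )

    hidden = []
    for runs in attempts.values():
        runs.sort(key=lambda run: run[0])  # oldest attempt first
        statuses = [status for _, status, _ in runs]
        if statuses[-1] == "passed" and any(
            s in ("failed", "broken") for s in statuses[:-1]
        ):
            hidden.append({"name": runs[-1][2], "attempts": statuses})

    return {
        "file_count": len(files),
        "status_counts": dict(status_counts),
        "hidden_intermittent": hidden,
    }
```

Compare `file_count` against the number of tests the job was expected to run before reading `status_counts`: a low count suggests a partial upload rather than a clean run. Entries in `hidden_intermittent` are tests that passed on the last attempt after a failed or broken earlier attempt.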

## Output

Report the run/job, concise signature, root cause and confidence, affected
files, validation evidence, and remaining environment risk. Avoid templated
tables when a short finding is clearer.
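
One way to keep a finding short and complete is to hold the fields listed above in a small structure and render them as plain lines. This is an illustrative sketch only: the `Cause` and `Finding` names are hypothetical, the cause values mirror the categories in workflow step 6, and nothing here is part of the skill itself.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Cause(Enum):
    """Failure categories taken from workflow step 6."""
    CODE = "code"
    TEST_ISOLATION = "test isolation"
    CONFIGURATION = "configuration"
    DEPENDENCY = "dependency"
    RUNNER_INFRASTRUCTURE = "runner/infrastructure"
    CREDENTIALS = "credentials"
    EXTERNAL_PROVIDER = "external provider"


@dataclass
class Finding:
    """Hypothetical container for the fields the Output section asks for."""
    run: str
    job: str
    signature: str
    cause: Cause
    confidence: str
    affected_files: List[str] = field(default_factory=list)
    validation: str = "not yet validated"
    environment_risk: str = "none identified"

    def render(self) -> str:
        """Return the finding as a few plain lines rather than a table."""
        files = ", ".join(self.affected_files) or "none"
        return "\n".join([
            f"Run/job: {self.run} / {self.job}",
            f"Signature: {self.signature}",
            f"Root cause: {self.cause.value} (confidence: {self.confidence})",
            f"Affected files: {files}",
            f"Validation: {self.validation}",
            f"Remaining environment risk: {self.environment_risk}",
        ])
```

The defaults make gaps visible: a finding rendered without validation evidence says so explicitly instead of omitting the line.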