Skip to main content
ClaudeWave
Skill506 estrellas del repoactualizado 2d ago

systematic-debugging

Systematic-debugging is a structured troubleshooting workflow for isolating root causes before attempting code fixes. Use it when encountering bugs, test failures, unexpected behavior, or integration issues, especially when initial solutions have failed multiple times or involve shared code, cross-module dependencies, or fallback chains. The skill progresses through four phases: root cause investigation, triage of fix impact, hypothesis testing, and minimal repair with architecture review.

Instalar en Claude Code
Copiar
git clone --depth 1 https://github.com/GanyuanRan/Aegis /tmp/systematic-debugging && cp -r /tmp/systematic-debugging/skills/systematic-debugging ~/.claude/skills/systematic-debugging
Después abre una sesión nueva de Claude Code; el skill carga automáticamente.

SKILL.md

# Execute

→ Bug? Test failure? Unexpected behavior? → **Find root cause first. No fixes without evidence.**
  1. Isolate: read error → reproduce → check git diff → drill upward through diagnostic layers:
     L1 symptom → L2 logic → L3 system → L4 architecture →
     L5 cross-system contract → L6 platform constraint → L7 spec gap.
     Stop when no deeper "why" remains OR terminal unactionable (T1-T4).
  2. Identify owner: compare with working code → locate canonical owner → flag duplicate owners as a finding
  3. Before fixing, run Patch-Shape Triage and Ripple Signal Triage if the candidate fix touches shared/core/cross-module behavior, contract, source-of-truth, fallback, adapter, duplicate owner, producer+consumer, or consumer-side patching. Also run Pre-Edit Complexity Check when the candidate fix touches an overloaded owner or may worsen source complexity.
  4. Prove: one hypothesis → minimal test → iterate. 3+ failed fixes = question architecture, do not attempt another code fix.
     After fix, if any symptom persists → differential diagnosis (Phase 4 Step 4bis).
  5. Fix: failing test → minimal code at canonical owner → verify → Reflection + architecture review → repair + retirement track
→ Done when: confidence ≥ B, both tracks explicit, DeeperCause answered "no" with evidence, no H-class hard signal still active.

# Systematic Debugging

## Overview

Random fixes waste time and create new bugs. Symptom fixes are failure.

This skill is the canonical debugging workflow. Use it to move from symptom to
root cause, then to the smallest sufficient stable repair and retirement plan.
Smallest repair means correct owner + bug class fixed + bounded entropy, not the smallest textual diff.

## When to Use

Any technical issue: test failures, bugs, unexpected behavior, performance problems, build/integration failures.

Especially under time pressure, when "just one quick fix" seems obvious, after multiple failed fixes, or when duplicate owners / fallback chains may be involved.

## Quick bug lane

For low-risk, single-owner bugs, keep the report compact: `Symptom`,
`Reproduction`, `Root Cause`, `Fix Boundary`, and `Verification`. Still collect
root-cause evidence before editing. If fallback, duplicate owner, consumer-side
patching, contract risk, shared logic, or cross-module behavior appears,
escalate to the full workflow.

## The Four Phases

### Phase 1: Root Cause Investigation

**BEFORE attempting ANY fix:**

1. **Read Error Messages Carefully**
   - Don't skip past errors or warnings — they often contain the exact solution
   - Read stack traces completely; note line numbers, file paths, error codes

2. **Reproduce Consistently**
   - Can you trigger it reliably? What are the exact steps? Does it happen every time?
   - If not reproducible → consult `feedback-loop-construction.md` to build an automated reproduction loop; don't guess
   - Record baseline: inputs, environment, version, logs, success/failure criteria

3. **Check Recent Changes**
   - What changed that could cause this? Git diff, recent commits, new dependencies, config changes, environmental differences

4. **Gather Evidence in Multi-Component Systems**
   - Instrument each component boundary: log what enters and exits
   - Run once to see where data breaks, then focus investigation there

5. **Trace Data Flow** (when error is deep in call stack)
   - Where does bad value originate? What called this with bad value?
   - Keep tracing up until you find the source. Fix at source, not at symptom.
   - For the complete backward tracing technique, see `root-cause-tracing.md`.

6. **Drill Upward Through Diagnostic Layers**

   Start at L1. Exhaust all "why" questions at each layer before moving upward.
   The chain is open-ended — architecture is not the endpoint.

   ```
   L1 Symptom:     what failed? where? exact reproduction?
   L2 Logic:       which branch, invariant, or state transition is wrong?
   L3 System:      which component boundary, dependency, or ownership seam?
   L4 Architecture: what design choice, duplicated owner, or fallback chain?
   L5 Cross-system: which API / SLA / timing contract between systems?
   L6 Platform:    what runtime / OS / framework constraint?
   L7 Spec gap:    who never defined correct behavior for this case?
   ```

   Hard signal definitions (H/T/D) are in the Quality Gate — apply them there,
   not during initial investigation.

   When the stop layer is not obvious, the user asks where the diagnosis
   stops, the issue crosses component/system boundaries, or a user-provided
   fact falsifies the current layer, expose a compact `Layer Stop Card` before
   fixing:

   ```text
   Layer Stop Card:
   - Current Stop Layer: L1 Symptom | L2 Logic | L3 System | L4 Architecture | L5 Cross-system Contract | L6 Platform | L7 Spec Gap | T-class boundary
   - Checked Path:
   - Evidence For Stop:
   - Excluded Layers:
   - Falsifier:
   - User Intervention Point:
   - Next Action:
   ```

   The card is an advisory readback of the diagnostic stop point. It is not a
   `GateDecision`, `PolicySnapshot`, or completion authority.

7. **Patch-Shape Triage Before Editing**

   Treat the first obvious fix as evidence, not clearance to edit. If the
   candidate fix shape matches any item below, continue upward before changing
   code unless you can prove the local layer is the canonical owner:

   - keyword, phrase, regex, negation-word list, or sample-text exception
   - local guard, extra conditional, `try`/`catch`, early return, or one-off branch
   - fallback, adapter, compatibility branch, prompt branch, or legacy path expansion
   - consumer/caller/readiness/presentation-layer patch
   - downstream re-parsing of raw text when typed intent, normalized state,
     contract, or another source-of-truth already exists
   - artifact/download/export/readback/cache patch that does not first locate
     the producer and source-of-truth owner
   - duplicate parsing, duplicate owner, or "keep both for now