systematic-debugging
Systematic-debugging is a structured troubleshooting workflow for isolating root causes before attempting code fixes. Use it when encountering bugs, test failures, unexpected behavior, or integration issues, especially when initial solutions have failed multiple times or involve shared code, cross-module dependencies, or fallback chains. The skill progresses through four phases: root cause investigation, triage of fix impact, hypothesis testing, and minimal repair with architecture review.
git clone --depth 1 https://github.com/GanyuanRan/Aegis /tmp/systematic-debugging && cp -r /tmp/systematic-debugging/skills/systematic-debugging ~/.claude/skills/systematic-debuggingSKILL.md
# Execute
→ Bug? Test failure? Unexpected behavior? → **Find root cause first. No fixes without evidence.**
1. Isolate: read error → reproduce → check git diff → drill upward through diagnostic layers:
L1 symptom → L2 logic → L3 system → L4 architecture →
L5 cross-system contract → L6 platform constraint → L7 spec gap.
Stop when no deeper "why" remains OR terminal unactionable (T1-T4).
2. Identify owner: compare with working code → locate canonical owner → flag duplicate owners as a finding
3. Before fixing, run Patch-Shape Triage and Ripple Signal Triage if the candidate fix touches shared/core/cross-module behavior, contract, source-of-truth, fallback, adapter, duplicate owner, producer+consumer, or consumer-side patching. Also run Pre-Edit Complexity Check when the candidate fix touches an overloaded owner or may worsen source complexity.
4. Prove: one hypothesis → minimal test → iterate. 3+ failed fixes = question architecture, do not attempt another code fix.
After fix, if any symptom persists → differential diagnosis (Phase 4 Step 4bis).
5. Fix: failing test → minimal code at canonical owner → verify → Reflection + architecture review → repair + retirement track
→ Done when: confidence ≥ B, both tracks explicit, DeeperCause answered "no" with evidence, no H-class hard signal still active.
# Systematic Debugging
## Overview
Random fixes waste time and create new bugs. Symptom fixes are failure.
This skill is the canonical debugging workflow. Use it to move from symptom to
root cause, then to the smallest sufficient stable repair and retirement plan.
Smallest repair means correct owner + bug class fixed + bounded entropy, not the smallest textual diff.
## When to Use
Any technical issue: test failures, bugs, unexpected behavior, performance problems, build/integration failures.
Especially under time pressure, when "just one quick fix" seems obvious, after multiple failed fixes, or when duplicate owners / fallback chains may be involved.
## Quick bug lane
For low-risk, single-owner bugs, keep the report compact: `Symptom`,
`Reproduction`, `Root Cause`, `Fix Boundary`, and `Verification`. Still collect
root-cause evidence before editing. If fallback, duplicate owner, consumer-side
patching, contract risk, shared logic, or cross-module behavior appears,
escalate to the full workflow.
## The Four Phases
### Phase 1: Root Cause Investigation
**BEFORE attempting ANY fix:**
1. **Read Error Messages Carefully**
- Don't skip past errors or warnings — they often contain the exact solution
- Read stack traces completely; note line numbers, file paths, error codes
2. **Reproduce Consistently**
- Can you trigger it reliably? What are the exact steps? Does it happen every time?
- If not reproducible → consult `feedback-loop-construction.md` to build an automated reproduction loop; don't guess
- Record baseline: inputs, environment, version, logs, success/failure criteria
3. **Check Recent Changes**
- What changed that could cause this? Git diff, recent commits, new dependencies, config changes, environmental differences
4. **Gather Evidence in Multi-Component Systems**
- Instrument each component boundary: log what enters and exits
- Run once to see where data breaks, then focus investigation there
5. **Trace Data Flow** (when error is deep in call stack)
- Where does bad value originate? What called this with bad value?
- Keep tracing up until you find the source. Fix at source, not at symptom.
- For the complete backward tracing technique, see `root-cause-tracing.md`.
6. **Drill Upward Through Diagnostic Layers**
Start at L1. Exhaust all "why" questions at each layer before moving upward.
The chain is open-ended — architecture is not the endpoint.
```
L1 Symptom: what failed? where? exact reproduction?
L2 Logic: which branch, invariant, or state transition is wrong?
L3 System: which component boundary, dependency, or ownership seam?
L4 Architecture: what design choice, duplicated owner, or fallback chain?
L5 Cross-system: which API / SLA / timing contract between systems?
L6 Platform: what runtime / OS / framework constraint?
L7 Spec gap: who never defined correct behavior for this case?
```
Hard signal definitions (H/T/D) are in the Quality Gate — apply them there,
not during initial investigation.
When the stop layer is not obvious, the user asks where the diagnosis
stops, the issue crosses component/system boundaries, or a user-provided
fact falsifies the current layer, expose a compact `Layer Stop Card` before
fixing:
```text
Layer Stop Card:
- Current Stop Layer: L1 Symptom | L2 Logic | L3 System | L4 Architecture | L5 Cross-system Contract | L6 Platform | L7 Spec Gap | T-class boundary
- Checked Path:
- Evidence For Stop:
- Excluded Layers:
- Falsifier:
- User Intervention Point:
- Next Action:
```
The card is an advisory readback of the diagnostic stop point. It is not a
`GateDecision`, `PolicySnapshot`, or completion authority.
7. **Patch-Shape Triage Before Editing**
Treat the first obvious fix as evidence, not clearance to edit. If the
candidate fix shape matches any item below, continue upward before changing
code unless you can prove the local layer is the canonical owner:
- keyword, phrase, regex, negation-word list, or sample-text exception
- local guard, extra conditional, `try`/`catch`, early return, or one-off branch
- fallback, adapter, compatibility branch, prompt branch, or legacy path expansion
- consumer/caller/readiness/presentation-layer patch
- downstream re-parsing of raw text when typed intent, normalized state,
contract, or another source-of-truth already exists
- artifact/download/export/readback/cache patch that does not first locate
the producer and source-of-truth owner
- duplicate parsing, duplicate owner, or "keep both for now|
Deprecated - use the aegis:brainstorming skill instead
Deprecated - use the aegis:executing-plans skill instead
Deprecated - use the aegis:writing-plans skill instead
Use when retiring old logic, collapsing duplicate owners, removing fallbacks, or touching schema, persistence, or source-of-truth boundaries while deciding whether to delete old paths, retain compatibility, or stop for confirmation.
Use when defining new features, product behavior, UI/component design, architecture choices, contract changes, or ambiguous medium/high-complexity work before implementation.
Use when the user asks for caveman mode, fewer tokens, brief responses, compressed communication, or otherwise explicitly requests a much shorter answer.
Use when facing 2+ independent tasks that can be worked on without shared state or sequential dependencies