ClaudeWave
Skill · 282 repo stars · updated 3mo ago

systematic-debugging

Systematic Debugging guides developers through root cause investigation before they attempt a fix. Use this skill when tests fail, errors occur, features behave unexpectedly, or performance degrades. It follows a methodical process: distinguish symptoms from root causes, trace data flow, form and test hypotheses, and verify that the fix prevents recurrence instead of applying a surface-level patch.

Install in Claude Code
git clone --depth 1 https://github.com/MadAppGang/claude-code /tmp/systematic-debugging && cp -r /tmp/systematic-debugging/plugins/dev/skills/discipline/systematic-debugging ~/.claude/skills/systematic-debugging
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# Systematic Debugging

**Iron Law:** "NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST"

## When to Use

Use this skill when:
- A test fails and you need to understand why
- An error is thrown and you need to find the cause
- A feature behaves unexpectedly
- Performance degrades and you need to identify bottlenecks
- Data corruption occurs and you need to trace the source
- A bug reappears after "fixing" it

## Red Flags (Violation Indicators)

Watch for these patterns; each one signals that root cause investigation is being skipped:

- [ ] **Fix without understanding** - "I'll just add a null check" (why is it null?)
- [ ] **Skip to solution** - "Let me try wrapping this in setTimeout" (why does timing matter?)
- [ ] **Restart tools** - "Let me restart the dev server" (what state is corrupted?)
- [ ] **Clear cache** - "Let me clear the cache" (what cache entry is stale?)
- [ ] **Change multiple things** - "Let me update these 3 files" (which one fixes it?)
- [ ] **Shouldn't cause problem** - "This change shouldn't affect that" (but it does, why?)
- [ ] **Assume cause** - "Must be a race condition" (what evidence supports this?)

## Key Concepts

### 1. Root Cause vs. Symptom

- **Symptom:** What you observe (test fails, error thrown, wrong output)
- **Root Cause:** Why it happens (null value, wrong condition, missing await)

**Example:**
```
Symptom: "TypeError: Cannot read property 'name' of null"
Root Cause: API returns null when user not found, but code expects object
```

- **Bad approach:** Add `user?.name` (hides the symptom; the missing user still propagates through the code)
- **Good approach:** Add validation `if (!user) throw new NotFoundError()` where the user is fetched (handles the cause)
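
The contrast can be sketched in Python. This is a minimal illustration, not the skill's own code: `USERS`, `find_user`, `display_name`, and `display_name_masked` are hypothetical names, and the dictionary stands in for the real API or database.

```python
class NotFoundError(Exception):
    """Raised when a requested user does not exist (hypothetical error type)."""


# Hypothetical in-memory stand-in for the API or database.
USERS = {1: {"id": 1, "name": "Alice"}}


def display_name_masked(user_id):
    """Symptom-level patch: the crash goes away, but the missing user is hidden."""
    user = USERS.get(user_id)
    return user["name"] if user else None


def find_user(user_id):
    """Cause-level fix: the lookup itself reports that the user is missing."""
    user = USERS.get(user_id)
    if user is None:
        raise NotFoundError(f"user {user_id} not found")
    return user


def display_name(user_id):
    """Callers can rely on find_user() never returning None."""
    return find_user(user_id)["name"]
```

With the masked version, a caller receives `None` and the problem resurfaces somewhere else later. With the explicit check, the failure is reported at the point where the user was expected to exist.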

### 2. Data Flow Tracing

**Principle:** Follow data from source to error point

**Steps:**
1. Identify error location (stack trace line number)
2. Identify data involved (variable name, object property)
3. Trace backwards: Where does this data come from?
4. Find divergence: Where does actual differ from expected?

**Example:**
```
Error: "Expected 'active' but got 'inactive'"
Location: user.test.ts:42 - expect(user.status).toBe('active')
Data: user.status = 'inactive'
Trace: user.status ← updateUser() ← API response ← database
Divergence: Database has status='inactive' (expected 'active')
Root Cause: Test setup didn't create user with active status
```
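
The trace above can be approximated with a small helper. This is a sketch under assumptions: `first_divergence` is a hypothetical function, and the per-stage values are illustrative readings you would collect from logs or a debugger.

```python
def first_divergence(stages, expected):
    """Return the label of the earliest stage whose value differs from expected.

    `stages` is a list of (label, actual_value) pairs ordered from the
    data source to the error point. Returns None if nothing diverges.
    """
    for label, actual in stages:
        if actual != expected:
            return label
    return None


# Assumed observations for the user.status example, source first.
observed = [
    ("database row", "inactive"),
    ("API response", "inactive"),
    ("updateUser() result", "inactive"),
    ("user.status in test", "inactive"),
]

print(first_divergence(observed, "active"))  # database row
```

Ordering the stages from source to error point means the first mismatch is the place to investigate; every later stage is only passing the wrong value along.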

### 3. Hypothesis-Driven Debugging

**Principle:** Form hypothesis, test with evidence, refine

**Process:**
1. **Observe:** What is the symptom? (error message, wrong output)
2. **Hypothesize:** What could cause this? (list 2-3 possibilities)
3. **Predict:** If hypothesis is true, what else should I see?
4. **Test:** Add logging, check state, run minimal reproduction
5. **Conclude:** Does evidence support hypothesis? If no, try next hypothesis

**Example:**
```
Symptom: API request times out after 30s
Hypothesis 1: Database query is slow
  Prediction: Should see long query time in logs
  Test: Add query timing logs
  Result: Queries complete in <100ms ✗ Hypothesis rejected

Hypothesis 2: Network connection is hanging
  Prediction: Should see connection delay, not query delay
  Test: Add request timing logs (connect time vs. query time)
  Result: Connection never completes within the 30s limit, query never runs ✓ Hypothesis confirmed

Root Cause: Firewall blocks connection, causing timeout
```
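
One way to gather the timing evidence for both hypotheses is to time each phase separately. The sketch below assumes hypothetical `connect()` and `run_query()` functions, simulated here with short sleeps; in a real service they would wrap the actual connection and query calls.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(phase):
    """Record how long the wrapped block takes under the given phase name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[phase] = time.perf_counter() - start


def connect():
    time.sleep(0.05)  # hypothetical stand-in for a slow connection


def run_query():
    time.sleep(0.001)  # hypothetical stand-in for a fast query


with timed("connect"):
    connect()
with timed("query"):
    run_query()

# The slowest phase tells you which hypothesis the evidence supports.
slowest = max(timings, key=timings.get)
print(f"slowest phase: {slowest}")
```

Measuring the phases independently turns "it is slow somewhere" into a prediction that can be confirmed or rejected.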

### 4. Fix Verification

**Principle:** Verify fix addresses root cause, not just symptom

**Checklist:**
- [ ] Test that was failing now passes
- [ ] Test passes for the reason you expect (not coincidence)
- [ ] Test fails if you revert the fix (confirms fix is necessary)
- [ ] Related tests still pass (no regressions)
- [ ] Root cause is addressed in fix (not just symptom)
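
The "fails if you revert the fix" item can be checked by running the same regression case against both versions of the code. This is a minimal sketch with a made-up bug: `normalize_email_before_fix` and `normalize_email_after_fix` are hypothetical functions standing in for the code before and after your change.

```python
def normalize_email_before_fix(raw):
    """Pre-fix version: lowercases but forgets to strip whitespace."""
    return raw.lower()


def normalize_email_after_fix(raw):
    """Post-fix version: strips surrounding whitespace, then lowercases."""
    return raw.strip().lower()


def regression_check(normalize):
    """The originally failing case, expressed as a reusable check."""
    return normalize("  Alice@Example.com ") == "alice@example.com"


# Passes with the fix applied.
assert regression_check(normalize_email_after_fix)
# Fails when the fix is reverted, so the fix is what makes the test pass.
assert not regression_check(normalize_email_before_fix)
```

In practice the revert is usually done with version control and a test run, but the idea is the same: a test that passes both with and without the fix does not verify the fix.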

## 4-Phase Debugging Process

### Phase 1: TRACE DATA FLOW

**Objective:** Identify where actual diverges from expected

**Steps:**
1. Read error message (what failed?)
2. Read stack trace (where failed?)
3. Identify data involved (what value is wrong?)
4. Trace backwards from error to source
5. Log intermediate values to find divergence point

**Example (TypeScript):**
```typescript
// Error: "Expected user email, got undefined"
// Stack trace: user-service.ts:42

// Phase 1: Trace data flow
console.log('1. API response:', response);           // { data: { user: {...} } }
console.log('2. Response data:', response.data);      // { user: {...} }
console.log('3. User object:', response.data.user);   // { id: 1, name: 'Alice' }
console.log('4. Email field:', response.data.user.email); // undefined

// Divergence found: response.data.user has no email field
```

### Phase 2: IDENTIFY DIVERGENCE

**Objective:** Determine why actual differs from expected

**Questions:**
- What is the expected value? (from spec, test, documentation)
- What is the actual value? (from logs, debugger, state inspection)
- Where does the divergence occur? (which function, which line)
- What changed recently? (git diff, recent commits)

**Example (Python):**
```python
# Expected: parse_csv() returns list of dicts with 'email' key
# Actual: parse_csv() returns list of dicts without 'email' key

# Check input CSV file
with open('users.csv') as f:
    print(f.readline())  # id,name,phone  ← Missing 'email' column!

# Divergence: CSV file format changed, missing 'email' column
```
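
The expected-versus-actual comparison can also be made explicit in code. The sketch below assumes a hypothetical `EXPECTED_COLUMNS` spec and a `missing_columns` helper; it reads from an in-memory sample so it runs without a `users.csv` file.

```python
import csv
import io

# Assumed specification of the columns parse_csv() relies on.
EXPECTED_COLUMNS = frozenset({"id", "name", "email"})


def missing_columns(csv_file, expected=EXPECTED_COLUMNS):
    """Return the expected column names that the CSV header does not contain."""
    reader = csv.DictReader(csv_file)
    actual = set(reader.fieldnames or [])
    return sorted(expected - actual)


# In-memory sample mirroring the header seen in the example above.
sample = io.StringIO("id,name,phone\n1,Alice,555-0100\n")
print(missing_columns(sample))  # ['email']
```

Stating the expected columns as data makes the divergence a direct set difference instead of something you spot by eye.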

### Phase 3: HYPOTHESIZE ROOT CAUSE

**Objective:** Form testable hypothesis about why divergence occurred

**Hypothesis Template:**
```
"I believe [divergence] occurs because [root cause].
If this is true, I should see [evidence].
I can test this by [action]."
```

**Example (Go):**
```go
// Divergence: cached JSON has no "email" key when user.Email is empty, but the reader expects the key

// Hypothesis 1: Cache serialization drops empty fields
// Evidence: Other empty fields (phone, address) also missing
// Test: Check cached JSON structure
// Result: {"id":1,"name":"Alice"} ← Empty fields missing ✓

// Root Cause: JSON serialization omits empty fields (omitempty tag)
```

### Phase 4: VERIFY FIX

**Objective:** Confirm fix addresses root cause

**Verification Steps:**
1. Write test that r