Skill91 repo starsupdated 3mo ago

debug-phase

This Claude Code skill provides systematic debugging techniques including error classification, root cause analysis through the 5 Whys method, reproduction strategies, and error documentation using structured workflows. Use it when debugging errors, investigating failures, analyzing stack traces, fixing bugs, or documenting issues in error-log.md files within development projects.

View source Repository: Spec-Flow

Install in Claude Code

Copy

git clone --depth 1 https://github.com/marcusgoll/Spec-Flow /tmp/debug-phase && cp -r /tmp/debug-phase/.claude/skills/debug-phase ~/.claude/skills/debug-phase

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

<objective>
Provide systematic debugging techniques for error investigation, root cause analysis, and fix implementation. Ensures errors are properly classified, root causes identified (not symptoms), and solutions documented to prevent recurrence.
</objective>

<quick_start>
Debug errors systematically:

1. Reproduce error consistently (100% reliable reproduction)
2. Classify by type (syntax/runtime/logic/integration/performance) and severity (critical/high/medium/low)
3. Isolate root cause using binary search, logging, breakpoints, or 5 Whys
4. Implement fix with failing test first (TDD)
5. Add regression tests to prevent recurrence
6. Document in error-log.md with ERR-XXXX ID

**Key principle**: Fix root causes, not symptoms. Use 5 Whys to drill down.
</quick_start>

<prerequisites>
Before debugging:
- Error can be reproduced (even if intermittent, identify conditions)
- Logs available (application logs, stack traces, network logs)
- Development environment set up for testing

Check error-log.md for similar historical errors before starting.
</prerequisites>

<workflow>
<step number="1">
**Search for similar errors**

Check error-log.md for past occurrences:

```bash
# Search by error message
grep -i "connection timeout" error-log.md

# Search by component
grep "Component: StudentProgressService" error-log.md

# Search by error type
grep "Type: Integration" error-log.md
```

If similar error found:

- Review previous fix (workaround or root cause?)
- Check if error recurred (>2 occurrences = need permanent fix)
- Note patterns (same component, same conditions, timing)

**If recurring error**: Prioritize permanent fix over workaround.
</step>

<step number="2">
**Reproduce error consistently**

Create minimal reproduction:

1. Gather context: error message, stack trace, timestamp, user actions, environment
2. Create minimal test case that triggers error 100% of time
3. If intermittent: identify conditions (timing, data state, race conditions)

Example reproduction:

```bash
# API error
curl -X POST http://localhost:3000/api/endpoint \
  -H "Content-Type: application/json" \
  -d '{"field": "value"}'

# Frontend error
# 1. Navigate to /dashboard
# 2. Click "Load Data" button
# 3. Error appears in console
```

**Validation**: Can trigger error reliably before proceeding.
</step>

<step number="3">
**Classify error**

**By type**:

- **Syntax**: Code doesn't compile/parse (typos, missing brackets, linting errors)
- **Runtime**: Code runs but crashes (null pointer, type error, uncaught exception)
- **Logic**: Code runs but wrong result (calculation error, wrong branch taken)
- **Integration**: External dependency fails (API timeout, database connection, service unavailable)
- **Performance**: Code works but too slow (timeout, memory leak, N+1 queries)

**By severity**:

- **Critical**: Data loss, security breach, total system failure
- **High**: Feature broken, blocks users from core functionality
- **Medium**: Feature degraded, workaround exists
- **Low**: Minor UX issue, cosmetic, no functional impact

Example classification:

```markdown
Type: Integration (API call to external service fails)
Severity: High (dashboard doesn't load, blocks teachers)
Component: StudentProgressService.fetchExternalData()
Frequency: 30% of requests (intermittent)
```

See references/error-classification.md for detailed matrix.
</step>

<step number="4">
**Isolate root cause**

Use systematic techniques:

**Binary search** (for large codebases):

- Add logging at midpoint of suspected code
- If error before midpoint → investigate first half
- If error after midpoint → investigate second half
- Repeat until narrowed to specific function

**Increase logging**:

```python
# Add debug logs around suspected area
logger.debug(f"Before API call: student_id={student_id}, params={params}")
response = api.fetch_data(student_id)
logger.debug(f"After API call: status={response.status}, data_len={len(response.data)}")
```

**Use breakpoints** (interactive debugging):

```bash
# Python
python -m pdb script.py

# Node.js
node inspect server.js

# Browser (Chrome DevTools)
# Set breakpoints, inspect variables, step through code
```

**Check assumptions**:

- Is variable the value you expect? (log it, don't assume)
- Is function being called when you expect? (add entry/exit logs)
- Are external dependencies available? (health check endpoints)

**Apply 5 Whys** (drill down to root cause):

```markdown
1. Why did dashboard fail to load?
   → API call to external service timed out

2. Why did API call timeout?
   → Request took >30 seconds (timeout limit)

3. Why did request take >30 seconds?
   → External service response time was 45 seconds

4. Why was external service so slow?
   → Requesting too much data (1000 records instead of 10)

5. Why requesting 1000 records?
   → Missing pagination parameter in API call

Root cause: Missing pagination parameter causes over-fetching
```

**Validation**: Root cause identified (not just symptoms). Can explain why error occurred.
</step>

<step number="5">
**Implement fix (TDD approach)**

1. Write failing test first:

```python
def test_fetch_external_data_with_pagination():
    """Test that API call includes pagination parameter"""
    service = StudentProgressService()
    result = service.fetchExternalData(student_id=123)

    assert len(result) <= 10  # Max 10 records per page
    assert result.has_next_page  # Pagination metadata present
```

2. Implement fix:

```python
# Before (broken)
def fetchExternalData(self, student_id):
    return api.get(f"/students/{student_id}/data")

# After (fixed)
def fetchExternalData(self, student_id, page_size=10):
    return api.get(
        f"/students/{student_id}/data",
        params={"page_size": page_size}
    )
```

3. Verify fix:

```bash
# Run test (should pass)
pytest tests/test_student_progress_service.py

# Manually test reproduction steps (error should not occur)
```

**Validation**: Fix addresses root cause, all tests pass, error n