debug-phase
Systematic debugging techniques including error classification, root cause analysis (5 Whys), reproduction strategies, and error documentation. Use when debugging errors, investigating failures, analyzing stack traces, fixing bugs, or documenting errors in error-log.md. (project)
git clone --depth 1 https://github.com/marcusgoll/Spec-Flow /tmp/debug-phase && cp -r /tmp/debug-phase/.claude/skills/debug-phase ~/.claude/skills/debug-phase
SKILL.md
<objective>
Provide systematic debugging techniques for error investigation, root cause analysis, and fix implementation. Ensures errors are properly classified, root causes identified (not symptoms), and solutions documented to prevent recurrence.
</objective>
<quick_start>
Debug errors systematically:
1. Reproduce error consistently (100% reliable reproduction)
2. Classify by type (syntax/runtime/logic/integration/performance) and severity (critical/high/medium/low)
3. Isolate root cause using binary search, logging, breakpoints, or 5 Whys
4. Implement fix with failing test first (TDD)
5. Add regression tests to prevent recurrence
6. Document in error-log.md with ERR-XXXX ID
**Key principle**: Fix root causes, not symptoms. Use 5 Whys to drill down.
</quick_start>
<prerequisites>
Before debugging:
- Error can be reproduced (even if intermittent, identify conditions)
- Logs available (application logs, stack traces, network logs)
- Development environment set up for testing
Check error-log.md for similar historical errors before starting.
</prerequisites>
<workflow>
<step number="1">
**Search for similar errors**
Check error-log.md for past occurrences:
```bash
# Search by error message
grep -i "connection timeout" error-log.md
# Search by component
grep "Component: StudentProgressService" error-log.md
# Search by error type
grep "Type: Integration" error-log.md
```
If similar error found:
- Review previous fix (workaround or root cause?)
- Check if error recurred (>2 occurrences = need permanent fix)
- Note patterns (same component, same conditions, timing)
**If recurring error**: Prioritize permanent fix over workaround.
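The recurrence rule above can also be checked in Python. This is a minimal sketch, assuming each occurrence in error-log.md has a line containing the error message; `count_occurrences` and `needs_permanent_fix` are hypothetical helper names, not part of this skill.

```python
def count_occurrences(log_text: str, pattern: str) -> int:
    """Count lines containing the pattern, case-insensitively (like grep -ic)."""
    needle = pattern.lower()
    return sum(1 for line in log_text.splitlines() if needle in line.lower())


def needs_permanent_fix(log_text: str, pattern: str, threshold: int = 2) -> bool:
    """Apply the rule above: more than `threshold` occurrences needs a permanent fix."""
    return count_occurrences(log_text, pattern) > threshold


# Illustrative log text; in practice, read it from error-log.md.
sample_log = "\n".join([
    "Error: Connection timeout calling external service",
    "Error: connection timeout calling external service",
    "Error: CONNECTION TIMEOUT calling external service",
])
print(needs_permanent_fix(sample_log, "connection timeout"))
```

Counting matching lines is a rough proxy. If a single entry mentions the message more than once, count entries instead of lines.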
</step>
<step number="2">
**Reproduce error consistently**
Create minimal reproduction:
1. Gather context: error message, stack trace, timestamp, user actions, environment
2. Create minimal test case that triggers error 100% of time
3. If intermittent: identify conditions (timing, data state, race conditions)
Example reproduction:
```bash
# API error
curl -X POST http://localhost:3000/api/endpoint \
  -H "Content-Type: application/json" \
  -d '{"field": "value"}'
# Frontend error
# 1. Navigate to /dashboard
# 2. Click "Load Data" button
# 3. Error appears in console
```
**Validation**: Can trigger error reliably before proceeding.
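For intermittent errors, it can help to measure how often a candidate reproduction actually fails before tightening its conditions. The sketch below assumes the reproduction is wrapped in a Python callable that raises on failure; `failure_rate` is a hypothetical helper name.

```python
from typing import Callable


def failure_rate(repro: Callable[[], None], attempts: int = 20) -> float:
    """Run the reproduction `attempts` times and return the fraction that raised."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    failures = 0
    for _ in range(attempts):
        try:
            repro()
        except Exception:
            failures += 1
    return failures / attempts


def always_fails() -> None:
    """Stand-in for a reproduction that triggers the error every time."""
    raise RuntimeError("simulated failure")


print(failure_rate(always_fails, attempts=5))
```

A rate below 1.0 suggests the reproduction is still missing a condition (timing, data state, or concurrency) and is not yet minimal.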
</step>
<step number="3">
**Classify error**
**By type**:
- **Syntax**: Code doesn't compile/parse (typos, missing brackets, linting errors)
- **Runtime**: Code runs but crashes (null pointer, type error, uncaught exception)
- **Logic**: Code runs but wrong result (calculation error, wrong branch taken)
- **Integration**: External dependency fails (API timeout, database connection, service unavailable)
- **Performance**: Code works but too slow (timeout, memory leak, N+1 queries)
**By severity**:
- **Critical**: Data loss, security breach, total system failure
- **High**: Feature broken, blocks users from core functionality
- **Medium**: Feature degraded, workaround exists
- **Low**: Minor UX issue, cosmetic, no functional impact
Example classification:
```markdown
Type: Integration (API call to external service fails)
Severity: High (dashboard doesn't load, blocks teachers)
Component: StudentProgressService.fetchExternalData()
Frequency: 30% of requests (intermittent)
```
See references/error-classification.md for detailed matrix.
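One way to keep classifications consistent is to model them as data. The sketch below assumes the type and severity values listed above; the `Classification` class and its `to_markdown` method are hypothetical names, and the output mirrors the example classification so the grep searches from step 1 still match.

```python
from dataclasses import dataclass
from enum import Enum


class ErrorType(Enum):
    SYNTAX = "Syntax"
    RUNTIME = "Runtime"
    LOGIC = "Logic"
    INTEGRATION = "Integration"
    PERFORMANCE = "Performance"


class Severity(Enum):
    CRITICAL = "Critical"
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"


@dataclass(frozen=True)
class Classification:
    error_type: ErrorType
    severity: Severity
    component: str
    frequency: str

    def to_markdown(self) -> str:
        """Render one field per line so entries stay greppable."""
        return "\n".join([
            f"Type: {self.error_type.value}",
            f"Severity: {self.severity.value}",
            f"Component: {self.component}",
            f"Frequency: {self.frequency}",
        ])


entry = Classification(
    ErrorType.INTEGRATION,
    Severity.HIGH,
    "StudentProgressService.fetchExternalData()",
    "30% of requests (intermittent)",
)
print(entry.to_markdown())
```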
</step>
<step number="4">
**Isolate root cause**
Use systematic techniques:
**Binary search** (for large codebases):
- Add logging at midpoint of suspected code
- If error before midpoint → investigate first half
- If error after midpoint → investigate second half
- Repeat until narrowed to specific function
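The halving procedure can be sketched as a search over an ordered list of steps. This sketch assumes the state stays wrong once it goes wrong and that at least one step is bad; `first_bad_step` and `is_bad_after` are hypothetical names, and in practice the check would be a log inspection or an assertion at that point in the code.

```python
from typing import Callable


def first_bad_step(n_steps: int, is_bad_after: Callable[[int], bool]) -> int:
    """Return the index of the first step after which the state is wrong.

    Assumes at least one bad step exists and that the state stays wrong
    once it goes wrong.
    """
    lo, hi = 0, n_steps - 1  # invariant: the first bad step lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad_after(mid):
            hi = mid       # error already present at midpoint: search first half
        else:
            lo = mid + 1   # still healthy at midpoint: search second half
    return lo


# Example: 8 processing steps, state becomes wrong at step 5.
print(first_bad_step(8, lambda step: step >= 5))
```

Each check halves the remaining range, so 8 steps need at most 3 checks.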
**Increase logging**:
```python
# Add debug logs around suspected area
logger.debug(f"Before API call: student_id={student_id}, params={params}")
response = api.fetch_data(student_id)
logger.debug(f"After API call: status={response.status}, data_len={len(response.data)}")
```
**Use breakpoints** (interactive debugging):
```bash
# Python
python -m pdb script.py
# Node.js
node inspect server.js
# Browser (Chrome DevTools)
# Set breakpoints, inspect variables, step through code
```
**Check assumptions**:
- Is variable the value you expect? (log it, don't assume)
- Is function being called when you expect? (add entry/exit logs)
- Are external dependencies available? (health check endpoints)
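Entry and exit logs can be added without editing each function body. The sketch below is one possible approach using a decorator; `log_calls` and `page_count` are hypothetical names chosen for illustration.

```python
import functools
import logging

logger = logging.getLogger(__name__)


def log_calls(func):
    """Log entry, exit, and exceptions of `func` at DEBUG level."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.debug("enter %s args=%r kwargs=%r", func.__name__, args, kwargs)
        try:
            result = func(*args, **kwargs)
        except Exception:
            logger.exception("error in %s", func.__name__)
            raise  # re-raise so behavior is unchanged
        logger.debug("exit %s result=%r", func.__name__, result)
        return result
    return wrapper


@log_calls
def page_count(total_records: int, page_size: int) -> int:
    """Hypothetical function under investigation."""
    return -(-total_records // page_size)  # ceiling division


logging.basicConfig(level=logging.DEBUG)
page_count(1000, 10)
```

The logged arguments and return value show whether the function was called when expected and with the expected values.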
**Apply 5 Whys** (drill down to root cause):
```markdown
1. Why did dashboard fail to load?
→ API call to external service timed out
2. Why did API call time out?
→ Request took >30 seconds (timeout limit)
3. Why did request take >30 seconds?
→ External service response time was 45 seconds
4. Why was external service so slow?
→ Requesting too much data (1000 records instead of 10)
5. Why requesting 1000 records?
→ Missing pagination parameter in API call
Root cause: Missing pagination parameter causes over-fetching
```
**Validation**: Root cause identified (not just symptoms). Can explain why error occurred.
</step>
<step number="5">
**Implement fix (TDD approach)**
1. Write failing test first:
```python
def test_fetch_external_data_with_pagination():
    """Test that API call includes pagination parameter"""
    service = StudentProgressService()
    result = service.fetchExternalData(student_id=123)
    assert len(result) <= 10  # Max 10 records per page
    assert result.has_next_page  # Pagination metadata present
```
2. Implement fix:
```python
# Before (broken)
def fetchExternalData(self, student_id):
    return api.get(f"/students/{student_id}/data")

# After (fixed)
def fetchExternalData(self, student_id, page_size=10):
    return api.get(
        f"/students/{student_id}/data",
        params={"page_size": page_size}
    )
```
3. Verify fix:
```bash
# Run test (should pass)
pytest tests/test_student_progress_service.py
# Manually test reproduction steps (error should not occur)
```
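The fix can also be verified without calling the external service. The sketch below is one way to do that, under assumptions not in the original: the API client is injected through the constructor, and `FakeApi` is a hypothetical stand-in that records the parameters it receives.

```python
class FakeApi:
    """Hypothetical stand-in for the real API client; records the last call."""

    def __init__(self):
        self.last_path = None
        self.last_params = None

    def get(self, path, params=None):
        self.last_path = path
        self.last_params = params
        # Without a page_size, simulate the over-fetch of 1000 records.
        page_size = (params or {}).get("page_size", 1000)
        return [{"id": i} for i in range(page_size)]


class StudentProgressService:
    def __init__(self, api):
        self.api = api  # injected (assumption) so tests can swap in a fake

    def fetchExternalData(self, student_id, page_size=10):
        return self.api.get(
            f"/students/{student_id}/data",
            params={"page_size": page_size},
        )


def test_fetch_sends_page_size():
    api = FakeApi()
    service = StudentProgressService(api)
    result = service.fetchExternalData(student_id=123)
    assert api.last_params == {"page_size": 10}  # root cause: parameter was missing
    assert len(result) <= 10


test_fetch_sends_page_size()
```

Asserting on the outgoing parameters targets the root cause directly, while the length check only confirms the symptom is gone.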
**Validation**: Fix addresses root cause, all tests pass, error no longer occurs.