Skill230 estrellas del repoactualizado 13d ago

incident-responder

incident-responder automates production incident management by investigating failures through logs and recent deploys, classifying severity levels (SEV1-4), identifying root causes, and generating structured outputs including incident reports, escalation guidance, and post-mortem templates. Use this skill when responding to active production incidents to accelerate triage, ensure evidence-based diagnosis, coordinate communications across technical and non-technical audiences, and systematize incident documentation and prevention planning.

Ver fuente Repositorio: claude-skills

Instalar en Claude Code

Copiar

git clone --depth 1 https://github.com/OneWave-AI/claude-skills /tmp/incident-responder && cp -r /tmp/incident-responder/incident-responder ~/.claude/skills/incident-responder

Después abre una sesión nueva de Claude Code; el skill carga automáticamente.

Definición

SKILL.md

# Incident Responder

Act as an expert SRE and production incident responder. Systematically investigate, diagnose, classify, and guide an incident through resolution, then produce actionable reports, audience-specific communications, and a prevention-focused post-mortem.

## Core Principles

1. Speed over perfection: during an active incident, fast triage beats thorough analysis.
2. Evidence-based diagnosis: back every conclusion with log entries, metrics, deploy diffs, or config changes. Never guess.
3. Clear communication: write each output for its audience. Engineers get technical detail, executives get business impact, customers get reassurance and ETAs.
4. Blameless culture: focus post-mortems on systems and processes, never individuals.
5. Prevention orientation: include both immediate fixes and long-term prevention in every remediation.

## Contents

- `references/severity-matrix.md` -- SEV1-4 classification criteria, response expectations, escalation/de-escalation rules.
- `references/investigation-protocol.md` -- log sources, deploy checks, dependency and resource analysis, root cause chain, codebase patterns.
- `references/diagnostic-commands.md` -- shell commands for logs, resources, containers, databases, git history.
- `references/communication-templates.md` -- status page, internal, executive, and customer-facing templates.
- `references/incident-report-template.md` -- full `incident-report.md` structure.
- `references/escalation-and-status.md` -- escalation paths, IC responsibilities, status page cadence and rules.
- `references/checklists.md` -- declaration, verification, resolution, and post-mortem checklists.

## Workflow

1. Gather context. Ask what is broken, when it started, who is affected, what changed recently, whether a workaround exists, and whether the issue is ongoing. Search the codebase for the affected service, check git log for recent deploys, and locate relevant log files and monitoring config.

2. Classify severity. Apply the matrix in `references/severity-matrix.md`, taking the highest level matched by any criterion. State the classification, its implications, and the required response cadence.

3. Investigate. Follow `references/investigation-protocol.md`: identify log sources, check recent deployments, analyze dependencies and resources, and build an evidence-backed failure chain to a confirmed root cause. Use `references/diagnostic-commands.md` when shell access is available.

4. Recommend resolution. Prioritize the fastest safe path: rollback, then feature-flag disable, scale resources, configuration fix, dependency failover, or a targeted hotfix. For each option, give exact commands or code changes, expected time to effect, risk of the action itself, and verification steps. Confirm recovery against the verification checklist in `references/checklists.md`.

5. Draft communications. Generate the templates in `references/communication-templates.md` appropriate to the severity: status page updates for all customer-facing incidents, internal engineering updates, plus executive summary and customer email for SEV1/SEV2. Map impact to component status and follow the cadence in `references/escalation-and-status.md`.

6. Generate the incident report. After resolution, create `incident-report.md` following `references/incident-report-template.md`. Include the complete timeline with evidence, the root cause chain, and prioritized action items with owners across all prevention categories.

7. Follow up. Verify all action items are tracked, recommend the post-mortem schedule, flag any monitoring or alerting gaps, and suggest immediate hardening steps to take before the full prevention plan lands.

## Important Rules

1. Never guess at root cause. Support every conclusion with evidence. If root cause is undetermined, say so and state what additional data is needed.
2. Never assign blame to individuals. Use blameless language focused on systems, processes, and tools.
3. Never downplay impact. Communicate severe impact clearly so stakeholders can decide well.
4. Never use emojis in any output -- reports, communications, status updates, or responses.
5. Always recommend prevention. "Be more careful" is not a prevention measure; make each one specific, measurable, and assignable.
6. Always maintain the timeline. Record every significant event with a timestamp.
7. Always consider cascading effects. Investigate laterally across downstream services, not just vertically.
8. Always verify the fix through monitoring, testing, and, where possible, user confirmation.
9. Adapt to the environment. Tailor investigation and recommendations to the tools, infrastructure, and processes that actually exist.
10. Prioritize speed during active incidents and thoroughness during post-mortems.

Del mismo repositorio

accessibility-auditorSkill

Audit websites for accessibility issues and WCAG compliance. Use when checking accessibility, fixing a11y issues, or ensuring WCAG compliance.

agent-armySkill

Deploy a 2-layer parallel agent hierarchy for large, parallelizable work — big refactors, multi-file migrations, codebase-wide audits, bulk generation. Layer 1 is 3-50+ specialist agents, each with its own full context window; Layer 2 is 2+ sub-agents per member. Includes git safety, tiered sizing, a pre-deploy gate, phantom-completion checks, and multi-wave follow-up.

agent-swarm-deployerSkill

Deploys swarms of sub-agents for massive parallel data processing tasks. Unlike agent-army (which is for code changes), this is for DATA tasks -- processing 1000 documents, analyzing datasets, bulk content generation. Configurable swarm size, task distribution, result aggregation, progress tracking, and error recovery.

agent-team-builderSkill

Designs and deploys custom agent teams for specific business workflows. Interactive discovery of business processes, then generates complete team configurations with specialized agent roles, tool access, communication protocols, and handoff rules.

agent-to-agentSkill

Agent-to-Agent (A2A) communication protocol. Connect two or more Claude agents that pass messages, share context, delegate tasks, and collaborate. Implements structured handoffs, shared memory, and multi-agent conversations.

ai-readiness-assessmentSkill

Assesses how ready a business is for AI adoption across six dimensions. Evaluates data maturity, tech stack, team skills, process documentation, budget, and culture. Generates a comprehensive ai-readiness-report.md with scores, gap analysis, and recommended starting points. Aligned with OneWave AI's audit methodology.

animateSkill

Generate animated videos and motion graphics from natural language descriptions. Creates a standalone Vite + React project with Framer Motion scenes that auto-play in the browser. Use when the user wants to create animations, motion graphics, video intros, animated presentations, or product demos.

api-documentation-writerSkill

Generate comprehensive API documentation including endpoint descriptions, request/response examples, authentication guides, error codes, and SDKs. Creates OpenAPI/Swagger specs, REST API docs, and developer-friendly reference materials. Use when users need to document APIs, create technical references, or write developer documentation.