Skill · 545 repo stars · updated 9d ago

devops/monitoring-alerting

This DevOps monitoring and alerting skill configures system surveillance with predefined thresholds for application metrics like error rates and response times, plus infrastructure metrics including CPU, memory, and disk usage. Use it to establish escalating alert rules across four severity levels with corresponding notification channels, implement centralized log collection with retention policies, and ensure system stability through automated incident response triggers.

Install in Claude Code
git clone --depth 1 https://github.com/echoVic/boss-skill /tmp/devops-monitoring-alerting && cp -r /tmp/devops-monitoring-alerting/skill/skills/devops/monitoring-alerting ~/.claude/skills/devops-monitoring-alerting
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# Monitoring and Alerting

## Monitoring Metrics

| Type | Metric | Threshold |
|------|--------|-----------|
| Application | Error rate | < 1% |
| Application | Response time | P99 < 500ms |
| System | CPU usage | < 80% |
| System | Memory usage | < 85% |
| System | Disk usage | < 90% |
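
As a rough illustration of how the thresholds in this table could be checked in code, here is a minimal Python sketch. The metric keys (`error_rate_pct`, `p99_latency_ms`, and so on), the `Threshold` class, and `find_violations` are hypothetical names chosen for this example, not part of the skill. The sketch assumes each row gives a healthy upper bound, so a value at or above the limit counts as a violation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    category: str   # "application" or "system"
    metric: str     # human-readable metric name
    limit: float    # healthy values are strictly below this
    unit: str


# Limits copied from the table above; the dictionary keys are illustrative.
THRESHOLDS = {
    "error_rate_pct": Threshold("application", "error rate", 1.0, "%"),
    "p99_latency_ms": Threshold("application", "P99 response time", 500.0, "ms"),
    "cpu_pct": Threshold("system", "CPU usage", 80.0, "%"),
    "memory_pct": Threshold("system", "memory usage", 85.0, "%"),
    "disk_pct": Threshold("system", "disk usage", 90.0, "%"),
}


def find_violations(sample: dict[str, float]) -> list[str]:
    """Return the keys of metrics whose value is at or above its limit.

    Metrics missing from the sample are skipped rather than treated as failures.
    """
    violations = []
    for key, threshold in THRESHOLDS.items():
        value = sample.get(key)
        if value is not None and value >= threshold.limit:
            violations.append(key)
    return violations
```

Calling `find_violations` with one sample per scrape interval would give the list of metrics that are outside their healthy range; how that list feeds into alerting is left open here.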

## Alert Rules

| Severity | Condition | Notification |
|----------|-----------|--------------|
| P0 | Service unavailable | Phone + SMS + email |
| P1 | Error rate > 5% | SMS + email |
| P2 | Response time > 1s | Email |
| P3 | Resource usage > 80% | Email |
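
The rules in this table can be sketched as a small classifier that maps a health snapshot to a severity level and its notification channels. This is only an illustration under several assumptions that the table does not state: conditions are evaluated from most to least severe and the first match wins, "resource usage" is a single percentage supplied by the caller (for example the highest of CPU, memory, and disk), and the response time is whatever figure the caller chooses to pass in. The names `classify` and `channels_for` are hypothetical.

```python
from typing import Optional

# Notification channels per severity level, as listed in the table above.
CHANNELS = {
    "P0": ["phone", "sms", "email"],
    "P1": ["sms", "email"],
    "P2": ["email"],
    "P3": ["email"],
}


def classify(
    service_up: bool,
    error_rate_pct: float,
    response_time_ms: float,
    resource_usage_pct: float,
) -> Optional[str]:
    """Return the most severe matching level, or None if no rule fires.

    Assumption: rules are checked in order P0 -> P3 and the first match wins.
    """
    if not service_up:
        return "P0"
    if error_rate_pct > 5.0:
        return "P1"
    if response_time_ms > 1000.0:
        return "P2"
    if resource_usage_pct > 80.0:
        return "P3"
    return None


def channels_for(level: Optional[str]) -> list[str]:
    """Look up the channels to notify; no level means nobody is notified."""
    if level is None:
        return []
    return CHANNELS.get(level, [])
```

For example, `channels_for(classify(True, 6.0, 1500.0, 90.0))` selects the P1 channels, because the error-rate rule outranks the response-time and resource rules under the first-match assumption.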

## Log Management

- Centralized log collection
- Log levels (ERROR, WARN, INFO, DEBUG)
- Log retention policy (30 days)
- Masking of sensitive information
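
Two of the items above, masking and the 30-day retention, can be sketched with Python's standard `logging` module. This is a minimal example under stated assumptions, not the skill's implementation: the regular expressions in `_PATTERNS` are hypothetical stand-ins for whatever a given system treats as sensitive, daily rotation with 30 backups is used as an approximation of a 30-day policy, and centralized collection (shipping the files to a log platform) is out of scope.

```python
import logging
import logging.handlers
import re

# Hypothetical patterns; what counts as sensitive depends on the system.
_PATTERNS = [
    (re.compile(r"(?i)\b(password|token|secret)=\S+"), r"\1=***"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"), "<email>"),
]


def mask(text: str) -> str:
    """Replace every match of the sensitive patterns with a placeholder."""
    for pattern, replacement in _PATTERNS:
        text = pattern.sub(replacement, text)
    return text


class MaskingFilter(logging.Filter):
    """Mask the fully formatted message before any handler emits it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = mask(record.getMessage())
        record.args = ()  # arguments are already merged into msg
        return True


def build_handler(path: str) -> logging.Handler:
    """Rotate the file at midnight and keep 30 old files (about 30 days)."""
    handler = logging.handlers.TimedRotatingFileHandler(
        path, when="midnight", backupCount=30
    )
    handler.setLevel(logging.DEBUG)  # let DEBUG, INFO, WARNING, ERROR through
    handler.addFilter(MaskingFilter())
    return handler
```

Attaching the result of `build_handler("app.log")` to a logger masks matching values at write time. Python names the WARN level `WARNING`, and the file path here is only a placeholder.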