ClaudeWave
Skill · 333 repo stars · updated today

backup-and-disaster-recovery

This Claude Code skill guides planning and execution of backup strategies and disaster recovery procedures. Use it when defining recovery point objectives (RPO) and recovery time objectives (RTO), designing backup architecture, determining what systems require backups and backup frequency, planning for major outages or data loss scenarios, conducting disaster recovery drills, or identifying gaps in recovery capabilities after an incident. It is not for active incident response or routine operational tasks like deploy rollbacks or snapshot reviews.

Install in Claude Code
git clone --depth 1 https://github.com/rampstackco/claude-skills /tmp/backup-and-disaster-recovery && cp -r /tmp/backup-and-disaster-recovery/dist/pi/.agents/skills/backup-and-disaster-recovery ~/.claude/skills/backup-and-disaster-recovery
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# Backup and Disaster Recovery

Plan for the worst case: the database is gone, the host is down for a week, the deploy was poisoned, ransomware encrypted everything. The skill lies in preparing ahead of time, not in reacting.

---

## When to use

- Setting up backups for a new system
- Reviewing and validating backup architecture
- Defining RPO (recovery point objective) and RTO (recovery time objective)
- Running a disaster recovery drill
- Diagnosing gaps after an incident
- Planning for ransomware, data corruption, or insider threats
- Migrating to a new platform (DR planning belongs in the migration plan)

## When NOT to use

- Active incident response (use `incident-response`)
- Routine deploy rollbacks (use `launch-runbook`)
- Code or content versioning (covered by Git, CMS revision history)
- Routine database snapshots (use this skill to set them up; routine review goes in monitoring)

---

## Required inputs

- The systems in scope (databases, file storage, code, configs, secrets)
- The hosting platforms and providers
- Existing backup tooling and what it covers
- Tolerance for data loss (in time)
- Tolerance for downtime (in time)
- Compliance requirements (some regulations mandate specific backup standards)

---

## The framework: 4 questions

Every disaster recovery plan answers four questions explicitly.

### Question 1: What needs to be recoverable?

List every system that holds state. Categorize by criticality.

**Tier 1: must recover.** Without it, the business stops. (Customer database, transaction log, primary content store.)

**Tier 2: should recover.** Loss is painful but not fatal. (Analytics, logs, secondary services.)

**Tier 3: nice to recover.** Easy to rebuild. (Caches, derived data, temporary state.)

The tier drives RPO, RTO, backup frequency, and storage spend.

### Question 2: How much data loss is acceptable? (RPO)

RPO is the maximum amount of recent data you can afford to lose, measured in time: how far back the last recoverable copy is allowed to be.

- RPO = 1 hour: hourly backups or continuous replication needed
- RPO = 1 day: daily backups acceptable
- RPO = 1 week: weekly backups acceptable

For most production data, RPO of 1 hour or less is the target. For critical financial systems, the target is near-zero RPO, which requires continuous replication.

For derived or rebuildable data, RPO of 1 day or longer is fine.
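
The link between backup frequency and RPO can be sketched as a quick check. This is a minimal illustration, not part of any backup tool: the function names are hypothetical, and it assumes periodic backups that complete instantly and always succeed, so the worst-case loss equals one full interval.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """A failure just before the next backup loses everything written
    since the previous one, i.e. up to one full interval (assumption:
    backups are instantaneous and never fail)."""
    return backup_interval


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """True when the schedule's worst-case loss fits inside the RPO."""
    return worst_case_data_loss(backup_interval) <= rpo


print(meets_rpo(timedelta(hours=1), timedelta(hours=1)))  # True
print(meets_rpo(timedelta(days=1), timedelta(hours=1)))   # False
```

In practice, backup duration and failed runs widen the real loss window, so a schedule that only just meets the RPO on paper deserves a safety margin.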

### Question 3: How much downtime is acceptable? (RTO)

RTO is the maximum time to restore service after a disaster.

| RTO target | Implies |
|---|---|
| < 5 minutes | Hot standby with automatic failover |
| < 1 hour | Warm standby with manual failover or fast restore from recent snapshot |
| < 24 hours | Cold backup with documented restore process |
| Days to weeks | Best-effort, accept extended downtime |

RTO drives architecture spend. Aggressive RTOs (< 1 hour) are expensive. Loose RTOs (days) are cheap.
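
The table above can be read as a lookup from RTO target to the architecture it implies. The sketch below encodes it directly; the function name is hypothetical, and treating anything at or above 24 hours as the "days to weeks" row is an assumption about the boundaries.

```python
from datetime import timedelta


def implied_architecture(rto: timedelta) -> str:
    """Return the architecture row from the RTO table for a given target.

    Boundaries follow the table's strict '<' thresholds; targets of
    24 hours or more fall into the best-effort row (assumption).
    """
    if rto < timedelta(minutes=5):
        return "Hot standby with automatic failover"
    if rto < timedelta(hours=1):
        return "Warm standby with manual failover or fast restore from recent snapshot"
    if rto < timedelta(hours=24):
        return "Cold backup with documented restore process"
    return "Best-effort, accept extended downtime"


print(implied_architecture(timedelta(minutes=2)))
print(implied_architecture(timedelta(hours=4)))
```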

### Question 4: What's the disaster?

Plan for specific scenarios. Each has different implications.

**Hardware failure.** Disk dies. Standard backups solve this. Most modern hosts handle this automatically.

**Provider outage.** Region or vendor goes down. Cross-region or cross-provider redundancy needed for low RTO.

**Data corruption.** Bad migration, bug, accidental delete. Point-in-time restore needed. The latest backup might be corrupted; you need history.

**Ransomware or compromise.** Attacker encrypts or deletes. Backups must be immutable or air-gapped, otherwise the attacker takes them too.

**Account compromise.** Attacker has admin credentials, deletes everything. Same defense as ransomware: immutable backups, separate access control.

**Vendor lock-out.** Account suspended, billing dispute, vendor disappears. Backups outside the vendor needed.

**Insider threat.** Disgruntled employee deletes or exfiltrates. Audit logs, separation of duties, immutable backups.

A backup strategy that handles only hardware failure isn't a strategy. It's the easiest case.
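
One way to make the scenario list actionable is to record which defenses each scenario calls for and compare that against what is actually in place. The sketch below does this; the scenario and defense identifiers are hypothetical labels for the items named above, not terms from any standard.

```python
# Defenses each scenario requires, taken from the descriptions above.
# The string identifiers are illustrative labels only.
REQUIRED_DEFENSES: dict[str, set[str]] = {
    "hardware_failure": {"standard_backups"},
    "provider_outage": {"cross_region_or_provider_redundancy"},
    "data_corruption": {"point_in_time_restore"},
    "ransomware": {"immutable_or_air_gapped_backups"},
    "account_compromise": {"immutable_or_air_gapped_backups", "separate_access_control"},
    "vendor_lockout": {"backups_outside_vendor"},
    "insider_threat": {
        "audit_logs",
        "separation_of_duties",
        "immutable_or_air_gapped_backups",
    },
}


def uncovered_scenarios(defenses_in_place: set[str]) -> dict[str, set[str]]:
    """Map each scenario that is not fully covered to its missing defenses."""
    gaps = {}
    for scenario, required in REQUIRED_DEFENSES.items():
        missing = required - defenses_in_place
        if missing:
            gaps[scenario] = missing
    return gaps


# A plan with only standard backups covers hardware failure and nothing else.
for scenario, missing in uncovered_scenarios({"standard_backups"}).items():
    print(f"{scenario}: missing {sorted(missing)}")
```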

---

## Workflow

### Step 1: Inventory state

Every system that holds state goes on a list:

| System | Data type | Tier | Current backup | Tested? |
|---|---|---|---|---|

If you can't list it, you can't protect it. Often the inventory itself reveals gaps (the "we forgot about that database" moment).
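
If the inventory is kept as structured data rather than only as a table, gaps can be reported mechanically. The following is a minimal sketch: the class, field names, and sample rows are hypothetical and simply mirror the table columns above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StatefulSystem:
    name: str
    data_type: str
    tier: int                      # 1, 2, or 3
    current_backup: Optional[str]  # None means no backup exists
    tested: bool                   # has a restore ever been tested?


def find_gaps(inventory: list[StatefulSystem]) -> list[str]:
    """List systems with no backup or an untested restore, most critical tier first."""
    gaps = []
    for system in sorted(inventory, key=lambda s: s.tier):
        if system.current_backup is None:
            gaps.append(f"Tier {system.tier} {system.name}: no backup")
        elif not system.tested:
            gaps.append(f"Tier {system.tier} {system.name}: restore never tested")
    return gaps


# Hypothetical inventory rows, for illustration only.
inventory = [
    StatefulSystem("analytics-warehouse", "events", 2, None, False),
    StatefulSystem("customer-db", "relational", 1, "daily snapshot", False),
    StatefulSystem("page-cache", "derived", 3, "rebuilt from source", True),
]
for gap in find_gaps(inventory):
    print(gap)
```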

### Step 2: Set RPO and RTO per tier

For each tier, agree on RPO and RTO. Get sign-off from the people who'd be impacted by a disaster.

Push back on aspirational targets that aren't backed by infrastructure spend. RTO of 5 minutes for a system without a hot standby is not real.

### Step 3: Verify or design backup architecture

For each system, ensure:

- **Frequency** matches RPO.
- **Retention** covers point-in-time recovery (typically 30+ days for production data).
- **Storage location** is separate from the source. Same disk, same account, same region: not enough.
- **Immutability or write-once storage** for at least some backup copies. Defends against ransomware.
- **Encryption at rest.** Standard for compliance.
- **Tested restore procedure.** Untested backups are not backups.

The "3-2-1 rule" is a useful starting point: 3 copies of data, 2 different storage types, 1 offsite (or off-account, off-platform).
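
The 3-2-1 rule is simple enough to check in code. This is a sketch under stated assumptions: the `DataCopy` class and its fields are hypothetical, the primary copy counts toward the three, and "offsite" is taken to include off-account and off-platform copies, as described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCopy:
    label: str
    storage_type: str  # e.g. "block-volume", "object-storage"
    offsite: bool      # outside the primary site, account, or platform


def check_3_2_1(copies: list[DataCopy]) -> dict[str, bool]:
    """Evaluate each part of the 3-2-1 rule separately so gaps are visible."""
    return {
        "three_copies": len(copies) >= 3,
        "two_storage_types": len({c.storage_type for c in copies}) >= 2,
        "one_offsite": any(c.offsite for c in copies),
    }


# Hypothetical layout: primary volume, local snapshot, off-account object store.
copies = [
    DataCopy("primary", "block-volume", offsite=False),
    DataCopy("local-snapshot", "block-volume", offsite=False),
    DataCopy("archive-bucket", "object-storage", offsite=True),
]
print(check_3_2_1(copies))
```

Dropping the third copy from this example fails all three checks at once, which shows why a snapshot stored on the same volume type in the same account adds little protection.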

### Step 4: Document the restore runbook

For each system, write the runbook:

1. How to detect the disaster (cross-reference monitoring)
2. How to decide to restore (decision criteria, who authorizes)
3. The exact restore steps (commands, screenshots, sequence)
4. How to verify the restore worked
5. How to switch traffic back
6. Communication template (status page, customer notice)

The runbook is for the worst night of someone's career. Write it for tired, panicked you.
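
A runbook can be checked for completeness before anyone needs it. The sketch below assumes the runbook is stored as a mapping from section name to text; the section keys are hypothetical names for the six items listed above.

```python
# Hypothetical keys, one per runbook item listed above.
REQUIRED_SECTIONS = (
    "detection",
    "decision",
    "restore_steps",
    "verification",
    "traffic_switch",
    "communication",
)


def missing_sections(runbook: dict[str, str]) -> list[str]:
    """Return required sections that are absent or blank, in runbook order."""
    return [s for s in REQUIRED_SECTIONS if not runbook.get(s, "").strip()]


# Illustrative draft with two sections filled in and one left blank.
draft = {
    "detection": "Alert fires when the primary database is unreachable.",
    "restore_steps": "Restore the latest verified snapshot to a new instance.",
    "verification": "   ",
}
print(missing_sections(draft))
```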

### Step 5: Run a drill

The first restore should never be during a real disaster.

Drills can be:

- **Tabletop:** walk through the runbook on paper. Useful for finding gaps in the plan.
- **Partial:** restore to a non-production environment. Verify the data, validate the steps.
- **Full:** simulate the disaster. Production failover or full restore. Maximum confidence, maximum risk.
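
A partial or full drill produces something a tabletop cannot: a measured restore time that can be compared against the RTO. The sketch below records one such result; the class and field names are hypothetical, and the sample figures are illustrative, not measurements.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class DrillResult:
    system: str
    kind: str                # "partial" or "full"
    restore_time: timedelta  # measured wall-clock time to restore
    data_verified: bool      # restored data was checked and found correct


def drill_passed(result: DrillResult, rto: timedelta) -> bool:
    """A drill passes only if the data verified and the restore fit inside the RTO."""
    return result.data_verified and result.restore_time <= rto


# Illustrative values only.
result = DrillResult("customer-db", "partial", timedelta(minutes=95), data_verified=True)
print(drill_passed(result, rto=timedelta(hours=1)))  # False
print(drill_passed(result, rto=timedelta(hours=4)))  # True
```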

For mos