feature-flag-guide
This Claude Code skill generates comprehensive feature flag management documentation for engineering teams. It produces a structured guide covering flag taxonomy, creation procedures, rollout strategies, monitoring requirements, and cleanup policies tailored to a specific service, platform, and governance needs. Use it when establishing flag discipline across a team, planning a safe feature rollout, documenting flag lifecycle policies, or creating templates for consistent flag creation and decommissioning practices.
git clone --depth 1 https://github.com/mohitagw15856/pm-claude-skills /tmp/feature-flag-guide && cp -r /tmp/feature-flag-guide/plugins/pm-engineering/skills/feature-flag-guide ~/.claude/skills/feature-flag-guide
# Feature Flag Guide Skill
Produce a complete feature flag management guide for a service or team — covering how flags are named and categorised, how to create and roll out a flag safely, what to monitor during rollout, when and how to clean up flags, and who is responsible for each stage. Feature flags without discipline become permanent technical debt. This guide gives the team a repeatable process so flags are created intentionally, rolled out safely, and removed when done.
## Required Inputs
Ask for these if not already provided:
- **Service or team name** — scope of the guide
- **Feature flag platform** — LaunchDarkly, Split, Unleash, Flagsmith, Flipt, or a custom/in-house solution
- **Flag being documented** (if writing a per-flag guide) or "general guide" (if writing team-wide policy)
- **Rollout constraints** — any compliance, data privacy, or contractual constraints on who can see a feature (e.g. HIPAA, EU-only, enterprise customers only)
## Output Format
---
# Feature Flag Management Guide: [Service / Team Name]
**Team:** [Team name] | **Platform:** [LaunchDarkly / Split / Unleash / Custom]
**Document owner:** [Name] | **Last updated:** [Date]
**Review cycle:** Quarterly, and whenever the flag platform changes
---
## 1. Flag Taxonomy
Every flag belongs to exactly one category. The category determines default behaviour, who can enable it in production, and when it must be cleaned up.
| Type | Purpose | Default state | Production gate | Max lifetime |
|---|---|---|---|---|
| **Release flag** | Controls rollout of a new feature — decouples deploy from release | Off | Tech lead approval | 90 days from feature launch |
| **Experiment flag** | A/B or multivariate test — measures impact of a change | Off (control group) | Product + tech lead | Duration of experiment + 30 days |
| **Ops flag** | Operational control — circuit breaker, kill switch, throttle | On (normal behaviour) | On-call engineer can toggle | Indefinite (review annually) |
| **Permission flag** | Gates access by user segment, tier, or region | Off (restricted) | Product + Account owner | Indefinite (review annually) |
**When in doubt:** If the flag is temporary (tied to a specific feature launch), it is a Release flag. If it will exist forever as a control knob, it is an Ops flag.
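The maximum lifetimes in the table can be turned into a concrete cleanup date. The following is a minimal sketch, not part of any flag platform: the function name `cleanup_deadline` and its parameters are hypothetical, and it assumes the type prefixes `release`, `exp`, `ops`, and `perm`.

```python
from datetime import date, timedelta
from typing import Optional


def cleanup_deadline(
    flag_type: str,
    launch_date: Optional[date] = None,
    experiment_end: Optional[date] = None,
) -> Optional[date]:
    """Return the latest cleanup date for a flag, or None for indefinite flags.

    Hypothetical helper: mirrors the "Max lifetime" column of the taxonomy table.
    """
    if flag_type == "release":
        if launch_date is None:
            raise ValueError("release flags need a launch_date")
        # Release flags: 90 days from feature launch.
        return launch_date + timedelta(days=90)
    if flag_type == "exp":
        if experiment_end is None:
            raise ValueError("experiment flags need an experiment_end date")
        # Experiment flags: duration of the experiment plus 30 days.
        return experiment_end + timedelta(days=30)
    if flag_type in ("ops", "perm"):
        # Ops and Permission flags are indefinite and reviewed annually.
        return None
    raise ValueError(f"unknown flag type: {flag_type!r}")
```

A team could call this when filling in the cleanup date for a new flag, so the date is derived from the taxonomy rather than chosen ad hoc.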
---
## 2. Flag Naming Convention
All flags must follow this naming scheme:
```
[type]-[service]-[feature-description]
```
| Segment | Values | Example |
|---|---|---|
| type | `release`, `exp`, `ops`, `perm` | `release` |
| service | Short service identifier, lowercase, hyphenated | `payments` |
| feature-description | Kebab-case description, max 5 words | `new-checkout-flow` |
**Full examples:**
- `release-payments-new-checkout-flow` — release flag for a new checkout feature in the payments service
- `exp-search-personalized-ranking` — experiment on personalized search ranking
- `ops-api-rate-limit-override` — operational flag to override API rate limits
- `perm-dashboard-beta-users-only` — permission flag gating dashboard for beta users
**Do not:**
- Use ticket numbers in flag names (`release-JIRA-1234` → not searchable or self-describing)
- Use dates in flag names (`release-dark-mode-jan-2024` → flags outlive their dates)
- Use vague names (`release-new-thing` → not useful when you have 50 flags)
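The naming rules above lend themselves to an automated check, for example in CI. The sketch below is illustrative only: `validate_flag_name` and `known_services` are hypothetical names, the ticket pattern assumes Jira-style keys such as `JIRA-1234`, and the date check only looks for a four-digit year starting with `20`. Because both the service and the description may contain hyphens, the sketch assumes the caller supplies the set of valid service identifiers.

```python
import re

FLAG_TYPES = ("release", "exp", "ops", "perm")
KEBAB_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
TICKET_RE = re.compile(r"[A-Z]{2,}-\d+")        # assumption: Jira-style ticket keys
YEAR_RE = re.compile(r"(?<!\d)20\d{2}(?!\d)")   # assumption: years 2000-2099


def validate_flag_name(name, known_services):
    """Return a list of problems with a flag name; an empty list means it passes."""
    problems = []
    flag_type, _, rest = name.partition("-")
    if flag_type not in FLAG_TYPES:
        problems.append(f"unknown type prefix '{flag_type}'")

    # Try longer service names first so hyphenated services match correctly.
    service = next(
        (s for s in sorted(known_services, key=len, reverse=True)
         if rest.startswith(s + "-")),
        None,
    )
    if service is None:
        problems.append("service segment is not a known service")
        description = rest
    else:
        description = rest[len(service) + 1:]

    if not KEBAB_RE.fullmatch(description):
        problems.append("description is not lowercase kebab-case")
    if len(description.split("-")) > 5:
        problems.append("description has more than 5 words")
    if TICKET_RE.search(name):
        problems.append("name contains a ticket number")
    if YEAR_RE.search(name):
        problems.append("name contains a date")
    return problems
```

Vague names such as `release-new-thing` cannot be caught mechanically; that rule still needs a human reviewer.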
---
## 3. Flag Creation Checklist
Complete every item before creating a flag in the production environment.
**Before creating the flag:**
- [ ] Flag type determined from taxonomy (Section 1)
- [ ] Flag name follows naming convention (Section 2)
- [ ] Flag owner assigned — one named engineer responsible for cleanup
- [ ] Cleanup date set in the flag description field (for Release and Experiment flags)
- [ ] Rollout strategy defined — see Section 4
- [ ] Monitoring plan defined — see Section 5
- [ ] Code review approved with flag guard in place
**Flag description field (required):**
```
Type: [Release / Experiment / Ops / Permission]
Owner: [Name]
Linked ticket: [JIRA-XXXX or GitHub issue URL]
Purpose: [One sentence — what this flag controls]
Cleanup by: [Date — required for Release and Experiment flags; "Annual review" for Ops/Permission]
Rollout plan: [Link to this document or inline summary]
```
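If the description field is stored as plain text in the template above, it can be checked before a flag is created. This is a rough sketch under assumptions: `parse_description` and `check_description` are hypothetical helpers, and the ISO `YYYY-MM-DD` date format for "Cleanup by" is a choice made here, since the template only asks for a date.

```python
from datetime import date

REQUIRED_FIELDS = (
    "Type", "Owner", "Linked ticket", "Purpose", "Cleanup by", "Rollout plan",
)
TEMPORARY_TYPES = {"Release", "Experiment"}


def parse_description(text):
    """Split 'Key: value' lines into a dict, splitting on the first colon only."""
    fields = {}
    for line in text.strip().splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def check_description(text):
    """Return a list of problems; an empty list means the description is complete."""
    fields = parse_description(text)
    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if not fields.get(name)
    ]
    # Release and Experiment flags need a real date, not "Annual review".
    if fields.get("Type") in TEMPORARY_TYPES and fields.get("Cleanup by"):
        try:
            date.fromisoformat(fields["Cleanup by"])
        except ValueError:
            problems.append("Cleanup by must be a YYYY-MM-DD date for this flag type")
    return problems
```

Splitting on the first colon keeps URLs in the "Linked ticket" field intact.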
**Code requirements:**
```python
# Good: behaviour is clear when the flag is off, and cleanup is obvious
def handle_request(request, user_context):
    if flag_client.is_enabled("release-[service]-[feature]", user_context):
        return new_feature_handler(request)
    else:
        return existing_handler(request)

# Bad: nested flags, ternaries, and implicit defaults make cleanup error-prone
result = new_handler() if (f1 and not f2) or f3 else old_handler()
```
---
## 4. Rollout Strategy
### Decision Tree
Use this decision tree to pick the right rollout strategy for a Release or Experiment flag:
```
Is the change reversible without a deploy?
├── No → Use an Ops flag with manual enable, not a percentage rollout
└── Yes → Continue
Is there a user-level identifier available (user ID, session ID)?
├── No → Use server-side percentage (stateless, but inconsistent per user)
└── Yes → Use user-based percentage (consistent experience per user) ← preferred
Is the change risky (touches payments, auth, or data writes)?
├── Yes → Start at 1% → 5% → 25% → 50% → 100%, with 24-hour holds
└── No → Start at 10% → 50% → 100%, with 4-hour holds
Does the change affect specific customer tiers or geographies?
├── Yes → Use segment-based targeting, not percentage rollout
└── No → Use percentage rollout
```
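The tree prefers user-based percentages because each user gets a consistent experience. One common way to achieve that is to hash the flag name and user ID into a stable bucket. The sketch below illustrates the idea only: it is not how any particular platform implements bucketing, and `bucket` and `in_rollout` are hypothetical names.

```python
import hashlib


def bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..9999."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 10_000


def in_rollout(flag_name: str, user_id: str, percentage: float) -> bool:
    """Return True if the user falls inside the given rollout percentage."""
    # 10,000 buckets give 0.01% granularity; 1% covers buckets 0..99.
    return bucket(flag_name, user_id) < percentage * 100
```

Because the bucket depends only on the flag name and user ID, a user who sees the feature at 5% still sees it at 25%. Including the flag name in the hash keeps the same users from always landing in the earliest cohort of every flag.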
### Rollout Stages
| Stage | Percentage | Hold duration | Pass criteria before advancing |
|---|---|---|---|
| Canary | 1% | 24 hours | Error rate within SLO, no P1 incidents |
| Early rollout | 5–10% | 24 hours | Error rate and latency match control group |
| Partial rollout | 25–50% | 24–48 hours | Business metrics not degraded vs. control |
| Majority | 75% | 24 hours | Final check — no regressions |
| Full rollout | 100% | 48 hours | Stable — schedule cleanup |
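The stage table can be read as a simple gate: a rollout moves forward one stage at a time, and only after the hold has elapsed and the pass criteria are met. The following is a minimal sketch of that reading. The names `Stage`, `STAGES`, and `next_stage` are hypothetical; where the table gives a range, the sketch assumes the upper percentage and the shorter hold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    percentage: int      # upper bound of the range in the table (assumption)
    min_hold_hours: int  # lower bound of the hold duration (assumption)


STAGES = (
    Stage("Canary", 1, 24),
    Stage("Early rollout", 10, 24),
    Stage("Partial rollout", 50, 24),
    Stage("Majority", 75, 24),
    Stage("Full rollout", 100, 48),
)


def next_stage(current_index: int, hours_held: float, criteria_passed: bool) -> int:
    """Return the index of the stage to run next.

    The rollout stays at the current stage until the hold has elapsed and the
    pass criteria are met, and it never advances by more than one stage.
    """
    stage = STAGES[current_index]
    if hours_held < stage.min_hold_hours or not criteria_passed:
        return current_index
    return min(current_index + 1, len(STAGES) - 1)
```

Whether `criteria_passed` is true remains a human or monitoring decision based on the "Pass criteria" column; the sketch only enforces ordering and hold times.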
**Do not skip stages for Relea…**