experimentation-platform-orchestrator
The Experimentation Platform Orchestrator provides a structured decision framework for selecting, evaluating, and migrating between experimentation platforms like Statsig, PostHog, GrowthBook, Optimizely, Amplitude, Eppo, and Kameleoon. Use this skill when choosing an experimentation platform for the first time, evaluating whether to switch vendors, consolidating from multi-platform setups, or planning migrations. It prioritizes long-term cost, governance, and compounding effects rather than surface-level feature comparisons.
git clone --depth 1 https://github.com/rampstackco/claude-skills /tmp/experimentation-platform-orchestrator && cp -r /tmp/experimentation-platform-orchestrator/dist/pi/.agents/skills/experimentation-platform-orchestrator ~/.claude/skills/experimentation-platform-orchestrator
SKILL.md
# Experimentation Platform Orchestrator

A senior product and engineering leader's playbook for making the experimentation platform decision and recovering from making it wrong.

Picking an experimentation platform is one of those decisions that looks easy at the start and compounds for years afterward. The wrong choice costs you in lost experiments (because the team avoids the painful workflow), in cost (because the wrong pricing model penalizes your usage shape), in vendor lock-in (because migration is real engineering work, not a config change), and in cultural drift (because the platform's defaults shape what your team thinks experimentation is). This skill is the discipline for making the decision well the first time, and the migration plan for when you didn't.

When to use this skill: choosing a platform from scratch, evaluating whether to switch, deciding whether to consolidate from multi-platform to single, or planning a migration that has already been approved.

---

## What this skill is for

This skill spans platform selection, multi-platform decisions, migration planning, and governance setup. It does not cover experiment design (use `experiment-design`), result interpretation (use `experimentation-analytics`), or feature flag operations (use `feature-flagging`). Pair this skill with the relevant integrations microsite when you need platform-specific MCP details.

The audience is a PM, engineering leader, or data lead who is making the decision or recovering from a previous one. The voice is decisive. There is no "it depends, evaluate them all yourself." The decision space has real shape, and a senior advisor can map your situation to a defensible answer in an afternoon.

---

## The 7 considerations for the platform decision

Every platform evaluation walks the same seven questions. Answer them honestly first, then read the per-platform profiles, then consult the decision matrix.
The order matters: data architecture and statistical rigor are foundational; the rest are layered on top.

1. **Data architecture.** Where does experiment data live? Three patterns. Vendor-native (Statsig, Optimizely) keeps the data in the vendor's storage. Product-suite (PostHog, Amplitude) combines analytics and experiments behind one event pipeline. Warehouse-native (GrowthBook, Eppo) runs SQL on your existing data warehouse. The pattern dictates security review depth, residency, statistical depth, and cost shape.
2. **Statistical rigor.** Does the platform implement CUPED, sequential testing, the delta method for ratio metrics, and multiple testing corrections? Cheap to verify in a sales call: ask "what variance estimator do you use for ratio metrics?" and "do you support always-valid p-values?" Modern platforms (Statsig, Eppo, parts of PostHog) have these. Older or homegrown platforms often do not.
3. **MCP availability.** All seven platforms covered here have a first-party or hosted MCP except Eppo (as of May 2026). MCP availability matters more for agentic workflows where AI agents create and read experiments end to end. It matters less for traditional human-driven experimentation. Worth weighting if your team is AI-forward.
4. **Feature flag integration.** Do experiments and feature flags live in the same platform? Statsig, Optimizely, GrowthBook, and PostHog all unify them. Eppo is experiment-only. Kameleoon is creative-personalization-focused. If you also need feature flag operations as production infrastructure, check `feature-flagging` for the operational discipline; the platform choice has to support both surfaces or you accept a second tool.
5. **Analytics depth.** Can you see funnels, retention, and cohorts in the same surface as experiment results? PostHog and Amplitude are strongest here (analytics-first products). Statsig has a strong analytics overlay. Optimizely and GrowthBook are experiment-first, with analytics as a supplementary feature.
6. **Governance and audit.** Who can change targeting in production, who can ship experiments, who can read sensitive metrics? Enterprise tiers (Optimizely, LaunchDarkly Federal) handle this with maturity. Open-source platforms (GrowthBook, PostHog self-hosted) require self-built governance. For regulated industries (healthcare, finance, public sector), this question is the deciding factor.
7. **Cost shape.** Vendor-native scales with events. Warehouse-native scales with seats and warehouse compute. Product-suite scales with combined event volume across all features. Match the pricing shape to your actual usage shape. A high-traffic startup pays vendor-native pricing differently from a lower-traffic enterprise; pick the shape that is friendly to your trajectory, not just your current month.

---

## Statsig

Modern experimentation and feature management combined in one platform. CUPED and sequential testing built in. Strong PM-led ergonomics. Used by OpenAI, Notion, Brex, Figma.

**Strengths.** Fast time to first experiment. Combined experiments and feature flags eliminate the second-tool tax. Statistical rigor is current with the literature. The MCP exposes full CRUD across experiments, gates, dynamic configs, and metrics.

**Gotchas.** Pricing scales with events, which can become expensive at high scale. The platform has strong opinions about how experiments should run; teams that want a custom statistical workflow will fight the defaults. Self-host is not a first-class option.

**Ideal customer.** Fast-growing SaaS that wants one platform for flags and experiments, values out-of-the-box statistical depth, and is comfortable with vendor-native data architecture.

---

## PostHog

Open-source product OS combining product analytics, experiments, feature flags, surveys, session replays, error tracking, and LLM analytics. Free tier available.

**Strengths.** Full-funnel context. Experiments live next to the analytics that contextualize them.
The MCP exposes 200+ tools (use scoping like `?features=` to keep the agent context tight). Self-host
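The statistical rigor consideration (item 2 of the seven) suggests asking a vendor which variance estimator it uses for ratio metrics. As a reference point for judging the answer, here is a minimal sketch of the textbook delta-method standard error for a ratio such as clicks per session, where users are the randomization unit. The function name `delta_method_ratio` and the sample data are illustrative assumptions. This is not any vendor's implementation.

```python
from statistics import mean, variance


def delta_method_ratio(numerators, denominators):
    """Return (ratio, standard_error) for sum(numerators) / sum(denominators).

    Each position holds one user's totals, e.g. clicks and sessions.
    Hypothetical helper for illustration only.
    """
    n = len(numerators)
    if n != len(denominators) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 users")

    mean_y = mean(numerators)
    mean_x = mean(denominators)
    if mean_x == 0:
        raise ValueError("denominator mean is zero; ratio is undefined")

    var_y = variance(numerators)    # sample variance of the numerator
    var_x = variance(denominators)  # sample variance of the denominator
    # Sample covariance between per-user denominator and numerator.
    cov_xy = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(denominators, numerators)
    ) / (n - 1)

    ratio = mean_y / mean_x
    # First-order Taylor (delta method) approximation of Var(mean_y / mean_x).
    var_ratio = (
        var_y / mean_x**2
        - 2 * mean_y * cov_xy / mean_x**3
        + mean_y**2 * var_x / mean_x**4
    ) / n
    # Guard against tiny negative values caused by floating-point rounding.
    return ratio, max(var_ratio, 0.0) ** 0.5


if __name__ == "__main__":
    clicks = [0, 2, 4, 2]    # illustrative per-user click counts
    sessions = [2, 2, 2, 2]  # illustrative per-user session counts
    ratio, se = delta_method_ratio(clicks, sessions)
    print(f"clicks per session = {ratio:.3f}, standard error = {se:.3f}")
```

A naive estimator that treats every session as an independent observation ignores the correlation between sessions from the same user. The covariance term above is what accounts for that correlation, so it is a useful detail to probe for in a vendor's answer.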
Run a comprehensive WCAG accessibility audit covering perceivable, operable, understandable, and robust principles. Use this skill whenever the user wants to audit accessibility, review WCAG compliance, fix accessibility issues, prepare for accessibility certification, address an accessibility lawsuit risk, or systematically improve a site's accessibility. Triggers on accessibility audit, WCAG audit, a11y audit, accessibility compliance, ADA compliance, screen reader test, keyboard navigation, accessibility report, fix accessibility, axe scan. Also triggers when accessibility issues have been reported and need systematic remediation.
How to produce ad creative that converts at performance scale. Hook patterns, format selection, video pacing, variation systems, sequential testing methodology, fatigue detection, brand-voice alignment without conversion dilution, and platform-specific creative norms. Triggers on ad creative, ad design, hook patterns, ad video pacing, creative testing, ad variations, creative refresh, creative fatigue, refresh ad creative, video ads for Meta, TikTok creative, LinkedIn ad creative, ad asset library. Also triggers when a team is producing creative at scale, planning a creative test cycle, or auditing why creative is not converting.
How to read paid media dashboards without fooling yourself. Attribution models, platform reporting quirks, multi-platform reconciliation, ROAS vs LTV horizon traps, statistical noise in performance metrics, incrementality testing, and the failure modes that produce expensive lessons. Triggers on read paid media dashboard, attribution analysis, ROAS vs LTV, multi-platform reconciliation, ad incrementality, geo holdout, conversion lift study, ghost bidding, paid media reporting, board-deck paid media metrics, blended CAC, MMM, MTA, last-click attribution. Also triggers when a marketer is about to scale, kill, or rebudget a campaign based on platform metrics, or when reconciling platform reports against warehouse revenue.
Run a structured after-action review (postmortem, retrospective) on a launch, incident, or completed project to capture timeline, root cause analysis, contributing factors, and actionable lessons. Use this skill whenever the user wants to run a postmortem, retrospective, AAR, or after-action review on any past event. Triggers on after-action report, AAR, postmortem, retrospective, retro, post-incident review, what went well what didn't, lessons learned, blameless postmortem, root cause analysis, RCA, five whys. Also triggers when the user has just shipped something or just resolved an incident and wants to capture learnings.
How humans and AI compose in content workflows. Where AI legitimately participates, where humans must own, hybrid workflow patterns, voice ownership preservation, the AI slop problem, disclosure and transparency, team calibration, and the ethics of intellectually honest AI-assisted content production. Triggers on AI content workflow, AI-assisted writing, hybrid content production, AI in editorial, AI slop, AI disclosure, AI usage policy, AI content ethics, voice preservation with AI, team AI calibration. Also triggers when content feels generic despite quality tools, when team AI usage has drifted into inconsistency, or when a regulated or trust-sensitive context requires explicit AI policy.
Design measurement frameworks including event taxonomy, KPI hierarchy, dashboard architecture, attribution models, and analytics implementation strategy. Use this skill whenever the user wants to plan analytics, design dashboards, build event taxonomies, define KPIs, set up tracking, or audit existing measurement. Triggers on analytics strategy, measurement plan, event taxonomy, tracking plan, KPI framework, dashboard design, north star metric, attribution model, conversion tracking, GA4 setup, Mixpanel setup, analytics audit. Also triggers when the user has data but no clear way to use it, or wants to make decisions but doesn't know what to track.
Direct visual and creative work for campaigns, photography, illustration, video, and branded experiences. Use this skill whenever the user wants to brief a photographer, direct illustrators, plan a creative campaign, develop visual concepts, write a creative direction document, or evaluate creative work for fit. Triggers on art direction, photo brief, photography brief, illustration brief, campaign concept, creative concept, visual direction, mood board, look and feel, visual treatment, video direction. Also triggers when the user has approved brand identity but needs to extend it into specific creative deliverables.
Plan and run backups, set recovery objectives, and run disaster recovery drills. Use this skill when defining RPO/RTO targets, designing backup architecture, deciding what to back up and how often, planning for full-region or platform outages, or running a restoration drill. Triggers on backup, restore, RPO, RTO, disaster recovery, DR, business continuity, what if the database is gone, what if our hosting goes down, recovery drill, ransomware planning. Also triggers when an incident reveals a gap in restoration capability.