ai-product-canvas
The ai-product-canvas skill structures decisions for AI and ML-powered features using a comprehensive canvas covering problem definition, model approach, data requirements, evaluation metrics, UX design, responsible AI considerations, and launch monitoring. Use this when building new AI features, integrating language models into products, designing AI-first products, or assessing organizational AI readiness to ensure technical solutions address genuine user problems and include safeguards for bias, accuracy, and responsible deployment.
git clone --depth 1 https://github.com/mohitagw15856/pm-claude-skills /tmp/ai-product-canvas && cp -r /tmp/ai-product-canvas/plugins/pm-advanced/skills/ai-product-canvas ~/.claude/skills/ai-product-canvas
SKILL.md
# AI Product Canvas Skill

Define AI products with the same rigour as any product decision, but with additional layers for data, model, evaluation, and responsible AI. This canvas prevents the most common AI product failure: building a technically impressive feature that doesn't solve a real problem.

## AI Product Anti-Patterns to Check First

Before building, flag if any of these apply:

- ❌ "We should add AI to [existing feature]", with no user problem defined
- ❌ Accuracy target undefined before build begins
- ❌ No plan for what happens when the model is wrong
- ❌ User-facing AI output with no human review or fallback
- ❌ Training data not audited for bias or quality
- ❌ No evaluation metric: "we'll know it when we see it"

---

## AI Product Canvas Output Format

### AI Product Canvas - [Feature Name] - [Date]

**PM Owner:** [Name]
**ML/AI Lead:** [Name]
**Status:** Discovery / Design / Build / Evaluation / Live

---

#### 1. Problem Definition

**User problem being solved:**
> [What specific situation is the user in? What job are they trying to get done?]

**Why AI?**
> [What makes this problem require AI vs a deterministic solution? If the answer is "because we can," stop here.]

**Success for the user looks like:**
> [What outcome does the user experience when the AI feature is working well?]

---

#### 2. AI Approach

**Task type:**
- [ ] Classification
- [ ] Generation (text, image, code)
- [ ] Summarisation / extraction
- [ ] Recommendation
- [ ] Search / retrieval
- [ ] Prediction / forecasting
- [ ] Conversation / agent

**Model approach:**
- [ ] LLM API (GPT-4, Claude, Gemini, etc.), specify: [Model name + version]
- [ ] Fine-tuned model on own data
- [ ] Custom model trained from scratch
- [ ] RAG (retrieval-augmented generation)
- [ ] Embedding + vector search

**Rationale for chosen approach:** [Why this, not alternatives]

---

#### 3. Data Requirements

| Data Type | Source | Volume | Quality Status | Bias Risk |
|---|---|---|---|---|
| [Training data] | [Where it comes from] | [Volume] | [Audit status] | H/M/L |
| [Evaluation data] | [Where it comes from] | [Volume] | [Audit status] | H/M/L |

**Data gaps:** [What's missing and plan to get it]
**Privacy considerations:** [Any PII in training or inference data]
**Data ownership:** [Do we own this data? Can we use it for training?]

---

#### 4. Evaluation Framework

**Primary metric:** [The number that defines success: accuracy, F1, BLEU, user rating, task completion rate]
**Minimum acceptable threshold:** [Below X, the feature does not ship]
**Human evaluation plan:** [How will humans review model outputs? Sampling rate? Review panel?]

| Evaluation Type | Method | Cadence | Owner |
|---|---|---|---|
| Offline (pre-launch) | [Test set, benchmark] | Pre-launch | ML Lead |
| Online (post-launch) | [A/B test, user feedback] | Weekly | PM + ML |
| Adversarial | [Red-team, edge cases] | Pre-launch | Safety reviewer |

---

#### 5. User Experience Design

**How is AI output presented?**
- [ ] Direct output shown to user (high trust required)
- [ ] AI-assisted with user confirmation
- [ ] Suggestion user can accept/reject
- [ ] Background action with audit log

**Confidence and uncertainty handling:**
- What happens when confidence is low? [Show alternative, ask for clarification, fallback to manual]
- How is uncertainty communicated to the user? [UI pattern]

**Fallback plan:**
- If the model fails or returns an error: [Specific fallback behaviour]
- If accuracy degrades below threshold: [Kill switch or graceful degradation plan]

---

#### 6. Responsible AI Checklist

- [ ] Bias audit completed on training data
- [ ] Demographic fairness evaluated (does performance differ by user group?)
- [ ] Hallucination / confabulation risk assessed and mitigated
- [ ] User can see and correct AI output
- [ ] Opt-out mechanism exists (can user disable the AI feature?)
- [ ] Output provenance visible when relevant (does user know AI generated this?)
- [ ] PII not used in ways user didn't consent to
- [ ] Regulatory review completed (GDPR, AI Act, sector-specific)
- [ ] Model cards / documentation completed

---

#### 7. Launch & Monitoring Plan

**Rollout:** [% of users, with staged expansion criteria]

**Monitoring metrics:**
- Model performance: [Metric + alert threshold]
- User engagement with AI output: [Acceptance rate, override rate, feedback score]
- Error rate: [% of failed inferences]
- Latency: [P95 target]

**Model refresh cadence:** [How often is the model retrained or updated?]
**Drift detection:** [How will you know when model performance degrades in production?]

---

## Guidelines

- Never skip the "Why AI?" section; it's the most important question in AI product development
- The fallback UX is not optional: what happens when AI fails defines your product's trustworthiness
- Responsible AI checklist must be completed before launch, not after
- Include latency in success metrics: a 5-second AI response is often worse than no AI at all
- Recommend starting with a human-in-the-loop design and automating only when accuracy is proven

## Required Inputs

Ask the user for these if not provided:

- **Feature or product description** (what the AI is intended to do)
- **User problem** (what problem the AI is solving for users)
- **Available data** (what training/inference data exists)
- **ML/AI lead** (who owns the technical implementation)

## Anti-Patterns

- [ ] Do not skip the "Why AI?" question: if the answer is "we want to use AI," stop and reframe around the user problem first
- [ ] Do not launch with an undefined accuracy threshold: "good enough" is not a threshold; set a number before build begins
- [ ] Do not design the UX to hide AI-generated output as if it were system truth: users need to know when AI is involved so they can override it
- [ ] Do not defer the Responsible AI checklist to post-launch: bias and privacy issues are far harder to fix in production than in design
- [ ] Do not treat model latency a
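
As a rough illustration of how the numeric gates in the canvas might be wired together, the Python sketch below encodes three of them: the minimum acceptable threshold from section 4, the low-confidence fallback from section 5, and the P95 latency alert from section 7. The `CanvasThresholds` class, the function names, and every numeric value are hypothetical and chosen only for illustration; the skill itself does not prescribe an implementation.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class CanvasThresholds:
    """Hypothetical values a team would copy from its own completed canvas."""
    min_primary_metric: float  # section 4: below this, the feature does not ship
    min_confidence: float      # section 5: below this, fall back to the manual flow
    p95_latency_ms: float      # section 7: P95 latency target in milliseconds


def can_ship(primary_metric: float, t: CanvasThresholds) -> bool:
    """Ship gate: the offline metric must meet the agreed minimum."""
    return primary_metric >= t.min_primary_metric


def present_output(output: str, confidence: float, t: CanvasThresholds) -> dict:
    """Return a suggestion the user can accept or reject, or a manual fallback."""
    if confidence < t.min_confidence:
        # Low confidence: hide the model output and route to the manual path.
        return {"mode": "manual_fallback", "text": None}
    # Otherwise show the output, flagged as AI-generated so the user can override it.
    return {"mode": "suggestion", "text": output, "ai_generated": True}


def p95(latencies_ms: list[float]) -> float:
    """95th percentile of observed latencies (needs at least two samples)."""
    return quantiles(latencies_ms, n=100)[94]


def latency_alert(latencies_ms: list[float], t: CanvasThresholds) -> bool:
    """True when the observed P95 latency exceeds the canvas target."""
    return p95(latencies_ms) > t.p95_latency_ms


if __name__ == "__main__":
    thresholds = CanvasThresholds(
        min_primary_metric=0.90, min_confidence=0.70, p95_latency_ms=2000.0
    )
    print(can_ship(0.92, thresholds))
    print(present_output("Draft summary", 0.55, thresholds))
    print(latency_alert([100.0] * 90 + [5000.0] * 10, thresholds))
```

Keeping the thresholds in one structure means the numbers agreed in the canvas are set before build begins and are checked in one place, rather than scattered through the feature code.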
Conduct a structured ethical review of an AI or ML feature, model, or product. Use when preparing to deploy an AI system, assessing algorithmic risk, auditing a model for bias, or producing a responsible AI impact assessment. Produces a structured ethics review covering fairness, transparency, privacy, safety, accountability, and societal impact with a risk tier score, pre-deployment checklist, and prioritised mitigations.
Transform feature briefs into structured design briefs that give designers the context they need before opening Figma. Use when asked to write a design brief, create a design handoff, brief a designer on a new feature, or translate a PRD into design requirements. Produces a brief with user goal, emotional context, success criteria, constraints, edge cases, and out-of-scope boundaries.
Design statistically rigorous A/B tests and interpret experiment results. Use when asked to design an experiment, run an A/B test, calculate sample size, interpret test results, or assess whether an experiment was successful. Produces a complete experiment design with hypothesis, sample size, run time, success criteria, and risk flags — or a results interpretation with ship/iterate/kill recommendation.
Synthesises user signals from multiple research sources into a unified, weighted insight brief. Use when you have data from interviews, support tickets, NPS verbatims, app reviews, or sales calls and need to reconcile contradictions, surface the underlying need behind requests, or answer 'what are users really telling us'. Produces ranked insights with confidence ratings, source weighting rationale, divergent signal analysis by user segment, and a research gap identification section.
Structure a product data analysis, metric deep-dive, funnel analysis, or cohort study. Use when asked to analyse product metrics, investigate a drop in conversion, explain a data change to stakeholders, or find the root cause of a metric movement. Produces a structured analysis with question, root cause, confidence level, and recommended action.
Interpret product metrics against goals and surface actionable signals. Use when asked to analyse product health, review key metrics, investigate a performance issue, produce a health report, or assess product-market fit signals. Produces a structured health report with RAG status, trend analysis, root cause hypotheses, and prioritised actions.
Structure a retention analysis, churn investigation, or engagement deep-dive for any product team. Use when asked to analyse user retention, investigate churn, measure DAU/MAU, or build a retention improvement plan. Produces a retention snapshot with root cause hypotheses, aha-moment correlation, and prioritised interventions.
Build the storyline and slide structure for a board presentation. Use when asked to create a board deck, board presentation narrative, board meeting slides, or quarterly board update. Produces a complete slide-by-slide structure with narrative beats, talking points, and slide content guidance.