phoenix-docs-gap-audit
The phoenix-docs-gap-audit skill finds recently shipped features in the Phoenix repository whose documentation is missing or stale. It audits code changes against every documentation surface: product docs, READMEs, API references, docstrings, and in-product content. Use it to discover documentation gaps before they reach users, or to prepare documentation updates after a release.
git clone --depth 1 https://github.com/Arize-ai/phoenix /tmp/phoenix-docs-gap-audit && cp -r /tmp/phoenix-docs-gap-audit/.agents/skills/phoenix-docs-gap-audit ~/.claude/skills/phoenix-docs-gap-audit

SKILL.md
# Phoenix Docs Gap Audit

Find everything in the Phoenix repo that shipped recently without proper documentation. The output is a **gap report**: not release notes, not new docs. The gap report tells the user (or a follow-up skill) exactly what is missing, where it should live, and what the new content should say, grounded in real code.

This skill is deliberately thorough. Documentation gaps are expensive: users hit undocumented features blind, stale examples teach wrong APIs, and in-code docstrings go missing silently because nothing fails CI. A good audit reads the actual code, cross-references every doc surface, and produces a report a maintainer can act on without re-doing the investigation.

## Scope: what counts as "documentation"

Phoenix has several doc surfaces. A feature can be fully released and still be undocumented on most of them. Audit **all** of these:

| Surface | Location | Tool |
|---|---|---|
| User-facing product docs | `docs/phoenix/**/*.mdx` | Mintlify |
| Server README | `README.md` | GitHub landing |
| Python package READMEs | `packages/*/README.md` | PyPI landing |
| TS package READMEs | `js/packages/*/README.md` | npm landing |
| Python built-in API docs | `packages/*/docs/source/` | Sphinx |
| TS built-in API docs | `js/packages/*/typedoc.jsonc`, TSDoc in `src/` | TypeDoc |
| Python docstrings | class/function docstrings in `src/` | inline |
| TypeScript TSDoc | `/** */` on exported symbols in `src/` | inline |
| Code comments | non-obvious logic in any `src/` | inline |
| llms.txt | `docs/phoenix/llms.txt` | machine-readable docs index |
| Onboarding snippets | `js/packages/phoenix-client/src/onboarding/` and similar | in-product |

Missing from any of these is a gap. Stale (contradicts current code) is also a gap, and is more dangerous than missing because it teaches users the wrong thing.

## Workflow

### Phase 1: Gather commits

Default window is the last 7 days on `main`.
The user may override (e.g., "since last release", "last month", a specific tag range). Translate their phrasing into a concrete range before running anything.

**Always audit `origin/main`, not the local `main` branch.** Local `main` is routinely stale by dozens of commits in active repos. If you audit the stale tip, you will silently miss every feature and every breaking change that shipped after your last `git pull`. An early iteration of this skill missed a major breaking change this way.

```bash
# Refresh the remote-tracking branch first; this does NOT touch your working tree
git fetch origin main --quiet

# Then log against origin/main with file stats
git log --since="7 days ago" origin/main --no-merges --pretty=format:"%h %s" --name-status

# If the user gave you a tag range
git log <prev-tag>..<current-tag> --no-merges --pretty=format:"%h %s" --name-status

# Sanity check: how far ahead is origin?
git rev-list --count main..origin/main
```

Save the raw list. You will refer back to it repeatedly, so don't re-run git for every commit. Note the commit you are auditing against in the report header (e.g. "audited against `origin/main` at `<sha>`") so a reader can reproduce the finding.

### Phase 2: Triage

Commit messages lie, or at least under-report. Use them as an index, not a source of truth. Split the list into three buckets:

- **Audit candidates**: anything that could plausibly affect a user: new APIs, new CLI flags, new UI, new config, new env vars, new providers, behavior changes, performance changes visible to users, breaking changes, deprecations. **In-product onboarding snippets, integration registries, and provider configs under `app/` count as user-facing**: they are literally the instructions users copy out of the product, so a new snippet without a matching `docs/phoenix/integrations/<...>.mdx` page is a real gap, even though the change technically lives in the frontend.
- **Skip**: dep bumps (`chore(deps):`), internal refactors with no public surface, test-only changes, CI/build changes, formatting, skill/workflow edits, release-please bookkeeping (`chore(main): release …`), **feature flags** (env vars named `*_DANGEROUSLY_*`, `*_EXPERIMENTAL_*`, `*_ENABLE_*` internal toggles, or otherwise intentionally undocumented escape hatches; these are deliberately kept out of public docs and should never be flagged as missing documentation).
- **Unclear**: when you cannot tell from the message and changed paths. **Default to reading the diff** rather than guessing. It is cheap and catches features hidden behind `refactor:` or `chore:` prefixes.

Group related commits that implement one logical feature across server + SDK + UI. Audit them as one unit.

**Breadth before depth.** Enumerate every audit candidate in a flat list before you start going deep on any of them. It is tempting to find a huge breaking change and spend the rest of the audit documenting it, but the user is asking "what landed this week that needs docs", and the answer is a *complete* list, not the single juiciest finding. A one-line entry per audit candidate is fine at this stage:

```
Audit candidates (N total):
1. 28ecfe023 - ATIF trajectory upload helper (packages/phoenix-client)
2. 15e641510 - evals 3.0 legacy removal (BREAKING, ~24 docs affected)
3. ed559c46e - GraphQL: require explicit first on forward pagination (BREAKING)
4. 81d296bee - 7 new OpenAI-compatible provider onboarding snippets
5. c70eca619 - EvaluatorParams.traceId (TS client)
6. cc644897c - PXI chat tracing env vars
...
```

Only after this list exists do you expand each entry into a full gap analysis. If one finding is so large that fully documenting all its affected files would blow your budget, it is better to have ten entries at medium depth than one exhaustive entry and nine missing. The reader can always come back for more detail; they cannot ask about a feature the report never mentions.
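The three-bucket triage described above can be sketched as a small first-pass classifier. This is a minimal illustration, not part of the skill: the `Commit` type, the `triage` and `bucket` functions, and every regular expression are hypothetical, and they assume conventional-commit style subjects. Anything the heuristics cannot place lands in `unclear`, which still means reading the diff.

```python
import re
from dataclasses import dataclass


@dataclass
class Commit:
    """One entry from the saved `git log --name-status` output (hypothetical shape)."""
    sha: str
    subject: str
    paths: list[str]


# Assumed patterns; tune them to the repository's actual commit conventions.
SKIP_SUBJECT = re.compile(
    r"^(chore\(deps\)|chore\(main\): release|(ci|build|test|style)(\([^)]*\))?:)"
)
AUDIT_SUBJECT = re.compile(r"^(feat|perf)(\([^)]*\))?:")
# A "!" before the colon or the word BREAKING marks a breaking change.
BREAKING = re.compile(r"^\w+(\([^)]*\))?!:|BREAKING")
# Paths assumed to have no public surface (CI config and tests).
INTERNAL_PATH = re.compile(r"^(\.github/|tests/)|/tests/")


def triage(commit: Commit) -> str:
    """Return 'audit', 'skip', or 'unclear' for a single commit."""
    subject = commit.subject.strip()
    if SKIP_SUBJECT.match(subject):
        return "skip"
    # Test-only or CI-only changes are skipped regardless of the prefix.
    if commit.paths and all(INTERNAL_PATH.search(p) for p in commit.paths):
        return "skip"
    if BREAKING.search(subject) or AUDIT_SUBJECT.match(subject):
        return "audit"
    # refactor:, chore:, fix: and anything else can hide a feature: read the diff.
    return "unclear"


def bucket(commits: list[Commit]) -> dict[str, list[Commit]]:
    """Split commits into the three triage buckets, preserving order."""
    buckets: dict[str, list[Commit]] = {"audit": [], "skip": [], "unclear": []}
    for commit in commits:
        buckets[triage(commit)].append(commit)
    return buckets
```

The classifier deliberately errs toward `unclear` rather than `skip`, since a wrongly skipped commit is a feature the report never mentions. It only orders the reading work and does not replace opening the changed files.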
### Phase 3: Locate the real code

For every audit candidate, open the actual changed files. Commit messages r