validate-model
The validate-model command audits model entries in apps/sim/providers/models.ts against live provider API documentation to detect hallucinated pricing and capabilities. Use this when adding or updating models to ensure every numeric claim and capability flag is verified against current official sources, not training data, with findings cited to specific documentation URLs and cross-checked against secondary pricing sources like OpenRouter or CloudPrice.
mkdir -p ~/.claude/commands && curl -fsSL https://raw.githubusercontent.com/simstudioai/sim/HEAD/.claude/commands/validate-model.md -o ~/.claude/commands/validate-model.mdvalidate-model.md
# Validate Model Skill
You audit one or more model entries in `apps/sim/providers/models.ts` against the provider's official live API docs. **Hallucinated pricing and capabilities are the #1 failure mode in this file.** Every numeric and capability claim must be re-derived from a live web fetch in this session — not from memory, not from training data, not from the user's marketing email.
## Hard rules (do not skip)
1. **Live-fetch or report unverified.** Each field must be backed by a live WebFetch in this session. If you cannot reach an authoritative URL for a field, mark it **UNVERIFIED** in the report — do not silently confirm it from memory.
2. **Cite every fact.** Every value in the report must show the source URL it was checked against. No URL → mark UNVERIFIED.
3. **Two-source rule for pricing.** Cross-check input/output/cached against at least one secondary source (OpenRouter, Artificial Analysis, CloudPrice). If sources disagree, the provider's own docs win — flag the disagreement.
4. **Inspect provider implementation before flagging capability mismatches.** A capability flag in `models.ts` is dead unless the provider's code under `apps/sim/providers/{provider}/` consumes it (see Consumption Matrix below). Setting a flag the provider ignores is a warning, not a critical.
5. **Never auto-fix without printing the diff.** Show the user the proposed diff before applying. Get confirmation.
## Your Task
When invoked as `/validate-model <provider> [model-id]`:
1. Read the target entries from `models.ts`
2. Live-fetch the provider's official models, pricing, and capability/reasoning pages + at least one secondary source for pricing
3. Inspect the provider implementation to know which flags are actually consumed
4. Run the checklist below per model
5. Report findings (critical / warning / suggestion / unverified) with every cell linked to its source URL
6. Offer to fix; on confirm, edit `models.ts` in a single pass and re-lint
If `model-id` is omitted, validate every model in the provider.
## Step 1: Read entries from `models.ts`
Capture per model: `id`, full `pricing`, full `capabilities`, `contextWindow`, `releaseDate`, `recommended`, `speedOptimized`, `deprecated`.
## Step 2: Live-fetch authoritative sources
Use the canonical provider URL table in the `add-model` skill (`.claude/commands/add-model.md`, or its mirror `.agents/skills/add-model/SKILL.md`), Step 1, as the single source of truth — fetch the models index, pricing, and reasoning/parameter caveats pages listed there for the target provider. If you update one table, update the other in the same change.
Secondary cross-check (use at least one): OpenRouter, Artificial Analysis, CloudPrice.
If a fetch fails (404, timeout, paywall), record the URL attempted and mark dependent fields UNVERIFIED.
## Step 3: Build the consumption map for this provider
Re-grep before trusting the snapshot below:
```bash
rg "reasoningEffort|reasoning_effort" apps/sim/providers/<provider>/
rg "verbosity" apps/sim/providers/<provider>/
rg "request\.thinking|thinking:" apps/sim/providers/<provider>/
rg "supportsNativeStructuredOutputs|nativeStructuredOutputs" apps/sim/providers/<provider>/
```
Snapshot (verify before relying):
| Capability | Consumed by |
|---|---|
| `reasoningEffort` | `openai/core.ts`, `azure-openai`, `anthropic/core.ts` (mapped via thinking), `gemini/core.ts` |
| `verbosity` | `openai/core.ts`, `azure-openai/index.ts` |
| `thinking` | `anthropic/core.ts`, `gemini/core.ts` |
| `nativeStructuredOutputs` | `anthropic/core.ts`, `fireworks/index.ts`, `openrouter/index.ts` |
| `computerUse` | `anthropic/core.ts` |
| `temperature` | All providers (passthrough) |
A flag set in `models.ts` but not in the consumption list for this provider = **warning: dead flag**.
## Step 4: Run the checklist
For each model, evaluate every row. Statuses: ✓ matches docs, ✗ disagrees, ⚠️ single-source, ❓ UNVERIFIED (could not fetch).
### Identity
- [ ] `id` exactly matches provider's API model identifier (case, dots, dashes, prefix for resellers)
- [ ] `releaseDate` matches launch announcement
- [ ] `deprecated: true` set if provider has announced retirement (or removed from active list)
### Pricing (per 1M tokens, USD)
- [ ] `pricing.input` matches provider pricing page
- [ ] `pricing.output` matches provider pricing page
- [ ] `pricing.cachedInput` matches provider's documented cached/prompt-cache rate (or is correctly omitted if no caching offered)
- [ ] `pricing.updatedAt` is recent — warn if older than 60 days
### Context & output limits
- [ ] `contextWindow` matches docs (in tokens)
- [ ] `capabilities.maxOutputTokens` matches documented output cap (or is correctly omitted if "no output limit")
### Capabilities (each must be DOCUMENTED-AS-SUPPORTED **and** CONSUMED-BY-PROVIDER-CODE)
- [ ] `temperature` — provider accepts it for this model (reasoning-always-on models often reject)
- [ ] `reasoningEffort.values` — list matches docs; **omitted** for always-reasoning models that reject the parameter (e.g., grok-4.3, where xAI docs explicitly state `reasoning_effort` is not supported). Verify per model — some always-reasoning models (e.g., OpenAI's o-series) DO accept `reasoning_effort` and should keep the flag.
- [ ] `verbosity.values` — only on OpenAI gpt-5.x family; values match docs
- [ ] `thinking.levels` + `thinking.default` — only on Anthropic/Gemini; values match docs
- [ ] `nativeStructuredOutputs` — only on anthropic/fireworks/openrouter; provider must document Structured Outputs / JSON-mode for this model
- [ ] `toolUsageControl` — provider supports `tool_choice` semantics
- [ ] `computerUse` — provider implements computer-use loop AND model is a computer-use SKU
- [ ] `deepResearch` — only on actual deep-research SKUs
- [ ] `memory: false` — only when the model genuinely cannot maintain conversation history
### Flags
- [ ] `recommended: true` — at most one or two per provider; should be current flagship
- [ ] `speedOptimizCreate a block configuration for a Sim integration with proper subBlocks, conditions, and tool wiring
Add a knowledge base connector for syncing documents from an external source
Add a code-defined table enrichment (registry entry) backed by a provider cascade, ensuring each provider tool has hosted-key support
Add hosted API key support to a tool so Sim provides the key when users don't bring their own. Use when adding hosted keys, BYOK support, hideWhenHosted, or hosted key pricing to a tool or block.
Add a complete integration to Sim (tools, block, icon, registration)
Add a new LLM model to apps/sim/providers/models.ts with specs verified against the provider's live API docs (no hallucination)
Create tool configurations for a Sim integration by reading API docs
Create webhook or polling triggers for a Sim integration