Skip to main content
ClaudeWave
Skill92 repo starsupdated 1mo ago

venice-models

Discover Venice models, their capabilities, constraints, and pricing. Covers GET /models (with ?type filter), /models/traits, /models/compatibility_mapping, the ModelResponse schema (capabilities, constraints, pricing per type), and how to use this to pick the right model programmatically.

Install in Claude Code
Copy
git clone --depth 1 https://github.com/veniceai/skills /tmp/venice-models && cp -r /tmp/venice-models/skills/venice-models ~/.claude/skills/venice-models
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# Venice Models

Three read-only endpoints for model discovery — all `GET`:

| Endpoint | Returns |
|---|---|
| `/models` | Full model catalog with `model_spec` (capabilities, constraints, pricing). |
| `/models/traits` | Trait → model ID mapping (e.g. `"default"`, `"fastest"`, `"default_reasoning"`, `"highest_quality"`). |
| `/models/compatibility_mapping` | Legacy / OpenAI / third-party model ID → Venice model ID aliases. |

All three take an optional `?type=` filter: `text`, `image`, `video`, `music`, `tts`, `asr`, `embedding`, `upscale`, `inpaint`, `all`, `code`.

All three are authenticated (Bearer API key or x402 SIWE) like every other `/api/v1` route.

## Use when

- You need to pick a model at runtime based on capabilities (vision, reasoning, function calling, E2EE, X search, multi-image, …).
- You need to validate a request against a model's `constraints` (prompt length, aspect ratio, resolution, steps).
- You need the current **price per million tokens / per image / per second / per 1k chars** to build a cost estimate.
- You want to resolve a user-friendly trait name (e.g. `default`, `default_reasoning`, `highest_quality`) or a frontier-style ID (`openai-gpt-54-pro`, `claude-opus-4-7`) to a concrete Venice model ID.

## `GET /models`

```bash
curl "https://api.venice.ai/api/v1/models?type=text"
```

```json
{
  "object": "list",
  "type": "text",
  "data": [
    {
      "id": "zai-org-glm-5-1",
      "created": 1699000000,
      "model_spec": {
        "name": "GLM 5.1",
        "description": "Balanced blend of speed and capability...",
        "availableContextTokens": 200000,
        "maxCompletionTokens": 24000,
        "privacy": "private",
        "beta": false,
        "betaModel": false,
        "modelSource": "https://huggingface.co/zai-org/GLM-5.1",
        "offline": false,
        "capabilities": { ... },
        "constraints": { ... },
        "pricing": { ... },
        "regionRestrictions": ["US"],
        "deprecation": {"date": "2025-03-01T00:00:00.000Z"}
      }
    }
  ]
}
```

### `model_spec.capabilities` — text models

| Flag | Meaning |
|---|---|
| `optimizedForCode` | Tuned for coding tasks. |
| `quantization` | `fp4` / `fp8` / `fp16` / `bf16` / `int8` / `int4` / `not-available`. |
| `supportsFunctionCalling` | Tools are allowed. |
| `supportsReasoning` | Emits `<thinking>...</thinking>` blocks (and/or provider-specific `reasoning_content`). |
| `supportsReasoningEffort` | Honors `reasoning.effort` / `reasoning_effort`. |
| `supportsResponseSchema` | Honors `response_format: json_schema`. |
| `supportsMultipleImages` + `maxImages` | Multi-image vision support. |
| `supportsVision` | Accepts `image_url` parts. |
| `supportsVideoInput` | Accepts `video_url` parts. |
| `supportsWebSearch` | Honors `venice_parameters.enable_web_search`. |
| `supportsLogProbs` | Honors `logprobs` / `top_logprobs`. |
| `supportsTeeAttestation` | Runs inside a TEE with hardware attestation. |
| `supportsE2EE` | End-to-end encrypted inference available (requires TEE). |
| `supportsXSearch` | xAI native web + X/Twitter search. |
| `supportsAudioInput` | Accepts audio-content message parts (set by the runtime capability builder — not part of the OpenAPI strict schema but appears on `/models` responses). |

### `model_spec.constraints` — by model family

- **Text** — `temperature.default`, `top_p.default`, and `{frequency,presence,repetition}_penalty.default`.
- **Image** — `promptCharacterLimit`, `widthHeightDivisor`, `steps.{default,max}`, optional `aspectRatios[]` + `defaultAspectRatio`, optional `resolutions[]` + `defaultResolution` (the last two appear only on models that use ratio/resolution-based sizing).
- **Video** — `aspect_ratios[]`, `resolutions[]`, `durations[]`, `model_type` (`text-to-video`/`image-to-video`/`video`), `audio`, `audio_configurable`, `prompt_character_limit`.
- **Inpaint / edit** — `aspectRatios[]`, `promptCharacterLimit`, `combineImages`.
- **TTS / Music** (fields surface at the top level of `model_spec`, not inside `constraints`) — `voices[]`, `default_voice`, `supports_lyrics`, `lyrics_required`, `supports_lyrics_optimizer`, `supports_force_instrumental`, `supports_speed`, `supports_language_code`, `min_speed`, `max_speed`, `min_prompt_length`, `prompt_character_limit`. Internal TTS per-model toggles like `supportsPromptParam` / `supportsTemperatureParam` / `supportsTopPParam` exist on the model definitions but are **not** merged into `/models` output today — treat the speech request schema as the support matrix.
- **Embedding** (top-level, not inside `constraints`) — `embeddingDimensions`, `maxInputTokens`, `supportsCustomDimensions`.

### `model_spec.pricing` — by model family

- **LLM** — `input.{usd,diem}`, `output.{usd,diem}` per 1 000 000 tokens, plus optional `cache_input` (reads), `cache_write` (writes, e.g. Anthropic 1.25×), and `extended.*` tier triggered by `context_token_threshold`.
- **Image** — either `generation.{usd,diem}` per image (flat) or `resolutions.<tier>.{usd,diem}` (per `1K`/`2K`/`4K`). Every image row also carries a global `upscale.{2x,4x}.{usd,diem}` block (derived from shared upscale SKUs) — treat it as account-wide upscale pricing, not a signal that this specific model can upscale. Combine with the inpaint/upscale model's own capability check to decide what's actually callable.
- **Inpaint / edit** — `inpaint.{usd,diem}` per edit.
- **Video** — **not currently returned on `/models`.** `calculatePricing()` has no video branch, so video entries have no `model_spec.pricing`. Use `POST /video/quote` for the authoritative per-request price.
- **Music / long audio** — `generation.{usd,diem}` (per job), `per_second.{usd,diem}` (per second generated), `per_thousand_characters.{usd,diem}` (character-priced narration), or `durations.<tier>.{usd,diem,min_seconds,max_seconds}` (duration-bucketed).
- **TTS** — `input.{usd,diem}` per **1 000 000 input characters**.
- **ASR** — `per_audio_second.{usd,diem}`.
- **Embeddings** — `input.{
venice-api-keysSkill

Manage Venice API keys. Covers GET/POST/PATCH/DELETE /api_keys, GET /api_keys/{id}, GET /api_keys/rate_limits, GET /api_keys/rate_limits/log, the two-step /api_keys/generate_web3_key wallet flow, INFERENCE vs ADMIN key types, and per-key consumption limits (USD / DIEM).

venice-api-overviewSkill

High-level map of the Venice.ai API - base URL, authentication modes, endpoint categories, response headers, pricing model, error shape, and versioning. Load this first when starting any Venice integration.

venice-audio-musicSkill

Async music / audio-track generation via Venice. Covers the /audio/quote + /audio/queue + /audio/retrieve + /audio/complete lifecycle, lyrics vs instrumental, voice selection, duration, language, speed, model capability probing, and webhook-free polling.

venice-audio-speechSkill

Generate speech from text via POST /audio/speech. Covers TTS models (Kokoro, Qwen 3, xAI, Inworld, Chatterbox, Orpheus, ElevenLabs Turbo, MiniMax, Gemini Flash), voices per family, output formats (mp3/opus/aac/flac/wav/pcm), streaming, prompt/emotion styling, temperature/top_p, and language hints.

venice-audio-transcriptionSkill

Transcribe audio files to text via POST /audio/transcriptions. Covers supported models (Parakeet, Whisper, Wizper, Scribe, xAI STT), supported formats (wav/flac/m4a/aac/mp4/mp3/ogg/webm), response formats (json/text), timestamps, and language hints. OpenAI-compatible multipart.

venice-augmentSkill

Venice augmentation endpoints for agent pipelines. Covers POST /augment/text-parser (extract text from PDF/DOCX/XLSX/plain text, multipart, up to 25MB, JSON or plain text response), POST /augment/scrape (fetch a URL and return markdown; blocks X/Reddit), and POST /augment/search (Brave ZDR or anonymized Google; structured title/url/content/date results, up to 20 per query). Privacy (zero data retention), rate limits, and error shapes.

venice-authSkill

Authenticate to the Venice API with a Bearer API key or with an x402 / SIWE wallet. Covers header formats, the SIWE message fields, TTL and nonce rules, the venice-x402-client SDK, and how to choose between the two modes.

venice-billingSkill

Venice billing and usage analytics - GET /billing/balance, GET /billing/usage (paginated per-request ledger, JSON or CSV), and GET /billing/usage-analytics (aggregated by date/model/key). Covers the DIEM/USD/BUNDLED_CREDITS consumption priority and building dashboards. (Beta)