Skip to main content
ClaudeWave
Skill92 estrellas del repoactualizado 1mo ago

venice-chat

Call POST /chat/completions on Venice. Covers the OpenAI-compatible request shape, Venice-only venice_parameters (web search, E2EE, characters, thinking control, X search), multimodal inputs (images/audio/video), tool calls, reasoning controls, streaming, prompt caching, structured output, and model feature suffixes.

Instalar en Claude Code
Copiar
git clone --depth 1 https://github.com/veniceai/skills /tmp/venice-chat && cp -r /tmp/venice-chat/skills/venice-chat ~/.claude/skills/venice-chat
Después abre una sesión nueva de Claude Code; el skill carga automáticamente.

SKILL.md

# Venice Chat Completions

`POST /api/v1/chat/completions` is Venice's main text endpoint. It's OpenAI-compatible, plus a `venice_parameters` object for Venice-only features.

## Use when

- You need LLM text generation, with or without tools, with or without streaming.
- You want multimodal inputs (images, audio, video) to a vision/audio-capable model.
- You want Venice-specific features: web search, E2EE, characters, xAI X/Twitter search, strip-thinking, web scraping.
- You need prompt caching for large system prompts or long documents.
- You need structured (`json_schema`) output.

For the newer Alpha **Responses API**, see [`venice-responses`](../venice-responses/SKILL.md).

## Minimal request

```bash
curl https://api.venice.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zai-org-glm-5-1",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}]
  }'
```

Response shape is the standard OpenAI `chat.completion` object (`id`, `object: "chat.completion"`, `choices[].message`, `usage`). With `stream: true`, responses come as SSE `data:` lines in `chat.completion.chunk` format.

## The request body

### Core fields (OpenAI-compatible)

| Field | Notes |
|---|---|
| `model` | string — model ID, trait name, or compatibility mapping. Suffixes allowed (see below). Required. |
| `messages` | array of `system` / `developer` / `user` / `assistant` / `tool` messages. Required, min 1. |
| `temperature`, `top_p`, `top_k`, `min_p`, `min_temp`, `max_temp` | sampling controls |
| `repetition_penalty`, `frequency_penalty`, `presence_penalty` | repetition controls |
| `max_tokens` *(deprecated)* / `max_completion_tokens` | upper bound on output tokens |
| `n` | number of choices (keep `1` to minimize cost) |
| `seed` | integer for reproducibility |
| `stop` / `stop_token_ids` | up to 4 strings, or raw token IDs |
| `stream`, `stream_options.include_usage` | SSE streaming + include usage in the final chunk |
| `response_format` | `{type:"json_schema", json_schema:{...}}` (preferred), `{type:"json_object"}`, or `{type:"text"}` |
| `tools`, `tool_choice`, `parallel_tool_calls` | function calling / built-in tools |
| `logprobs`, `top_logprobs` | return token log-probabilities |
| `reasoning.effort` / `reasoning_effort` | `none` \| `minimal` \| `low` \| `medium` \| `high` \| `xhigh` \| `max` |
| `reasoning.summary` | `auto` \| `concise` \| `detailed` |
| `prompt_cache_key`, `prompt_cache_retention` (`default`/`extended`/`24h`) | prompt caching hints |
| `text.verbosity` | `low`/`medium`/`high`/`auto` |
| `metadata` | key/value strings for tracking |
| `user`, `store` | accepted but ignored (OpenAI compat) |

### `venice_parameters` (Venice-only)

All optional. Combined with model feature suffixes, these are how you enable Venice features.

| Field | Type | Default | Effect |
|---|---|---|---|
| `character_slug` | string | — | Apply a published Venice character. Slug is the "Public ID" on the character page. See [`venice-characters`](../venice-characters/SKILL.md). |
| `strip_thinking_response` | bool | `false` | Strip `<think>...</think>` from the assistant output on reasoning models. |
| `disable_thinking` | bool | `false` | Disable thinking entirely on supported reasoning models and strip tags. |
| `enable_e2ee` | bool | `true` | End-to-end encryption on E2EE-capable models when E2EE headers are present. Set to `false` to force TEE-only. |
| `enable_web_search` | `"off"`/`"auto"`/`"on"` | `"off"` | Venice server-side web search. Citations arrive in the first streamed chunk or the response. |
| `enable_web_scraping` | bool | `false` | Scrape any URLs found in the last user message (Firecrawl). |
| `enable_web_citations` | bool | `false` | Ask the LLM to cite sources with `^1^` / `^1,3^` superscripts. |
| `include_search_results_in_stream` | bool | `false` | Experimental — emit search results as the first stream chunk. |
| `return_search_results_as_documents` | bool | — | Also surface search results as a synthetic tool call `venice_web_search_documents` (LangChain-friendly). |
| `include_venice_system_prompt` | bool | `true` | Prepend Venice's curated system prompt. Turn off for full control. |
| `enable_x_search` | bool | `false` | xAI native web + X/Twitter search (Grok models with `supportsXSearch`). Adds ~$0.01/search. |

### Model feature suffixes

Some `venice_parameters` can also be expressed as **model feature suffixes** on the `model` string — useful when the caller/library (OpenAI SDK, LangChain) can't set `venice_parameters`. Syntax:

```
<model-id>:<key>=<value>[&<key>=<value>…]
```

Values are URL-decoded. Supported keys (exact match):

| Key | Type | Maps to |
|---|---|---|
| `enable_web_search` | `on` / `off` / `auto` | `venice_parameters.enable_web_search` |
| `enable_web_citations` | `"true"` / `"false"` | `venice_parameters.enable_web_citations` |
| `enable_web_scraping` | `"true"` / `"false"` | `venice_parameters.enable_web_scraping` |
| `include_venice_system_prompt` | `"true"` / `"false"` | `venice_parameters.include_venice_system_prompt` |
| `include_search_results_in_stream` | `"true"` / `"false"` | `venice_parameters.include_search_results_in_stream` |
| `return_search_results_as_documents` | `"true"` / `"false"` | `venice_parameters.return_search_results_as_documents` |
| `character_slug` | string | `venice_parameters.character_slug` |
| `strip_thinking_response` | `"true"` / `"false"` | `venice_parameters.strip_thinking_response` |
| `disable_thinking` | `"true"` / `"false"` | `venice_parameters.disable_thinking` |

Unknown keys are silently ignored. Examples:

```
zai-org-glm-5-1:enable_web_search=on
kimi-k2-6:strip_thinking_response=true&enable_web_search=auto
zai-org-glm-5-1:character_slug=alan-watts
```

Note: `enable_e2ee` and `enable_x_search` can **only** be set via `venice_parameters`, not as suffixes.

## Messages and modalities

`messages[].content` is either a string o
venice-api-keysSkill

Manage Venice API keys. Covers GET/POST/PATCH/DELETE /api_keys, GET /api_keys/{id}, GET /api_keys/rate_limits, GET /api_keys/rate_limits/log, the two-step /api_keys/generate_web3_key wallet flow, INFERENCE vs ADMIN key types, and per-key consumption limits (USD / DIEM).

venice-api-overviewSkill

High-level map of the Venice.ai API - base URL, authentication modes, endpoint categories, response headers, pricing model, error shape, and versioning. Load this first when starting any Venice integration.

venice-audio-musicSkill

Async music / audio-track generation via Venice. Covers the /audio/quote + /audio/queue + /audio/retrieve + /audio/complete lifecycle, lyrics vs instrumental, voice selection, duration, language, speed, model capability probing, and webhook-free polling.

venice-audio-speechSkill

Generate speech from text via POST /audio/speech. Covers TTS models (Kokoro, Qwen 3, xAI, Inworld, Chatterbox, Orpheus, ElevenLabs Turbo, MiniMax, Gemini Flash), voices per family, output formats (mp3/opus/aac/flac/wav/pcm), streaming, prompt/emotion styling, temperature/top_p, and language hints.

venice-audio-transcriptionSkill

Transcribe audio files to text via POST /audio/transcriptions. Covers supported models (Parakeet, Whisper, Wizper, Scribe, xAI STT), supported formats (wav/flac/m4a/aac/mp4/mp3/ogg/webm), response formats (json/text), timestamps, and language hints. OpenAI-compatible multipart.

venice-augmentSkill

Venice augmentation endpoints for agent pipelines. Covers POST /augment/text-parser (extract text from PDF/DOCX/XLSX/plain text, multipart, up to 25MB, JSON or plain text response), POST /augment/scrape (fetch a URL and return markdown; blocks X/Reddit), and POST /augment/search (Brave ZDR or anonymized Google; structured title/url/content/date results, up to 20 per query). Privacy (zero data retention), rate limits, and error shapes.

venice-authSkill

Authenticate to the Venice API with a Bearer API key or with an x402 / SIWE wallet. Covers header formats, the SIWE message fields, TTL and nonce rules, the venice-x402-client SDK, and how to choose between the two modes.

venice-billingSkill

Venice billing and usage analytics - GET /billing/balance, GET /billing/usage (paginated per-request ledger, JSON or CSV), and GET /billing/usage-analytics (aggregated by date/model/key). Covers the DIEM/USD/BUNDLED_CREDITS consumption priority and building dashboards. (Beta)