ClaudeWave
Skill · 92 repo stars · updated 1mo ago

venice-responses

Use Venice's Alpha POST /responses endpoint - an OpenAI-compatible Responses API with typed output blocks (reasoning, message, function_call, web_search_call). Covers request shape, streaming, differences from /chat/completions, supported venice_parameters subset, and E2EE behavior.

Install in Claude Code
git clone --depth 1 https://github.com/veniceai/skills /tmp/venice-responses && cp -r /tmp/venice-responses/skills/venice-responses ~/.claude/skills/venice-responses
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# Venice Responses API (Alpha)

`POST /api/v1/responses` is Venice's OpenAI-compatible Responses endpoint. It returns a **structured, typed output array** instead of a single `message.content` string — ideal for agents that need to separate reasoning, messages, tool calls, and built-in tool events.

> **Alpha.** Access is gated behind the `responsesApiEnabled` flag on Bearer API keys (staff-only during beta). x402 wallet auth bypasses this flag, so wallet-authenticated requests can pay per call without it. Schemas may change.

## Use when

- You need the OpenAI Responses-style response shape (`output[]` with typed `type: "reasoning" | "message" | "function_call" | "web_search_call"` blocks) for a client library that expects it.
- You want clean separation of reasoning vs message vs tool-call output.
- You want streaming via SSE with typed events.

Otherwise use [`venice-chat`](../venice-chat/SKILL.md) — it has more features, more models, and full Venice parameters.

## Limitations vs `/chat/completions`

| Limitation | Detail |
|---|---|
| **Stateless** | No conversation persistence across requests. Send the full history each call. |
| **E2EE models default to rejection** | E2EE-capable models return `400` unless you pass `venice_parameters.enable_e2ee: false` (TEE-only mode). For end-to-end encrypted inference with E2EE headers, use `/chat/completions`. |
| **Subset of `venice_parameters`** | `character_slug`, `enable_e2ee`, `enable_web_search`, `enable_web_scraping`, `enable_web_citations`, `include_venice_system_prompt`, `include_search_results_in_stream` are supported. `strip_thinking_response`, `disable_thinking`, `enable_x_search` are **not** wired through in Alpha. |
| **Access gated by feature flag** | Bearer keys without `responsesApiEnabled` get `401`. x402 requests are allowed (pay-per-call). |

## Authentication

Same as the rest of the API — either `Authorization: Bearer <key>` or `X-Sign-In-With-X: <SIWE>`. See [`venice-auth`](../venice-auth/SKILL.md).

## Minimal request

```bash
curl https://api.venice.ai/api/v1/responses \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zai-org-glm-5-1",
    "input": "Explain why the sky is blue in one paragraph."
  }'
```

`input` accepts:
- a plain string, or
- an array of typed input items (similar to `chat/completions` message parts) for multi-turn or multimodal history.

## Response shape

```json
{
  "id": "resp_abc123",
  "object": "response",
  "created_at": 1735689600,
  "model": "zai-org-glm-5-1",
  "status": "completed",
  "output": [
    {
      "type": "reasoning",
      "id": "rs_1",
      "summary": ["I considered Rayleigh scattering..."],
      "encrypted_content": "..."
    },
    {
      "type": "message",
      "id": "msg_1",
      "status": "completed",
      "role": "assistant",
      "content": [{
        "type": "output_text",
        "text": "The sky is blue because...",
        "annotations": [{
          "type": "url_citation",
          "url": "https://example.com/rayleigh",
          "title": "Rayleigh scattering",
          "start_index": 42,
          "end_index": 99
        }]
      }]
    },
    {
      "type": "function_call",
      "id": "fc_1",
      "call_id": "call_abc",
      "name": "get_weather",
      "arguments": "{\"city\":\"Paris\"}",
      "status": "completed"
    },
    {
      "type": "web_search_call",
      "id": "ws_1",
      "status": "completed"
    }
  ],
  "usage": {
    "input_tokens": 20,
    "input_tokens_details": {"cached_tokens": 0},
    "output_tokens": 80,
    "output_tokens_details": {"reasoning_tokens": 40},
    "total_tokens": 100
  }
}
```

Top-level `status` ∈ `completed` | `failed` | `in_progress` | `cancelled`. On `failed`, `error.code` and `error.message` are populated.
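
A client would typically check `status` first and then walk `output[]` for `message` blocks. The sketch below follows the response shape shown above; the function names are hypothetical, and treating any status other than `completed` as an error is a design choice for this example, not documented behavior.

```python
def extract_text(response: dict) -> str:
    """Concatenate all output_text parts from message blocks."""
    status = response.get("status")
    if status == "failed":
        err = response.get("error") or {}
        raise RuntimeError(f"{err.get('code')}: {err.get('message')}")
    if status != "completed":
        raise RuntimeError(f"response not finished (status={status!r})")
    parts = []
    for block in response.get("output", []):
        if block.get("type") != "message":
            continue  # skip reasoning, function_call, web_search_call
        for part in block.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part.get("text", ""))
    return "".join(parts)


def extract_citations(response: dict) -> list[tuple[str, str]]:
    """Collect (url, title) pairs from url_citation annotations."""
    found = []
    for block in response.get("output", []):
        if block.get("type") != "message":
            continue
        for part in block.get("content", []):
            for ann in part.get("annotations", []):
                if ann.get("type") == "url_citation":
                    found.append((ann.get("url", ""), ann.get("title", "")))
    return found


sample = {
    "status": "completed",
    "output": [
        {"type": "reasoning", "id": "rs_1", "summary": ["..."]},
        {"type": "message", "id": "msg_1", "role": "assistant", "content": [{
            "type": "output_text",
            "text": "The sky is blue because...",
            "annotations": [{"type": "url_citation",
                             "url": "https://example.com/rayleigh",
                             "title": "Rayleigh scattering"}],
        }]},
    ],
}
print(extract_text(sample))
print(extract_citations(sample))
```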

## Output block types

| `type` | Purpose |
|---|---|
| `reasoning` | Thought process from reasoning models. `summary[]` holds human-readable text; `encrypted_content` holds opaque signatures — round-trip verbatim for multi-turn tool calls. |
| `message` | Main text output. `content[].type === "output_text"`, plus `annotations[]` for `url_citation` entries from web search. |
| `function_call` | Tool call: `name`, stringified-JSON `arguments`, `call_id`. |
| `web_search_call` | Sentinel showing the built-in web_search tool fired; use alongside `url_citation` annotations on messages. |

Match tool outputs back by `call_id` when continuing the turn.
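
As an illustration of matching by `call_id`, the sketch below runs each `function_call` block against a local tool table and builds one result item per call. The `function_call_output` item type is borrowed from OpenAI's Responses API and is an assumption here; this document does not confirm the exact item shape Venice expects. `get_weather` and `TOOLS` are hypothetical stand-ins.

```python
import json


def get_weather(city: str) -> dict:
    """Hypothetical local tool; a real one would call a weather service."""
    return {"city": city, "forecast": "sunny"}


TOOLS = {"get_weather": get_weather}


def run_function_calls(output: list[dict]) -> list[dict]:
    """Execute function_call blocks and return items keyed by call_id."""
    results = []
    for block in output:
        if block.get("type") != "function_call":
            continue
        # `arguments` arrives as stringified JSON, so decode it first.
        args = json.loads(block["arguments"])
        result = TOOLS[block["name"]](**args)
        results.append({
            "type": "function_call_output",  # assumed item type, see lead-in
            "call_id": block["call_id"],     # ties the result to its call
            "output": json.dumps(result),
        })
    return results


output = [{
    "type": "function_call",
    "id": "fc_1",
    "call_id": "call_abc",
    "name": "get_weather",
    "arguments": "{\"city\":\"Paris\"}",
    "status": "completed",
}]
tool_items = run_function_calls(output)
```

Because the endpoint is stateless, the next request would carry the earlier input, the model's output blocks (including any `reasoning` block, verbatim), and these result items together.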

## Common request fields

| Field | Notes |
|---|---|
| `model` | Required. Model ID, trait, or compatibility mapping. Feature suffixes allowed (see [`venice-chat`](../venice-chat/SKILL.md#model-feature-suffixes)). |
| `input` | Required. String or input-items array. To set system/developer context, include a leading message with `role: "system"`/`"developer"` in the input array. |
| `tools` | Array of `{type:"function",function:{...}}` or built-in `{type:"web_search"}` — availability depends on the model. |
| `tool_choice` | `"auto"` / `"required"` / `"none"` / `{type:"function",function:{"name":"..."}}`. |
| `reasoning.effort` | Reasoning effort hint for thinking models (`"low"` \| `"medium"` \| `"high"`). |
| `temperature`, `top_p`, `max_output_tokens`, `n`, `stop`, `seed`, `prompt_cache_key` | Standard generation controls — translated to `/chat/completions` equivalents server-side. |
| `stream` | Boolean. SSE response with typed events (`response.created`, `response.output_item.added`, `response.output_text.delta`, `response.completed`, …). |
| `venice_parameters` | Subset listed above. Example: `{"character_slug":"alan-watts","enable_web_search":"on"}`. |

Fields commonly found in OpenAI's Responses API that are **not** in Venice's Alpha schema (and silently ignored or rejected by Zod): `instructions`, `metadata`, `parallel_tool_calls`, `response_format`, `store`, `previous_response_id`, `background`. For `response_format` / JSON-schema structured output, use `/chat/completions`.
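
When porting a request written for OpenAI's Responses API, it may help to strip these fields up front rather than rely on the server ignoring or rejecting them. The following is a rough sketch under two assumptions: that `instructions` can be folded into a leading `system` message, and that input items use a `{"role", "content"}` shape. `adapt_openai_request` is a hypothetical helper.

```python
# Fields named above as outside Venice's Alpha schema.
UNSUPPORTED_FIELDS = (
    "instructions", "metadata", "parallel_tool_calls", "response_format",
    "store", "previous_response_id", "background",
)


def adapt_openai_request(req: dict) -> dict:
    """Return a copy of `req` without fields the Alpha schema lacks."""
    if "response_format" in req:
        # Structured output has no equivalent here, so fail loudly.
        raise ValueError(
            "response_format is not supported on /responses; "
            "use /chat/completions"
        )
    adapted = {k: v for k, v in req.items() if k not in UNSUPPORTED_FIELDS}
    instructions = req.get("instructions")
    if instructions:
        items = adapted.get("input", [])
        if isinstance(items, str):
            # Promote a plain string to a one-item array (assumed shape).
            items = [{"role": "user", "content": items}]
        adapted["input"] = [{"role": "system", "content": instructions}, *items]
    return adapted


adapted = adapt_openai_request({
    "model": "zai-org-glm-5-1",
    "input": "Explain why the sky is blue.",
    "instructions": "Be brief.",
    "store": False,
})
```

Dropping `previous_response_id` silently changes behavior, since the history it pointed to must then be resent in `input`; a stricter variant could raise on that field as well.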

## Streaming

With `stream: tr

venice-api-keys (Skill)

Manage Venice API keys. Covers GET/POST/PATCH/DELETE /api_keys, GET /api_keys/{id}, GET /api_keys/rate_limits, GET /api_keys/rate_limits/log, the two-step /api_keys/generate_web3_key wallet flow, INFERENCE vs ADMIN key types, and per-key consumption limits (USD / DIEM).

venice-api-overview (Skill)

High-level map of the Venice.ai API - base URL, authentication modes, endpoint categories, response headers, pricing model, error shape, and versioning. Load this first when starting any Venice integration.

venice-audio-music (Skill)

Async music / audio-track generation via Venice. Covers the /audio/quote + /audio/queue + /audio/retrieve + /audio/complete lifecycle, lyrics vs instrumental, voice selection, duration, language, speed, model capability probing, and webhook-free polling.

venice-audio-speech (Skill)

Generate speech from text via POST /audio/speech. Covers TTS models (Kokoro, Qwen 3, xAI, Inworld, Chatterbox, Orpheus, ElevenLabs Turbo, MiniMax, Gemini Flash), voices per family, output formats (mp3/opus/aac/flac/wav/pcm), streaming, prompt/emotion styling, temperature/top_p, and language hints.

venice-audio-transcription (Skill)

Transcribe audio files to text via POST /audio/transcriptions. Covers supported models (Parakeet, Whisper, Wizper, Scribe, xAI STT), supported formats (wav/flac/m4a/aac/mp4/mp3/ogg/webm), response formats (json/text), timestamps, and language hints. OpenAI-compatible multipart.

venice-augment (Skill)

Venice augmentation endpoints for agent pipelines. Covers POST /augment/text-parser (extract text from PDF/DOCX/XLSX/plain text, multipart, up to 25MB, JSON or plain text response), POST /augment/scrape (fetch a URL and return markdown; blocks X/Reddit), and POST /augment/search (Brave ZDR or anonymized Google; structured title/url/content/date results, up to 20 per query). Privacy (zero data retention), rate limits, and error shapes.

venice-auth (Skill)

Authenticate to the Venice API with a Bearer API key or with an x402 / SIWE wallet. Covers header formats, the SIWE message fields, TTL and nonce rules, the venice-x402-client SDK, and how to choose between the two modes.

venice-billing (Skill)

Venice billing and usage analytics - GET /billing/balance, GET /billing/usage (paginated per-request ledger, JSON or CSV), and GET /billing/usage-analytics (aggregated by date/model/key). Covers the DIEM/USD/BUNDLED_CREDITS consumption priority and building dashboards. (Beta)