omni-inference
Omni-inference provides OpenAI-compatible REST endpoints for chat completions, embeddings, images, audio (text-to-speech and speech-to-text), moderation, reranking, and the Responses API. Use this skill when building AI agents that need unified access to multiple AI model providers through standardized API interfaces with bearer token authentication.
git clone --depth 1 https://github.com/diegosouzapw/OmniRoute /tmp/omni-inference && cp -r /tmp/omni-inference/skills/omni-inference ~/.claude/skills/omni-inferenceSKILL.md
<!-- generated by src/lib/agentSkills/generator.ts; manual edits will be overwritten -->
## Overview
The core OpenAI-compatible inference endpoints: chat completions, embeddings, images, audio (TTS/STT), moderations, rerank, and the Responses API. The primary integration surface for AI agents.
## Authentication
All requests require a valid Bearer token or session cookie. Obtain a token via `POST /api/auth/login` or configure `REQUIRE_API_KEY=false` for local development.
## Endpoints
### POST /api/v1/chat/completions
Create chat completion
OpenAI-compatible chat completions endpoint. Routes to configured providers.
```bash
curl -X POST https://localhost:20128/api/v1/chat/completions \
-H "Authorization: Bearer $OMNIROUTE_TOKEN"
-H "Content-Type: application/json" \
-d '{}'
```
### POST /api/v1/providers/{provider}/chat/completions
Create chat completion (provider-specific)
Routes to a specific provider by name.
```bash
curl -X POST https://localhost:20128/api/v1/providers/{provider}/chat/completions \
-H "Authorization: Bearer $OMNIROUTE_TOKEN"
-H "Content-Type: application/json" \
-d '{}'
```
### POST /api/v1/api/chat
Ollama-compatible chat endpoint
Provides compatibility with Ollama's /api/chat format.
```bash
curl -X POST https://localhost:20128/api/v1/api/chat \
-H "Authorization: Bearer $OMNIROUTE_TOKEN"
-H "Content-Type: application/json" \
-d '{}'
```
### POST /api/v1/messages
Create message (Anthropic-compatible)
Anthropic Messages API endpoint. Routes to Claude providers.
```bash
curl -X POST https://localhost:20128/api/v1/messages \
-H "Authorization: Bearer $OMNIROUTE_TOKEN"
-H "Content-Type: application/json" \
-d '{}'
```
### POST /api/v1/messages/count_tokens
Count tokens for a message
```bash
curl -X POST https://localhost:20128/api/v1/messages/count_tokens \
-H "Authorization: Bearer $OMNIROUTE_TOKEN"
-H "Content-Type: application/json" \
-d '{}'
```
### POST /api/v1/responses
Create response (OpenAI Responses API)
OpenAI Responses API endpoint.
```bash
curl -X POST https://localhost:20128/api/v1/responses \
-H "Authorization: Bearer $OMNIROUTE_TOKEN"
-H "Content-Type: application/json" \
-d '{}'
```
### POST /api/v1/embeddings
Create embeddings
```bash
curl -X POST https://localhost:20128/api/v1/embeddings \
-H "Authorization: Bearer $OMNIROUTE_TOKEN"
-H "Content-Type: application/json" \
-d '{}'
```
### POST /api/v1/providers/{provider}/embeddings
Create embeddings (provider-specific)
```bash
curl -X POST https://localhost:20128/api/v1/providers/{provider}/embeddings \
-H "Authorization: Bearer $OMNIROUTE_TOKEN"
-H "Content-Type: application/json" \
-d '{}'
```
### POST /api/v1/images/generations
Generate images
```bash
curl -X POST https://localhost:20128/api/v1/images/generations \
-H "Authorization: Bearer $OMNIROUTE_TOKEN"
-H "Content-Type: application/json" \
-d '{}'
```
### POST /api/v1/providers/{provider}/images/generations
Generate images (provider-specific)
```bash
curl -X POST https://localhost:20128/api/v1/providers/{provider}/images/generations \
-H "Authorization: Bearer $OMNIROUTE_TOKEN"
-H "Content-Type: application/json" \
-d '{}'
```
### POST /api/v1/audio/speech
Generate speech audio
Text-to-speech endpoint. Routes to configured TTS providers.
```bash
curl -X POST https://localhost:20128/api/v1/audio/speech \
-H "Authorization: Bearer $OMNIROUTE_TOKEN"
-H "Content-Type: application/json" \
-d '{}'
```
### POST /api/v1/audio/transcriptions
Transcribe audio
Audio-to-text transcription endpoint.
```bash
curl -X POST https://localhost:20128/api/v1/audio/transcriptions \
-H "Authorization: Bearer $OMNIROUTE_TOKEN"
-H "Content-Type: application/json" \
-d '{}'
```
### POST /api/v1/moderations
Create moderation
Content moderation endpoint. Routes to configured moderation providers.
```bash
curl -X POST https://localhost:20128/api/v1/moderations \
-H "Authorization: Bearer $OMNIROUTE_TOKEN"
-H "Content-Type: application/json" \
-d '{}'
```
### POST /api/v1/rerank
Rerank documents
Document reranking endpoint.
```bash
curl -X POST https://localhost:20128/api/v1/rerank \
-H "Authorization: Bearer $OMNIROUTE_TOKEN"
-H "Content-Type: application/json" \
-d '{}'
```
### GET /api/v1
API v1 root endpoint
Returns basic API info and status.
```bash
curl https://localhost:20128/api/v1 \
-H "Authorization: Bearer $OMNIROUTE_TOKEN"
```
### GET /api/v1/providers/{provider}/models
List models for a specific provider
Returns only models for the selected provider with provider prefix removed from each model id.
```bash
curl https://localhost:20128/api/v1/providers/{provider}/models \
-H "Authorization: Bearer $OMNIROUTE_TOKEN"
```
## Payloads
See the full OpenAPI specification at `GET /api/openapi/spec` or `docs/reference/openapi.yaml` for detailed request/response schemas.
<!-- skill:custom-start -->
<!-- Aggregated from: omniroute-chat, omniroute-image, omniroute-tts, omniroute-stt, omniroute-embeddings, omniroute-web-search, omniroute-web-fetch -->
## Chat completions
Requires `OMNIROUTE_URL` and `OMNIROUTE_KEY`. See [entry-point SKILL](https://raw.githubusercontent.com/diegosouzapw/OmniRoute/main/skills/omniroute/SKILL.md) for setup.
### Endpoints
- `POST $OMNIROUTE_URL/v1/chat/completions` — OpenAI format
- `POST $OMNIROUTE_URL/v1/messages` — Anthropic Messages format
- `POST $OMNIROUTE_URL/v1/responses` — OpenAI Responses API
### Discover
```bash
curl $OMNIROUTE_URL/v1/models | jq '.data[].id'
```
Combos (e.g. `auto`, `cost-optimized`, `subscription`) auto-fallback through multiple providers.
### OpenAI format example
```bash
curl -X POST $OMNIROUTE_URL/v1/chat/completions \
-H "Authorization: Bearer $OMNIROUTE_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "claude-opus-4-7",
"messages": [{"role": "user", "content": "Refactor this function"}],
"stream": true
}'
```
### AnthroInteract with the OmniRoute A2A server from the CLI. Send tasks, inspect skill execution history, and test the JSON-RPC 2.0 agent-to-agent protocol interactively.
Backup and restore OmniRoute data from the CLI. Trigger incremental snapshots, sync to cloud storage, manage backup schedules, and restore from archive files.
Submit and monitor batch inference jobs from the CLI. Upload and manage files for batch processing, retrieve results, and integrate batch pipelines with CI/CD workflows.
Send chat completions, stream responses, and start an interactive REPL session from the CLI. Supports all OmniRoute providers, combo routing, and system prompt configuration.
Configure and test prompt compression from the CLI. Manage RTK filters, Caveman rules, stacked compression modes, and preview compression output with real prompts.
Manage context engineering configurations, RTK filter sets, and conversation sessions from the CLI. Apply context-relay settings and inspect active context pipelines.
View cost breakdowns, token usage, and call logs from the CLI. Filter by provider, model, or date range. Export usage reports and inspect per-connection spending.
Create and run evaluation suites, watch live benchmark progress, view scorecards, compare model performance, and integrate eval runs with CI workflows from the CLI.