omni-resilience
OmniRoute's omni-resilience skill monitors real-time health metrics and failure states across multiple AI provider backends. It exposes endpoints to inspect circuit-breaker status, latency percentiles (p50/p95/p99), budget alerts, model lockouts, and connection cooldowns via authenticated REST API or MCP protocol calls. Use this skill to diagnose provider outages, track performance degradation, and ensure request routing reliability in production multi-provider systems.
git clone --depth 1 https://github.com/diegosouzapw/OmniRoute /tmp/omni-resilience && cp -r /tmp/omni-resilience/skills/omni-resilience ~/.claude/skills/omni-resilience
<!-- generated by src/lib/agentSkills/generator.ts; manual edits will be overwritten -->
## Overview
Monitor provider health, circuit-breaker states, p50/p95/p99 latency metrics, and budget guard alerts. Inspect connection cooldowns and model lockouts in real time.
## Authentication
All requests require a valid Bearer token or session cookie. Obtain a token via `POST /api/auth/login`, or set `REQUIRE_API_KEY=false` to skip the API key requirement in local development.
## Endpoints
### GET /api/monitoring/health
System health check.
Returns system health, including uptime, memory, circuit breakers, and rate limits.
```bash
curl http://localhost:20128/api/monitoring/health \
-H "Authorization: Bearer $OMNIROUTE_TOKEN"
```
## Payloads
See the full OpenAPI specification at `GET /api/openapi/spec` or `docs/reference/openapi.yaml` for detailed request/response schemas.
<!-- skill:custom-start -->
<!-- Migrated from skills/omniroute-monitoring/SKILL.md (preserved curated content) -->
# OmniRoute — Monitoring & Health
Requires `OMNIROUTE_URL` and `OMNIROUTE_KEY`. See [entry-point SKILL](https://raw.githubusercontent.com/diegosouzapw/OmniRoute/main/skills/omniroute/SKILL.md) for setup.
## System health
```bash
curl $OMNIROUTE_URL/api/health \
-H "Authorization: Bearer $OMNIROUTE_KEY"
```
Returns: uptime, memory, active connections, circuit breaker states, rate limit status, cache stats.
Unauthenticated quick check:
```bash
curl $OMNIROUTE_URL/api/health
# → {"ok":true}
```
## Provider circuit breakers
Circuit breakers prevent traffic from hitting failing providers.
States: `CLOSED` (normal), `OPEN` (blocked), `HALF_OPEN` (probe mode — auto-recovers).
```bash
curl $OMNIROUTE_URL/api/monitoring/health \
-H "Authorization: Bearer $OMNIROUTE_KEY"
```
The response includes a `circuitBreakers` array with the state of each provider and a `resetAt` timestamp.
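The three states can be read as a routing decision. The sketch below is a hypothetical helper (`breaker_decision` is not part of OmniRoute) and assumes the state string has already been extracted from the `circuitBreakers` array. Treating `HALF_OPEN` as "probe only" is an interpretation of the description above, not documented behavior.

```bash
# Hypothetical helper: map a circuit-breaker state to a routing decision.
# Prints "route", "probe", or "skip"; unknown states are treated as "skip".
breaker_decision() {
  case "$1" in
    CLOSED)    echo "route" ;;  # normal traffic
    HALF_OPEN) echo "probe" ;;  # limited traffic while the provider recovers
    OPEN)      echo "skip"  ;;  # provider blocked until resetAt
    *)         echo "skip"  ;;  # be conservative on anything unexpected
  esac
}
```

For example, `breaker_decision "$state"` could gate whether a script sends a request to a given provider at all.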
## Per-provider metrics (p50/p95/p99)
```bash
curl $OMNIROUTE_URL/api/providers/metrics \
-H "Authorization: Bearer $OMNIROUTE_KEY"
```
Response shape per provider:
```json
{
"provider": "anthropic",
"requests": 1247,
"successRate": 0.994,
"latency": { "p50": 820, "p95": 2100, "p99": 3800 },
"circuitState": "CLOSED",
"tokensUsed": 2847000
}
```
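A response like this can drive a simple degradation check. The following is a minimal sketch, assuming `successRate` and `latency.p99` have already been pulled out of the JSON with a tool of your choice. The function name `is_degraded` and the default thresholds (0.99 and 5000) are illustrative assumptions, not OmniRoute defaults; the latency threshold is in whatever unit the API reports.

```bash
# Hypothetical helper: succeed (exit 0) when a provider looks degraded.
# Usage: is_degraded <successRate> <p99> [min_success_rate] [max_p99]
# Thresholds are illustrative assumptions, not OmniRoute defaults.
is_degraded() {
  # awk handles the floating-point comparison that plain shell arithmetic cannot.
  awk -v rate="$1" -v p99="$2" \
      -v min_rate="${3:-0.99}" -v max_p99="${4:-5000}" \
      'BEGIN { exit !(rate < min_rate || p99 > max_p99) }'
}
```

With the sample values above, `is_degraded 0.994 3800` returns non-zero (healthy), while tightening the latency threshold with `is_degraded 0.994 3800 0.99 3000` returns zero (degraded).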
## Via MCP (if OmniRoute is your MCP server)
```
omniroute_get_health → full system health snapshot
omniroute_get_provider_metrics → p50/p95/p99 + circuit state per provider
omniroute_get_session_snapshot → cost, tokens, errors for current session
omniroute_check_quota → quota balance + percent remaining + reset time
omniroute_db_health_check → diagnose + auto-repair database drift
```
## Quota check
```bash
curl $OMNIROUTE_URL/api/quota \
-H "Authorization: Bearer $OMNIROUTE_KEY"
```
Returns used/total tokens and requests per provider/account, with `resetAt` timestamps.
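The used/total pair is enough to derive a percent-remaining figure. This is a small sketch under the assumption that both counts have already been extracted from the response as integers; `quota_percent_remaining` is a hypothetical name, not an OmniRoute command.

```bash
# Hypothetical helper: percent of quota remaining from used/total counts.
# Prints an integer percentage (rounded down), or 0 when the quota is
# exhausted or the total is not positive.
quota_percent_remaining() {
  local used="$1" total="$2"
  if [ "$total" -le 0 ] || [ "$used" -ge "$total" ]; then
    echo 0
    return
  fi
  echo $(( (total - used) * 100 / total ))
}
```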
## Budget guard (spend limit)
Set a session spending limit that degrades or blocks requests when hit:
```bash
curl -X POST $OMNIROUTE_URL/api/budget/guard \
-H "Authorization: Bearer $OMNIROUTE_KEY" \
-H "Content-Type: application/json" \
-d '{
"limitUsd": 5.00,
"action": "degrade",
"degradeTo": "openai/gpt-4o-mini"
}'
```
`action` options:
- `degrade` — switch to a cheaper model when limit is hit
- `block` — return 429 when limit is hit
- `alert` — continue but add `X-Budget-Warning` header
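The three actions differ only in the request body. The sketch below builds that body for each action; `budget_guard_body` is a hypothetical helper, and the assumption that `degradeTo` is needed only for `degrade` is inferred from the example above rather than confirmed by the API schema.

```bash
# Hypothetical helper: build the JSON body for POST /api/budget/guard.
# Usage: budget_guard_body <limitUsd> <action> [degradeTo]
# Assumption: "degradeTo" is only sent when the action is "degrade".
budget_guard_body() {
  local limit="$1" action="$2" degrade_to="${3:-}"
  case "$action" in
    degrade)
      if [ -z "$degrade_to" ]; then
        echo "degrade requires a target model" >&2
        return 1
      fi
      printf '{"limitUsd": %s, "action": "degrade", "degradeTo": "%s"}\n' \
        "$limit" "$degrade_to"
      ;;
    block|alert)
      printf '{"limitUsd": %s, "action": "%s"}\n' "$limit" "$action"
      ;;
    *)
      echo "unknown action: $action" >&2
      return 1
      ;;
  esac
}
```

The output can be passed straight to the request shown earlier, for instance with `-d "$(budget_guard_body 5.00 block)"`.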
## MCP audit log
OmniRoute logs every MCP tool call to the `mcp_audit` table. Query a summary of recent activity via the API:
```bash
curl "$OMNIROUTE_URL/api/mcp/status" \
-H "Authorization: Bearer $OMNIROUTE_KEY"
```
Returns: server status, heartbeat, recent audit activity summary.
## Errors
- `503` on health endpoint → OmniRoute is starting up; retry in 5s
- Circuit breaker `OPEN` → provider is temporarily blocked; check `resetAt` to know when it auto-recovers
- `429 budget_exceeded` → budget guard limit reached; raise limit or wait for reset
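The startup case (`503`, retry in 5s) lends itself to a small polling loop. This is a sketch, not an official client: `wait_until_healthy` and `probe_health` are hypothetical names, and the attempt count is an arbitrary choice. It relies on `curl --fail`, which exits non-zero on HTTP error responses such as 503.

```bash
# Hypothetical helper: run a probe command until it succeeds.
# Usage: wait_until_healthy <max_attempts> <delay_seconds> <probe command...>
wait_until_healthy() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0                      # probe succeeded
    fi
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"                # wait before the next attempt
    fi
    attempt=$((attempt + 1))
  done
  return 1                          # still unhealthy after all attempts
}

# Probe the unauthenticated health endpoint; fails on HTTP errors such as 503.
probe_health() {
  curl --silent --fail --output /dev/null "$OMNIROUTE_URL/api/health"
}
```

A call such as `wait_until_healthy 6 5 probe_health` would then wait up to roughly 25 seconds for OmniRoute to finish starting.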
<!-- skill:custom-end -->