Skip to main content
ClaudeWave
Skill6.1k estrellas del repoactualizado today

omni-resilience

OmniRoute's omni-resilience skill monitors real-time health metrics and failure states across multiple AI provider backends. It exposes endpoints to inspect circuit-breaker status, latency percentiles (p50/p95/p99), budget alerts, model lockouts, and connection cooldowns via authenticated REST API or MCP protocol calls. Use this skill to diagnose provider outages, track performance degradation, and ensure request routing reliability in production multi-provider systems.

Instalar en Claude Code
Copiar
git clone --depth 1 https://github.com/diegosouzapw/OmniRoute /tmp/omni-resilience && cp -r /tmp/omni-resilience/skills/omni-resilience ~/.claude/skills/omni-resilience
Después abre una sesión nueva de Claude Code; el skill carga automáticamente.

SKILL.md

<!-- generated by src/lib/agentSkills/generator.ts; manual edits will be overwritten -->

## Overview

Monitor provider health, circuit-breaker states, p50/p95/p99 latency metrics, and budget guard alerts. Inspect connection cooldowns and model lockouts in real time.

## Authentication

All requests require a valid Bearer token or session cookie. Obtain a token via `POST /api/auth/login` or configure `REQUIRE_API_KEY=false` for local development.

## Endpoints

### GET /api/monitoring/health

System health check

Returns system health including uptime, memory, circuit breakers, rate limits

```bash
curl https://localhost:20128/api/monitoring/health \
  -H "Authorization: Bearer $OMNIROUTE_TOKEN"
```

## Payloads

See the full OpenAPI specification at `GET /api/openapi/spec` or `docs/reference/openapi.yaml` for detailed request/response schemas.

<!-- skill:custom-start -->
<!-- Migrated from skills/omniroute-monitoring/SKILL.md (preserved curated content) -->

# OmniRoute — Monitoring & Health

Requires `OMNIROUTE_URL` and `OMNIROUTE_KEY`. See [entry-point SKILL](https://raw.githubusercontent.com/diegosouzapw/OmniRoute/main/skills/omniroute/SKILL.md) for setup.

## System health

```bash
curl $OMNIROUTE_URL/api/health \
  -H "Authorization: Bearer $OMNIROUTE_KEY"
```

Returns: uptime, memory, active connections, circuit breaker states, rate limit status, cache stats.

Unauthenticated quick check:

```bash
curl $OMNIROUTE_URL/api/health
# → {"ok":true}
```

## Provider circuit breakers

Circuit breakers prevent traffic from hitting failing providers.

States: `CLOSED` (normal), `OPEN` (blocked), `HALF_OPEN` (probe mode — auto-recovers).

```bash
curl $OMNIROUTE_URL/api/monitoring/health \
  -H "Authorization: Bearer $OMNIROUTE_KEY"
```

Response includes `circuitBreakers` array with per-provider state and `resetAt` timestamp.

## Per-provider metrics (p50/p95/p99)

```bash
curl $OMNIROUTE_URL/api/providers/metrics \
  -H "Authorization: Bearer $OMNIROUTE_KEY"
```

Response shape per provider:

```json
{
  "provider": "anthropic",
  "requests": 1247,
  "successRate": 0.994,
  "latency": { "p50": 820, "p95": 2100, "p99": 3800 },
  "circuitState": "CLOSED",
  "tokensUsed": 2847000
}
```

## Via MCP (if OmniRoute is your MCP server)

```
omniroute_get_health            → full system health snapshot
omniroute_get_provider_metrics  → p50/p95/p99 + circuit state per provider
omniroute_get_session_snapshot  → cost, tokens, errors for current session
omniroute_check_quota           → quota balance + percent remaining + reset time
omniroute_db_health_check       → diagnose + auto-repair database drift
```

## Quota check

```bash
curl $OMNIROUTE_URL/api/quota \
  -H "Authorization: Bearer $OMNIROUTE_KEY"
```

Returns used/total tokens and requests per provider/account, with `resetAt` timestamps.

## Budget guard (spend limit)

Set a session spending limit that degrades or blocks requests when hit:

```bash
curl -X POST $OMNIROUTE_URL/api/budget/guard \
  -H "Authorization: Bearer $OMNIROUTE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "limitUsd": 5.00,
    "action": "degrade",
    "degradeTo": "openai/gpt-4o-mini"
  }'
```

`action` options:

- `degrade` — switch to a cheaper model when limit is hit
- `block` — return 429 when limit is hit
- `alert` — continue but add `X-Budget-Warning` header

## MCP audit log

OmniRoute logs every MCP tool call to `mcp_audit` table. Query via API:

```bash
curl "$OMNIROUTE_URL/api/mcp/status" \
  -H "Authorization: Bearer $OMNIROUTE_KEY"
```

Returns: server status, heartbeat, recent audit activity summary.

## Errors

- `503` on health endpoint → OmniRoute is starting up; retry in 5s
- Circuit breaker `OPEN` → provider is temporarily blocked; check `resetAt` to know when it auto-recovers
- `429 budget_exceeded` → budget guard limit reached; raise limit or wait for reset
<!-- skill:custom-end -->