Skill · 70 repo stars · updated 2mo ago

mcp-health-checker

MCP Health Checker proactively monitors Model Context Protocol server connections by pinging configured servers, measuring latency, detecting stale connections, and verifying tool availability. Run it automatically every six hours via cron, manually before launching tasks that depend on MCP tools, after unexpected tool failures to diagnose connection issues, or after MCP server restarts to confirm that connections are restored. The skill performs six health checks (reachability, latency threshold, connection age, tool count, configuration validity, and authentication token expiry) and outputs results in human-readable or machine-readable format.

Repository: openclaw-superpowers

Install in Claude Code

git clone --depth 1 https://github.com/ArchieIndian/openclaw-superpowers /tmp/mcp-health-checker && cp -r /tmp/mcp-health-checker/skills/openclaw-native/mcp-health-checker ~/.claude/skills/mcp-health-checker

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# MCP Health Checker

## What it does

MCP (Model Context Protocol) servers are how OpenClaw connects to external tools — but connections go stale silently. A crashed MCP server doesn't throw an error until the agent tries to use it, causing confusing mid-task failures.

MCP Health Checker proactively monitors all configured MCP connections. It pings servers, measures latency, tracks uptime history, and alerts you before a stale connection causes a problem.

Inspired by OpenLobster's MCP connection health monitoring and OAuth 2.1+PKCE token refresh tracking.

## When to invoke

- Automatically every 6 hours (cron) — silent background health check
- Manually before starting a task that depends on MCP tools
- When an MCP tool call fails unexpectedly — diagnose the connection
- After restarting MCP servers — verify all connections restored

## Health checks performed

| Check | What it tests | Severity on failure |
|---|---|---|
| REACHABLE | Server responds to connection probe | CRITICAL |
| LATENCY | Response time under threshold (default: 5s) | HIGH |
| STALE | Connection age within max (default: 24h) | HIGH |
| TOOL_COUNT | Server exposes expected number of tools | MEDIUM |
| CONFIG_VALID | MCP config entry has required fields | MEDIUM |
| AUTH_EXPIRY | OAuth/API token not close to expiration | HIGH |
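
To make the table concrete, here is a minimal sketch of how failed checks could be mapped to a severity. It is not taken from `check.py`: the helper names `evaluate` and `worst_severity` are hypothetical, and only three of the six checks are modelled. The severities and default thresholds are the ones listed in the table.

```python
from datetime import datetime, timedelta, timezone

# Severity per check, copied from the table above.
SEVERITY = {
    "REACHABLE": "CRITICAL",
    "LATENCY": "HIGH",
    "STALE": "HIGH",
    "TOOL_COUNT": "MEDIUM",
    "CONFIG_VALID": "MEDIUM",
    "AUTH_EXPIRY": "HIGH",
}


def evaluate(reachable, latency_s, connected_at, now,
             max_latency_s=5.0, max_age=timedelta(hours=24)):
    """Return the names of failed checks for one server (hypothetical helper)."""
    if not reachable:
        # Latency and age are meaningless when the probe itself failed.
        return ["REACHABLE"]
    failed = []
    if latency_s > max_latency_s:
        failed.append("LATENCY")
    if now - connected_at > max_age:
        failed.append("STALE")
    return failed


def worst_severity(failed):
    """Return the highest severity among failed checks, or None if all passed."""
    order = ["MEDIUM", "HIGH", "CRITICAL"]
    severities = [SEVERITY[name] for name in failed]
    return max(severities, key=order.index) if severities else None
```

For example, a reachable server that answered in 6 seconds on a 30-hour-old connection fails both `LATENCY` and `STALE`, and its worst severity is `HIGH`.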

## How to use

```bash
python3 check.py --ping                     # Ping all configured MCP servers
python3 check.py --ping --server <name>     # Ping a specific server
python3 check.py --ping --timeout 3         # Custom timeout in seconds
python3 check.py --status                   # Last check summary from state
python3 check.py --history                  # Show past check results
python3 check.py --config                   # Validate MCP config entries
python3 check.py --format json              # Machine-readable output
```
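
The flags above can be combined, for instance `--ping --server <name> --timeout 3`. As an illustration of that flag surface, the following is a hedged `argparse` sketch. How `check.py` actually parses its arguments is not shown in this document, and the `text` value for `--format` and all defaults other than the 5-second timeout are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="check.py")
    parser.add_argument("--ping", action="store_true",
                        help="ping configured MCP servers")
    parser.add_argument("--server", metavar="NAME",
                        help="restrict the ping to one server")
    parser.add_argument("--timeout", type=float, default=5.0,
                        help="latency threshold in seconds")
    parser.add_argument("--status", action="store_true",
                        help="show the last check summary from state")
    parser.add_argument("--history", action="store_true",
                        help="show past check results")
    parser.add_argument("--config", action="store_true",
                        help="validate MCP config entries")
    # "text" is an assumed name for the human-readable format.
    parser.add_argument("--format", choices=["text", "json"], default="text",
                        help="output format")
    return parser


args = build_parser().parse_args(["--ping", "--server", "filesystem", "--timeout", "3"])
```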

## Cron wakeup behaviour

Every 6 hours:

1. Read MCP server configuration from `~/.openclaw/config/` (YAML/JSON)
2. For each configured server:
   - Attempt connection probe (stdio subprocess spawn or HTTP request, depending on transport)
   - Measure response latency
   - Check connection age against staleness threshold
   - Verify tool listing matches expected count (if tracked)
   - Check auth token expiry (if applicable)
3. Update state with per-server health records
4. Print summary: healthy / degraded / unreachable counts
5. Exit with status 1 if there are any CRITICAL findings
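
The loop in steps 2 to 5 can be sketched roughly as follows. This is an assumption-laden outline, not the code in `check.py`: `run_checks`, `summarize`, and the `probe` callable are hypothetical names, the probe is injected so the example needs no real MCP server, and only reachability and latency are modelled.

```python
import time


def run_checks(servers, probe, max_latency_s=5.0):
    """Probe each server and classify it.

    servers: iterable of server names.
    probe:   hypothetical callable(name) -> bool, True if the server responded.
    """
    records = []
    for name in servers:
        start = time.monotonic()
        try:
            ok = probe(name)
        except Exception:
            ok = False  # a probe that raises counts as unreachable
        latency = time.monotonic() - start
        if not ok:
            status = "unreachable"   # REACHABLE failed (CRITICAL)
        elif latency > max_latency_s:
            status = "degraded"      # LATENCY failed (HIGH)
        else:
            status = "healthy"
        records.append({"name": name, "status": status,
                        "latency_s": round(latency, 3)})
    return records


def summarize(records):
    """Return (counts per status, process exit code)."""
    counts = {"healthy": 0, "degraded": 0, "unreachable": 0}
    for record in records:
        counts[record["status"]] += 1
    exit_code = 1 if counts["unreachable"] else 0
    return counts, exit_code
```

A caller would pass the second return value to `sys.exit` so that cron or a wrapper script can react to CRITICAL findings.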

## Procedure

**Step 1 — Run a health check**

```bash
python3 check.py --ping
```

Review the output. Healthy servers show a green check. Degraded servers show latency warnings. Unreachable servers show a critical alert.

**Step 2 — Diagnose a specific server**

```bash
python3 check.py --ping --server filesystem
```

Prints detailed output for a single server: latency, last seen, tool count, and auth status.

**Step 3 — Validate configuration**

```bash
python3 check.py --config
```

Checks that all MCP config entries have the fields required for their transport type: `command` and `args` for stdio servers, or `url` for HTTP/SSE servers.
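
A minimal sketch of such a field check is shown below. It assumes each config entry is a mapping, that the transport can be inferred from the presence of `url`, and that stdio entries need both `command` and `args`. The function name `validate_entry` is hypothetical.

```python
def validate_entry(name, entry):
    """Return a list of problems for one MCP config entry (empty if valid)."""
    # Assumption: an entry with a "url" key is HTTP/SSE; anything else is stdio.
    if "url" in entry:
        required = ["url"]
    else:
        required = ["command", "args"]
    return [
        f"{name}: missing required field '{field}'"
        for field in required
        if field not in entry
    ]


problems = validate_entry("filesystem", {"command": "npx"})
```

Here `problems` contains a single message reporting that `args` is missing.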

**Step 4 — Review history**

```bash
python3 check.py --history
```

Shows uptime trends over the last 20 checks, which helps spot servers that fail intermittently.
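
One way to turn a check history into an uptime figure is sketched here. The history shape (a list of `{server_name: bool}` mappings, oldest first) is an assumption made for the example, and `uptime_by_server` is a hypothetical helper.

```python
def uptime_by_server(check_history, window=20):
    """Return the fraction of successful checks per server over the last `window` checks."""
    totals = {}
    for check in check_history[-window:]:
        for name, ok in check.items():
            up, seen = totals.get(name, (0, 0))
            totals[name] = (up + (1 if ok else 0), seen + 1)
    return {name: up / seen for name, (up, seen) in totals.items()}


history = [
    {"filesystem": True, "web": True},
    {"filesystem": True, "web": False},
]
uptime = uptime_by_server(history)
```

A server whose uptime sits well below 1.0 without ever staying down is a candidate for intermittent failure.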

## State

Per-server health records and check history are stored in `~/.openclaw/skill-state/mcp-health-checker/state.yaml`.

Fields: `last_check_at`, `servers` list, `check_history`.
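
As an illustration of how these three fields might fit together, the sketch below builds a new state mapping after each check. Only the top-level field names come from this document. The per-server record keys, the `at` timestamp key, and the cap of 20 history entries are assumptions, and YAML serialization is left out to keep the example dependency-free.

```python
from datetime import datetime, timezone

HISTORY_LIMIT = 20  # assumed cap, chosen to match the 20 checks shown by --history


def update_state(state, server_records, now=None):
    """Return a new state mapping that includes the latest check."""
    now = now or datetime.now(timezone.utc)
    stamp = now.isoformat()
    history = state.get("check_history", []) + [
        {"at": stamp, "servers": server_records}
    ]
    return {
        "last_check_at": stamp,
        "servers": server_records,
        "check_history": history[-HISTORY_LIMIT:],  # keep only the newest entries
    }


state = {}
for _ in range(25):
    state = update_state(state, [{"name": "filesystem", "status": "healthy"}])
```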

## Notes

- Does not modify MCP configuration — read-only monitoring
- Connection probes use the same transport as the MCP server (stdio subprocess spawn or HTTP GET)
- For stdio servers: probes verify the process can start and respond to `initialize`
- For HTTP/SSE servers: probes send a health-check HTTP request
- Latency threshold configurable via `--timeout` (default: 5s)
- Staleness threshold configurable via `--max-age` (default: 24h)
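
The stdio probe described in the notes can be approximated as below. This is a sketch under stated assumptions rather than the skill's implementation: `probe_stdio` is a hypothetical name, the `protocolVersion` and `clientInfo` values are placeholders, and the server is assumed to exit once its stdin is closed. A server that stays alive after stdin closes would be reported as a timeout here. The measured time includes process startup and shutdown.

```python
import json
import subprocess
import time


def probe_stdio(command, args, timeout_s=5.0):
    """Spawn a stdio MCP server, send `initialize`, return (ok, latency_s)."""
    request = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",  # placeholder version
            "capabilities": {},
            "clientInfo": {"name": "mcp-health-checker", "version": "0"},
        },
    }
    start = time.monotonic()
    try:
        proc = subprocess.Popen(
            [command, *args],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            text=True,
        )
    except OSError:
        return False, None  # the command could not be started
    try:
        # Writes the request, closes stdin, and waits for the process to exit.
        out, _ = proc.communicate(json.dumps(request) + "\n", timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.communicate()  # reap the killed process
        return False, None
    latency = time.monotonic() - start
    for line in out.splitlines():
        try:
            reply = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip any non-JSON output
        if isinstance(reply, dict) and reply.get("id") == 1 and "result" in reply:
            return True, latency
    return False, latency
```

Calling `probe_stdio("npx", [...])` with the `command` and `args` from a config entry returns a success flag together with the elapsed time, which can then be compared against the latency threshold.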