datadog

The Datadog skill enables investigation of production issues on the Nexu platform by querying the Datadog Logs API. It provides pre-built queries for OpenClaw crashes, stderr output, and gateway startup events in the nexu-gateway service, plus Slack token health checks and HTTP 5xx request logs in the nexu-api service. Users must supply their Datadog API and Application keys for authentication.

Repository: nexu

Install in Claude Code

git clone --depth 1 https://github.com/nexu-io/nexu /tmp/datadog && cp -r /tmp/datadog/skills/localdev/datadog ~/.claude/skills/datadog

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Datadog Log Investigation

Query Datadog Logs API to investigate production issues for the Nexu platform.

## Authentication

**Before making any Datadog API call, you MUST ask the user for these two keys:**

- `DD_API_KEY` — Datadog API Key (Organization Settings → API Keys)
- `DD_APP_KEY` — Datadog Application Key (Organization Settings → Application Keys, requires `logs_read_data` scope)

Store them in shell variables for the session. Never hardcode or commit them.

Site: `datadoghq.com` (US)

## API Base

All requests are `POST` requests to `https://api.datadoghq.com/api/v2/logs/events/search`.

Headers:
```
DD-API-KEY: <api_key>
DD-APPLICATION-KEY: <app_key>
Content-Type: application/json
```
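
For scripted use, the same request can be assembled in Python. This is a minimal sketch, assuming the keys have been exported as the `DD_API_KEY` and `DD_APP_KEY` environment variables; the helper name `build_search_request` is hypothetical and not part of the skill. It only builds the request, so nothing is sent until the result is passed to `urllib.request.urlopen`.

```python
import json
import os
import urllib.request

SEARCH_URL = "https://api.datadoghq.com/api/v2/logs/events/search"


def build_search_request(query, time_from="now-1h", time_to="now",
                         limit=20, sort="-timestamp"):
    """Build (but do not send) a POST request for the Logs search endpoint."""
    body = {
        "filter": {"query": query, "from": time_from, "to": time_to},
        "sort": sort,
        "page": {"limit": limit},
    }
    headers = {
        # Keys are read from the environment, never written into the file.
        "DD-API-KEY": os.environ["DD_API_KEY"],
        "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

To run a query, pass the request to `urllib.request.urlopen(...)` and decode the response with `json.load`.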

## Common Queries

### OpenClaw Crash Events

```bash
curl -s "https://api.datadoghq.com/api/v2/logs/events/search" \
  -H "DD-API-KEY: $DD_API_KEY" \
  -H "DD-APPLICATION-KEY: $DD_APP_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "filter": {
      "query": "service:nexu-gateway @event:openclaw_crash",
      "from": "now-1h",
      "to": "now"
    },
    "sort": "-timestamp",
    "page": {"limit": 20}
  }'
```

Key fields in results:
- `attributes.attributes.exitCode` — process exit code (1 = fatal error, null = signal)
- `attributes.attributes.signal` — kill signal (SIGKILL, SIGTERM, etc.)
- `attributes.tags` → `pod_name`, `image_tag` — which pod and which version
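
The fields listed above can be pulled out of a single event as follows. This is a sketch that assumes each event is a dict taken from the response's `data` array and that tags are `key:value` strings; the function name `crash_summary` is hypothetical.

```python
def crash_summary(event):
    """Extract the crash-related fields from one log event dict."""
    outer = event.get("attributes", {})
    custom = outer.get("attributes", {})
    # Tags arrive as "key:value" strings; split only on the first colon
    # so values that contain colons stay intact.
    tags = dict(t.split(":", 1) for t in outer.get("tags", []) if ":" in t)
    return {
        "exit_code": custom.get("exitCode"),  # None when killed by a signal
        "signal": custom.get("signal"),
        "pod_name": tags.get("pod_name", "?"),
        "image_tag": tags.get("image_tag", "?"),
    }
```

Missing fields come back as `None` or `"?"` rather than raising, which keeps a summary loop running when some events lack tags.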

### OpenClaw stderr Output (Crash Details)

```bash
curl -s "https://api.datadoghq.com/api/v2/logs/events/search" \
  -H "DD-API-KEY: $DD_API_KEY" \
  -H "DD-APPLICATION-KEY: $DD_APP_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "filter": {
      "query": "service:nexu-gateway @stream:stderr",
      "from": "now-1h",
      "to": "now"
    },
    "sort": "-timestamp",
    "page": {"limit": 50}
  }'
```

This shows the actual error output from the OpenClaw process (e.g., `invalid_auth`, `EADDRINUSE`, config validation failures).

### Gateway Startup / Recovery Events

```bash
curl -s "https://api.datadoghq.com/api/v2/logs/events/search" \
  -H "DD-API-KEY: $DD_API_KEY" \
  -H "DD-APPLICATION-KEY: $DD_APP_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "filter": {
      "query": "service:nexu-gateway (\"starting gateway\" OR \"gateway is ready\" OR \"spawned openclaw\")",
      "from": "now-1h",
      "to": "now"
    },
    "sort": "timestamp",
    "page": {"limit": 30}
  }'
```

### Slack Token Health Check

```bash
curl -s "https://api.datadoghq.com/api/v2/logs/events/search" \
  -H "DD-API-KEY: $DD_API_KEY" \
  -H "DD-APPLICATION-KEY: $DD_APP_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "filter": {
      "query": "service:nexu-api slack_token_health*",
      "from": "now-1h",
      "to": "now"
    },
    "sort": "-timestamp",
    "page": {"limit": 20}
  }'
```

### API HTTP Request Logs

```bash
curl -s "https://api.datadoghq.com/api/v2/logs/events/search" \
  -H "DD-API-KEY: $DD_API_KEY" \
  -H "DD-APPLICATION-KEY: $DD_APP_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "filter": {
      "query": "service:nexu-api http_request @attributes.status:>=500",
      "from": "now-1h",
      "to": "now"
    },
    "sort": "-timestamp",
    "page": {"limit": 20}
  }'
```

### Filter by Pod

Add `pod_name:<name>` to the query:

```
service:nexu-gateway pod_name:nexu-gateway-1 @event:openclaw_crash
```

### Filter by Time Window

Use ISO 8601 timestamps:

```json
{
  "from": "2026-03-10T05:00:00Z",
  "to": "2026-03-10T06:00:00Z"
}
```

Or relative: `"now-30m"`, `"now-1h"`, `"now-24h"`.
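
Both filters can be generated rather than typed by hand. The sketch below uses two hypothetical helpers: `iso_window` produces an absolute `from`/`to` pair from a timezone-aware end time, and `pod_query` appends a `pod_name` term to a base query. It assumes space-separated terms are combined with AND, so the position of the `pod_name` term in the query does not matter.

```python
from datetime import datetime, timedelta, timezone

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def iso_window(end, minutes):
    """Return ISO 8601 UTC 'from'/'to' values for a window ending at `end`.

    `end` must be a timezone-aware datetime.
    """
    end_utc = end.astimezone(timezone.utc)
    start_utc = end_utc - timedelta(minutes=minutes)
    return {
        "from": start_utc.strftime(ISO_FORMAT),
        "to": end_utc.strftime(ISO_FORMAT),
    }


def pod_query(base_query, pod_name=None):
    """Append a pod_name filter to a base query when a pod is given."""
    if pod_name:
        return f"{base_query} pod_name:{pod_name}"
    return base_query


# One-hour window ending at 06:00 UTC on 2026-03-10.
window = iso_window(datetime(2026, 3, 10, 6, 0, tzinfo=timezone.utc), 60)
```

Here `window` holds the same `from` and `to` values as the JSON example above, and can be merged into the `filter` object of any request body.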

## Parsing Results

Use python3 inline to extract key fields:

```bash
curl -s ... | python3 -c "
import json, sys

data = json.load(sys.stdin)
events = data.get('data', [])
print(f'Total events: {len(events)}')
for e in events:
    outer = e.get('attributes', {})
    # Custom attributes and tags may be absent on some events.
    attrs = outer.get('attributes', {})
    tags = outer.get('tags', [])
    pod = next((t.split(':', 1)[1] for t in tags if t.startswith('pod_name:')), '?')
    # Prefer the custom time attribute, fall back to the event timestamp.
    ts = attrs.get('time') or outer.get('timestamp', '?')
    msg = (outer.get('message') or '')[:120]
    print(f'{ts} | pod={pod} | {msg}')
"
```

## Services and Events Reference

| Service | Description |
|---------|-------------|
| `nexu-gateway` | Gateway sidecar (manages OpenClaw process) |
| `nexu-api` | API server |

| Event | Meaning |
|-------|---------|
| `openclaw_crash` | OpenClaw process exited unexpectedly |
| `openclaw_restart_scheduled` | Sidecar scheduling a restart |
| `openclaw_restart_limit` | Max restart attempts exceeded |
| `openclaw_orphan_killed` | Killed an orphaned OpenClaw process |
| `slack_token_health_check_invalidated` | Invalid Slack tokens detected and marked |

## Tag Reference

| Tag | Example |
|-----|---------|
| `pod_name` | `nexu-gateway-1`, `nexu-gateway-2` |
| `image_tag` | `sha-55f13372bb72abc7db1538cca3db2bcda0d35eba` |
| `kube_stateful_set` | `nexu-gateway` |

## Investigation Playbook

When investigating a crash:

1. **Check crash events** — get exit codes, signals, timestamps, affected pods
2. **Check stderr** — get the actual error message from OpenClaw
3. **Check startup events** — correlate crash with deploy times (`image_tag` changes)
4. **Check token health** — if `invalid_auth`, look for `slack_token_health_check_invalidated`
5. **Check API logs** — if API errors are contributing
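
The five steps above can be expressed as an ordered list of queries that share one time window. This is only a sketch: the query strings are the ones from the Common Queries section, while the names `PLAYBOOK` and `playbook_bodies` are hypothetical, and using ascending sort for every step is an assumption made so the results read as a single timeline.

```python
PLAYBOOK = [
    ("crash events", "service:nexu-gateway @event:openclaw_crash"),
    ("stderr", "service:nexu-gateway @stream:stderr"),
    ("startup events",
     'service:nexu-gateway ("starting gateway" OR "gateway is ready" OR "spawned openclaw")'),
    ("token health", "service:nexu-api slack_token_health*"),
    ("API errors", "service:nexu-api http_request @attributes.status:>=500"),
]


def playbook_bodies(time_from="now-1h", time_to="now", limit=20):
    """Yield (step name, request body) pairs in playbook order."""
    for name, query in PLAYBOOK:
        yield name, {
            "filter": {"query": query, "from": time_from, "to": time_to},
            "sort": "timestamp",  # oldest first, to read results as a timeline
            "page": {"limit": limit},
        }
```

Each yielded body can be serialized with `json.dumps` and sent as the `-d` payload of the `curl` commands shown earlier.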

## Rules

1. **Never hardcode API keys** in skill files or logs — always use variables
2. **Default time window** — start with `now-1h`, expand to `now-24h` if needed
3. **Always parse and summarize** — don't dump raw JSON to the user
4. **Correlate across services** — crashes often involve both gateway and API logs
5. **Check image_tag** to determine if crashes are related to a specific deployment
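
Rules 3 and 5 can be combined by counting events per tag value instead of printing raw JSON. The following is a minimal sketch, assuming `events` is the `data` array of a search response; the helper name `count_by_tag` is hypothetical.

```python
from collections import Counter


def count_by_tag(events, tag_key="image_tag"):
    """Count events per value of one tag, e.g. crashes per deployed image."""
    prefix = tag_key + ":"
    counts = Counter()
    for event in events:
        tags = event.get("attributes", {}).get("tags", [])
        # Use "?" when the event does not carry the requested tag.
        value = next((t[len(prefix):] for t in tags if t.startswith(prefix)), "?")
        counts[value] += 1
    return counts
```

A result dominated by one `image_tag` suggests the crashes are tied to that deployment. Calling the same function with `tag_key="pod_name"` shows whether they are concentrated on a single pod.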
