batch-health
**batch-health** audits whether all enabled scheduled skills executed during their expected 06:00–07:30 UTC morning window, alerting when skills go silent. It runs at 08:00 UTC daily, parsing `aeon.yml` to build a list of skills that should have fired (checking enabled status, cron schedule, and day-of-month match), then cross-references actual execution logs to detect isolated misses (WARN) or multi-skill outages (OUTAGE). Use this to catch batch-level failures and silent scheduling regressions that other monitoring tools miss.
git clone --depth 1 https://github.com/aaronjmars/aeon /tmp/batch-health && cp -r /tmp/batch-health/skills/batch-health ~/.claude/skills/batch-healthSKILL.md
Today is ${today}. Cross-reference what was scheduled to run this morning against what actually ran.
## Goal
`frequency-guard` catches skills running *too often*. Nothing catches when they go *silent* (example: a historical batch outage where multiple even-day skills missed the morning window and no alert fired). This skill closes that gap.
Runs at 08:00 UTC daily — after the 06:00–07:30 UTC batch window completes. Skills scheduled at 07:31+ are excluded from the expected list (transient cron drift could leave them in flight when we audit). Output: OK if everything ran, WARN for 1-2 isolated misses, OUTAGE for 3+ missing skills.
## Steps
### 1. Build the expected-run list for today
Read `aeon.yml`. For each skill with `enabled: true`, check its `schedule` field against today's context.
**Parse today's date** `${today}` (YYYY-MM-DD):
- Extract day-of-month (DOM).
- Odd DOM = 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31
- Even DOM = 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30
**Include** a skill in the expected list if ALL of:
1. `enabled: true`
2. Schedule fires in the 06:00–07:30 UTC audit window. Parse the minute and hour fields:
- hour `6` (any minute 0–59) → in window
- hour `7` AND minute ≤ 30 → in window
- hour `7` AND minute > 30 → **exclude** (too close to the 08:00 audit; transient cron drift could leave it in flight when we check)
- multi-hour like `7,19` → counts as hour 7 for the AM window (apply the same minute rule)
3. Day-of-month matches today:
- `* * *` or any-day pattern → always included
- `1/2 * *` in DOM field → odd days only
- `2/2 * *` in DOM field → even days only
- `*/N * *` step in DOM field → include if `(DOM - 1) % N == 0`. Vixie-cron expands `*/N` to `1-31/N` → DOMs 1, 1+N, 1+2N, … For `*/3`: DOMs 1, 4, 7, 10, 13, 16, 19, 22, 25, 28, 31. **Do not** blanket-include — that false-WARNs ~2/3 of days.
- Day-of-week patterns (`* * N`) → include only if today's weekday matches
**Exclude**: disabled skills, skills firing outside the 06:00–07:30 UTC window (e.g., `schedule: "30 9 * * *"` at 09:30, `schedule: "40 7 * * *"` at 07:40 — both out of scope for this audit).
### 2. Fetch actual runs
Try the skill-runs script:
```bash
./scripts/skill-runs --json --hours 26
```
From the JSON output, extract the `.skills[]` array — each entry has a `skill` name and `last_conclusion`. A skill that appears here (any conclusion) means the GHA cron fired.
**Fallback** if `skill-runs` fails (sandbox network block or `gh api` error):
Parse `memory/logs/${today}.md`:
```bash
cat memory/logs/${today}.md 2>/dev/null
```
Look for `## SkillName` headers or `*_OK` / `*_SKIP` / `*_FAIL` markers. Any skill that logged means it ran. Note: fallback gives partial coverage only — missing from the log doesn't mean it didn't run.
### 3. Cross-reference
For each skill in the expected list:
- **OK**: appears in actual-run list (any conclusion) — scheduler fired correctly
- **FAILED**: in actual-run list with `failure` conclusion — scheduler fired, skill errored (tracked by skill-health, not our concern here)
- **MISSING**: not in actual-run list at all — GHA cron never triggered it
Tally: N_expected, N_ok, N_failed, N_missing.
### 4. Classify severity
| Missing count | Severity |
|---------------|----------|
| 0 | OK — no notification needed |
| 1–2 | WARN — isolated hiccup, possible transient cron delay |
| 3+ | OUTAGE — batch-level failure |
### 5. File issue on OUTAGE
If 3+ missing, check whether today's outage is already tracked:
1. Read `memory/issues/INDEX.md` — scan open issues for one matching today's date and "batch" keyword. If found, skip filing.
2. Determine next ISS number: scan existing `memory/issues/ISS-*.md` files, take highest N + 1.
3. Create `memory/issues/ISS-{NNN}.md`:
```markdown
---
id: ISS-{NNN}
title: Batch outage — {N} skills missed morning window {today}
status: open
severity: medium
category: unknown
detected_by: batch-health
detected_at: {today}T08:00:00Z
resolved_at: null
affected_skills: [{comma-separated skill names}]
root_cause: null
fix_pr: null
---
# ISS-{NNN}: Batch Outage {today}
{N} enabled skills never triggered in the 06:00–07:30 UTC window on {today}.
## Missing Skills
{list each missing skill and its aeon.yml schedule}
## Context
Detected by `batch-health` at 08:00 UTC. The GHA scheduler fired for daily-promoted skills but the batch window was silent.
## Next Steps
- Check GitHub Actions for {today} around 06:00–07:30 UTC for cron delivery failures
- If one-off: likely GHA infrastructure delay (documented pattern)
- If recurring: investigate aeon.yml schedule changes or GHA Actions billing limits
```
4. Append to `memory/issues/INDEX.md` open table:
```
| ISS-{NNN} | medium | Batch outage — {N} skills missed morning window {today} | {today} | batch-health |
```
5. Update `memory/MEMORY.md` issue tracker summary if the issue changes the open count.
### 6. Notify (WARN or OUTAGE only)
Write to `.pending-notify-temp/batch-health-${today}.md` (create dir if needed), then send.
**OUTAGE format:**
```
batch outage — ${today}
${N} skills never triggered (06:00–07:30 UTC):
${list each missing skill with its schedule}
ISS-${NNN} filed. check GHA cron status.
```
**WARN format:**
```
batch gap — ${today}
${N} scheduled skills didn't run:
${list each}
isolated miss — transient cron delay likely. monitoring.
```
Send: `./notify -f .pending-notify-temp/batch-health-${today}.md`
### 7. Log
Append to `memory/logs/${today}.md`:
```markdown
## Batch Health
- **Expected:** ${N} skills in 06:00–07:30 UTC window
- **Expected list:** ${comma-separated skill names}
- **OK:** ${N_ok}
- **Failed:** ${N_failed} (${list or "none"})
- **Missing:** ${N_missing} (${list or "none"})
- **Status:** OK / WARN / OUTAGE
- **Issue filed:** ISS-${NNN} / n/a
- **Notification:** sent / skipped (OK)
- BATCH_HEALTH_OK
```
## Sandbox Note
`./scripts/skill-runs --json` calls `gh api` which is authenticatedMention/keyword sweep on social platforms for [REPLACE: KEYWORDS] — trends, sentiment, top posts
5 concrete real-life actions, leverage-scored against open loops with specificity and anti-fluff gates
Curated AI-agent tweets, clustered into narratives with insight summaries
Tracker of AI agent substitution signals — which roles, companies, and industries show real headcount displacement. Named roles + real deployments only.
Competitive-intelligence digest on the AI agent framework space — momentum, releases, breaking changes across a curated watchlist
Cross-domain market pulse from AIXBT's free grounding endpoint — crypto, macro, tradfi, geopolitics. Refreshes taxonomy references (clusters, chains) as a bonus.
Pre-batch API provider health check — detects credit exhaustion or auth failure for every configured provider key before the scheduled batch runs, giving the operator a window to act before skills degrade
List a wallet's live ERC-20 token approvals on Base and flag unlimited / risky spender grants. Keyless via Base RPC (eth_getLogs + eth_call) — no explorer key needed.