task-orchestrator

Autonomous multi-agent task orchestration with dependency analysis, parallel tmux/Codex execution, and self-healing heartbeat monitoring. Use for large projects with multiple issues/tasks that need coordinated parallel execution.

Repository: agent-skills

Install in Claude Code

mkdir -p ~/.claude/skills && git clone --depth 1 https://github.com/jdrhyne/agent-skills /tmp/task-orchestrator && cp -r /tmp/task-orchestrator/skills/task-orchestrator ~/.claude/skills/task-orchestrator

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Task Orchestrator

Autonomous orchestration of multi-agent builds using tmux + Codex with self-healing monitoring.

**Load the senior-engineering skill alongside this one for engineering principles.**

## Safety Boundaries

- Do not launch parallel workers for tasks with overlapping write scope until the dependency is resolved.
- Do not push branches, merge work, or self-heal by guessing when human review is required.
- Do not store secrets in manifests, logs, prompts, or tmux pane captures.
- Do not continue retrying a failing task indefinitely; stop and surface the blocker after bounded retries.
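
One way to respect the no-secrets rule for pane captures is to filter captured text before it reaches a log file. The sketch below is illustrative only: `redact` is a hypothetical helper, and the patterns (GitHub-style token prefixes and simple `key=value` pairs) are assumptions that will not catch every secret format.

```bash
#!/usr/bin/env bash
# Hypothetical filter: mask likely secrets in text read from stdin.
# The patterns are assumptions; extend them for the credentials you actually use.
redact() {
  sed -E \
    -e 's/gh[pousr]_[A-Za-z0-9]+/[REDACTED]/g' \
    -e 's/((TOKEN|SECRET|PASSWORD|token|secret|password)[=:][[:space:]]*)[^[:space:]]+/\1[REDACTED]/g'
}

# Example: a captured line containing a token-like value
printf '%s\n' 'GH_TOKEN=ghp_abc123DEF push ok' | redact
```

In practice the output of `tmux capture-pane -p` could be piped through such a filter before being redirected to a log file.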

## Core Concepts

### 1. Task Manifest
A JSON file defining all tasks, their dependencies, files touched, and status.

```json
{
  "project": "project-name",
  "repo": "owner/repo",
  "workdir": "/path/to/worktrees",
  "created": "2026-01-17T00:00:00Z",
  "model": "gpt-5.2-codex",
  "modelTier": "high",
  "phases": [
    {
      "name": "Phase 1: Critical",
      "tasks": [
        {
          "id": "t1",
          "issue": 1,
          "title": "Fix X",
          "files": ["src/foo.js"],
          "dependsOn": [],
          "status": "pending",
          "worktree": null,
          "tmuxSession": null,
          "startedAt": null,
          "lastProgress": null,
          "completedAt": null,
          "prNumber": null
        }
      ]
    }
  ]
}
```

### 2. Dependency Rules
- **Same file = sequential** — Tasks touching the same file must run in order or merge
- **Different files = parallel** — Independent tasks can run simultaneously
- **Explicit depends = wait** — `dependsOn` array enforces ordering
- **Phase gates** — Next phase waits for current phase completion
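
The same-file and different-file rules above can be sketched as a small overlap check. `files_overlap` and `schedule_mode` are hypothetical helpers, and each task's `files` array is assumed to be flattened into a space-separated string (paths containing spaces or glob characters are not handled).

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if two tasks share at least one file.
# Each argument is a space-separated list of paths.
files_overlap() {
  local a b
  for a in $1; do
    for b in $2; do
      [ "$a" = "$b" ] && return 0
    done
  done
  return 1
}

# Decide how to schedule two tasks based on their declared files.
schedule_mode() {
  if files_overlap "$1" "$2"; then
    echo "sequential"
  else
    echo "parallel"
  fi
}

schedule_mode "src/foo.js src/bar.js" "src/bar.js"   # shared file
schedule_mode "src/foo.js" "docs/readme.md"          # disjoint files
```

A pairwise check like this is quadratic in the number of files, which should be acceptable for the small per-task file lists a manifest typically holds.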

### 3. Execution Model
- Each task gets its own **git worktree** (isolated branch)
- Each task runs in its own **tmux session**
- Use **Codex with `--yolo`** for autonomous execution
- Model: **gpt-5.2-codex** with reasoning effort **high** (configurable)
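
Because every task owns one branch, one worktree, and one tmux session, it can help to derive all three names from the task in a single place. The helpers below are hypothetical; they only mirror the placeholder patterns used in this skill (`fix/issue-N`, `task-tN`).

```bash
#!/usr/bin/env bash
# Hypothetical naming helpers; the patterns follow the placeholders in this skill.
task_branch()   { echo "fix/issue-$1"; }   # $1 = issue number
task_session()  { echo "task-$1"; }        # $1 = task id, e.g. t1
task_worktree() { echo "$1/task-$2"; }     # $1 = workdir, $2 = task id

task_branch 1
task_session t1
task_worktree /path/to/worktrees t1
```

Keeping the session name equal to the worktree directory name lets monitoring code map a tmux session back to its worktree without a lookup table.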

---

## Setup Commands

### Initialize Orchestration

```bash
# 1. Create working directory
WORKDIR="${TMPDIR:-/tmp}/orchestrator-$(date +%s)"
mkdir -p "$WORKDIR"

# 2. Clone repo for worktrees
git clone https://github.com/OWNER/REPO.git "$WORKDIR/repo"
cd "$WORKDIR/repo"

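# 3. Choose the tmux socket path (tmux creates the socket when the first session starts)
SOCKET="$WORKDIR/orchestrator.sock"

# 4. Initialize manifest (the quoted 'EOF' keeps the placeholders literal; replace them with real values)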
cat > "$WORKDIR/manifest.json" << 'EOF'
{
  "project": "PROJECT_NAME",
  "repo": "OWNER/REPO",
  "workdir": "WORKDIR_PATH",
  "socket": "SOCKET_PATH",
  "created": "TIMESTAMP",
  "model": "gpt-5.2-codex",
  "modelTier": "high",
  "phases": []
}
EOF
```
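
The template above leaves `WORKDIR_PATH`, `SOCKET_PATH`, and `TIMESTAMP` as literal placeholders. A minimal sketch of filling them in at creation time follows, assuming a hypothetical `write_manifest` helper and values that contain no double quotes or backslashes (which would produce invalid JSON).

```bash
#!/usr/bin/env bash
# Hypothetical helper: write the manifest with real values instead of placeholders.
write_manifest() {
  local workdir="$1" project="$2" repo="$3"
  local socket="$workdir/orchestrator.sock"
  local created
  created="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  # Unquoted EOF so the variables above are expanded into the JSON.
  cat > "$workdir/manifest.json" << EOF
{
  "project": "$project",
  "repo": "$repo",
  "workdir": "$workdir",
  "socket": "$socket",
  "created": "$created",
  "model": "gpt-5.2-codex",
  "modelTier": "high",
  "phases": []
}
EOF
}

# Example run against a throwaway directory
DEMO_DIR="$(mktemp -d)"
write_manifest "$DEMO_DIR" "project-name" "owner/repo"
cat "$DEMO_DIR/manifest.json"
```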

### Analyze GitHub Issues for Dependencies

```bash
# Fetch all open issues
gh issue list --repo OWNER/REPO --state open --json number,title,body,labels > issues.json

# Group by files mentioned in issue body
# Tasks touching same files should serialize
```
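
The grouping step is left as comments above. One possible starting point is to pull path-like strings out of each issue body, as sketched below. `extract_paths` is a hypothetical helper, and the pattern assumes paths of the form `dir/name.ext`; it will also match URL fragments such as `github.com/owner/repo.git`, so results need review. Feeding it bodies from `issues.json` would additionally need a JSON tool, which is not shown here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the unique file paths mentioned in text on stdin.
# Assumes paths look like dir/name.ext; adjust the pattern for your repo layout.
extract_paths() {
  grep -oE '[A-Za-z0-9_.-]+(/[A-Za-z0-9_.-]+)+\.[A-Za-z0-9]+' | sort -u
}

printf '%s\n' "Crash in src/foo.js when parsing; see also src/lib/bar.ts." | extract_paths
```

Two issues whose extracted path lists intersect are candidates for serialization under the same-file rule.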

### Create Worktrees

```bash
# For each task, create isolated worktree
cd "$WORKDIR/repo"
git worktree add -b fix/issue-N "$WORKDIR/task-tN" main
```

### Launch Tmux Sessions

```bash
SOCKET="$WORKDIR/orchestrator.sock"

# Create session for task
tmux -S "$SOCKET" new-session -d -s "task-tN"

# Launch Codex (uses gpt-5.2-codex with reasoning_effort=high from ~/.codex/config.toml)
# Note: Model config is in ~/.codex/config.toml, not CLI flag
tmux -S "$SOCKET" send-keys -t "task-tN" \
  "cd $WORKDIR/task-tN && codex --yolo 'Fix issue #N: DESCRIPTION. Run tests, commit with good message, push to origin.'" Enter
```

---

## Monitoring & Self-Healing

### Progress Check Script

```bash
#!/bin/bash
# check_progress.sh - Run via heartbeat

WORKDIR="${1:?usage: check_progress.sh WORKDIR}"
SOCKET="$WORKDIR/orchestrator.sock"
MANIFEST="$WORKDIR/manifest.json"
STALL_THRESHOLD_MINS=20

check_session() {
  local session="$1"
  local task_id="$2"
  
  # Capture recent output (declared separately so `local` does not mask tmux's exit status)
  local output
  output=$(tmux -S "$SOCKET" capture-pane -p -t "$session" -S -50 2>/dev/null)
  
  # Check for completion indicators
  if echo "$output" | grep -qE "(All tests passed|Successfully pushed|❯ $)"; then
    echo "DONE:$task_id"
    return 0
  fi
  
  # Check for errors
  if echo "$output" | grep -qiE "(error:|failed:|FATAL|panic)"; then
    echo "ERROR:$task_id"
    return 1
  fi
  
  # Check for stall (prompt waiting for input)
  if echo "$output" | grep -qE "(\? |Continue\?|y/n|Press any key)"; then
    echo "STUCK:$task_id:waiting_for_input"
    return 2
  fi
  
  echo "RUNNING:$task_id"
  return 0
}

# Check all active sessions
for session in $(tmux -S "$SOCKET" list-sessions -F "#{session_name}" 2>/dev/null); do
  check_session "$session" "$session"
done
```

### Self-Healing Actions

When a task is stuck, the orchestrator should:

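1. **Waiting for input** → Send the appropriate response, but only for prompts that are safe to answer automatically; surface anything that needs human review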
   ```bash
   tmux -S "$SOCKET" send-keys -t "$session" "y" Enter
   ```

2. **Error/failure** → Capture logs, analyze, retry with fixes
   ```bash
   # Capture error context
   mkdir -p "$WORKDIR/logs"
   ERROR_LOG="$WORKDIR/logs/$task_id-error.log"
   tmux -S "$SOCKET" capture-pane -p -t "$session" -S -100 > "$ERROR_LOG"
   
   # Kill and restart, pointing Codex at the saved log
   # (the model comes from ~/.codex/config.toml, as in the launch step)
   tmux -S "$SOCKET" kill-session -t "$session"
   tmux -S "$SOCKET" new-session -d -s "$session"
   tmux -S "$SOCKET" send-keys -t "$session" \
     "cd $WORKDIR/$task_id && codex --yolo 'Previous attempt failed; the last captured output is in $ERROR_LOG. Read it, fix the issue and retry.'" Enter
   ```

3. **No progress for 20+ mins** → Nudge or restart
   ```bash
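   # Check how long ago the last commit landed in the task worktree
   cd "$WORKDIR/$task_id"
   LAST_COMMIT_TS=$(git log -1 --format=%ct 2>/dev/null || echo 0)
   MINS_SINCE=$(( ($(date +%s) - LAST_COMMIT_TS) / 60 ))
   
   # A fresh worktree reports the age of its base commit, so also compare with startedAt.
   # If there are no commits within the threshold, nudge or restart
   if [ "$MINS_SINCE" -ge "${STALL_THRESHOLD_MINS:-20}" ]; then
     echo "STUCK:$task_id:no_progress"
   fi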
   ```
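
The Safety Boundaries require bounded retries, so each restart above should draw from a budget. A minimal sketch follows, assuming a hypothetical `retry_allowed` helper, one counter file per task, and a limit of 3 (the skill does not specify a number).

```bash
#!/usr/bin/env bash
# Hypothetical retry budget, tracked as one counter file per task.
MAX_RETRIES=3   # assumed limit; the skill does not specify a number

# Record one more attempt for the task and succeed only while the
# task is still within MAX_RETRIES.
retry_allowed() {
  local state_dir="$1" task_id="$2"
  local counter="$state_dir/$task_id.retries"
  local count=0
  [ -f "$counter" ] && count="$(cat "$counter")"
  count=$((count + 1))
  echo "$count" > "$counter"
  [ "$count" -le "$MAX_RETRIES" ]
}

# Example: the fourth attempt exceeds the budget
STATE_DIR="$(mktemp -d)"
for attempt in 1 2 3 4; do
  if retry_allowed "$STATE_DIR" "task-t1"; then
    echo "attempt $attempt: restart task-t1"
  else
    echo "attempt $attempt: BLOCKED task-t1, surface to a human"
  fi
done
```

Storing the counter on disk rather than in a shell variable lets the count survive between heartbeat runs, which are separate processes.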

### Heartbeat Cron Setup

```bash
# Add to cron (every 15 minutes)
cron action:add job:{
  "label": "orchestrator-heartbeat",
  "schedule": "*/15 * * * *",
  "prompt": "Check orchestration progress at WORKDIR. Read manifest, check all tmux sessions, self-heal any stuck tasks, advance to next phase if current is complete. Do NOT ping a human for routine stalls; self-heal within the retry limit, then stop and surface the blocker."
}
```
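
The heartbeat prompt asks the agent to advance only when the current phase is complete. That gate can be sketched as a check over task statuses. `phase_complete` is a hypothetical helper, and `done` is an assumed status value, since the manifest example only shows `pending`.

```bash
#!/usr/bin/env bash
# Hypothetical phase gate: succeed only if every status passed in is "done".
# Statuses would come from the manifest's task "status" fields.
phase_complete() {
  local status
  [ "$#" -gt 0 ] || return 1   # an empty phase is treated as not complete
  for status in "$@"; do
    [ "$status" = "done" ] || return 1
  done
  return 0
}

if phase_complete done done running; then
  echo "advance to next phase"
else
  echo "hold: current phase still has unfinished tasks"
fi
```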

---

## Workflow: Full Orchestration Run

### Step 1: Analyze & Plan

```bash
# 1. Fetch issues
gh issue list --rep