llm-council
llm-council executes a three-phase multi-model deliberation pattern where independent LLMs generate responses, rank anonymized peer outputs, and a chairman synthesizes findings. Use it for high-stakes decisions requiring multiple perspectives, architecture reviews, or conflicting rule resolution. Supports provider-agnostic configuration (Anthropic, OpenAI, Fireworks, OpenRouter, custom endpoints) and optionally persists deliberation transcripts to wiki pages for documentation and searchability.
git clone --depth 1 https://github.com/rohitg00/pro-workflow /tmp/llm-council && cp -r /tmp/llm-council/skills/llm-council ~/.claude/skills/llm-councilSKILL.md
# LLM Council Karpathy's LLM Council pattern, provider-agnostic. dair-academy's version hardcoded Fireworks; ours reads any OpenAI-compatible endpoint via env. ## When to use - High-stakes plan review (`/plan` crosses N-file threshold) - Conflicting learning-rules → re-resolve via vote - User invokes `/council "<query>"` or `/wiki council` - Architecture decisions where you want multiple viewpoints captured - Persisting deliberation as a wiki page for future reference ## Three phases 1. **Independent**: each model answers in parallel 2. **Ranking**: each model ranks anonymized peer responses 3. **Synthesis**: chairman model reads all responses + rankings → final answer ## Provider config Provider chosen via env. First-match wins: | Env var | Provider | Default base URL | |---------|----------|------------------| | `ANTHROPIC_API_KEY` | Anthropic | `https://api.anthropic.com` | | `OPENAI_API_KEY` | OpenAI | `https://api.openai.com/v1` | | `OPENROUTER_API_KEY` | OpenRouter | `https://openrouter.ai/api/v1` | | `FIREWORKS_API_KEY` | Fireworks | `https://api.fireworks.ai/inference/v1` | | `LLM_COUNCIL_BASE_URL` + `LLM_COUNCIL_API_KEY` | Custom OpenAI-compat | (user-supplied) | Override per-run with `--provider openai|anthropic|openrouter|fireworks|custom`. Default model rosters per provider live in `scripts/council.js` and can be overridden via `--models` CSV and `--chairman <id>`. ## Commands ``` node $SKILL_ROOT/scripts/council.js run "<query>" [--models id1,id2,id3] [--chairman id] [--provider <name>] [--wiki <slug>] node $SKILL_ROOT/scripts/council.js providers node $SKILL_ROOT/scripts/council.js show <session-id> ``` `--wiki <slug>` writes the full transcript to `<wiki>/derived/council/<session-id>.md` and registers it via `wiki-cli.js page` so it shows in FTS5 search. ## Output Each session writes: ``` ~/.pro-workflow/council/<session-id>/ ├── config.json # query, models, chairman, provider ├── phase1_responses.json # raw API responses per model ├── phase2_rankings.json # anonymized ranking outputs ├── phase3_synthesis.txt # chairman's final answer └── final_output.md # human-readable bundle ``` Console prints the markdown bundle. Pipe to `pbcopy` / `tee` as needed. ## Hard rules 1. Never skip the ranking phase. It's the core of the council pattern. 2. Save raw responses to disk verbatim. No summarization in storage. 3. Anonymize responses for ranking — models see `Response A/B/C/...`, not peer names. 4. The chairman sees both real names AND rankings. 5. Display all three phases to the user. No phase elision. ## Cost awareness The script logs per-call latency + tokens on supported providers. Multiply by your provider rate to estimate. Council cost grows linearly with `len(models)^2` (each model ranks all others) plus the chairman. Default council size: 3-5 models. More models = exponentially more ranking calls. ## Use with wiki ``` /wiki council agent-memory "should we adopt episodic memory in our agents?" ``` Loads `agent-memory` wiki context as system prompt prefix, runs council, persists transcript as `wiki/derived/council/<id>.md`. The transcript becomes searchable via `/wiki ask`.
Analyzes and optimizes context window usage across sessions. Use when context feels bloated, sessions run slow, or approaching compaction limits.
Analyze session token usage and cost patterns. Identify expensive operations and recommend optimizations. Use to understand and reduce session costs.
Specialized debugging agent. Use when facing hard bugs, test failures, or runtime errors that need systematic investigation.
Multi-phase development agent. Research > Plan > Implement with validation gates. Use PROACTIVELY when building features that touch >5 files or require architecture decisions.
Analyze permission denial patterns and generate optimized alwaysAllow/alwaysDeny rules. Use when permission prompts slow down workflow.
Break down complex tasks into implementation plans before writing code. Use when task touches >5 files, requires architecture decisions, or has unclear requirements.
Code review specialist that verifies every finding against actual code before reporting. Use before committing, for PR reviews, or after major changes.
Confidence-gated exploration that assesses readiness before implementation. Scores 0-100 across five dimensions and gives GO/HOLD verdict.