ClaudeWave
Skill · 575 repo stars · updated 10d ago

paper-orchestra

paper-orchestra orchestrates a five-agent pipeline that transforms unstructured research materials (idea summary, experimental log, LaTeX template, conference guidelines, and optional figures) into a submission-ready manuscript and PDF. Use this skill when users request end-to-end paper generation from experimental data, ask to convert results into a conference submission, or want to run the full automated writing workflow across the outline generation, figure creation, literature review, section drafting, and refinement stages.

Install in Claude Code
git clone --depth 1 https://github.com/Ar9av/PaperOrchestra /tmp/paper-orchestra && cp -r /tmp/paper-orchestra/skills/paper-orchestra ~/.claude/skills/paper-orchestra
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# paper-orchestra (Orchestrator)

Top-level driver for the PaperOrchestra pipeline. Read this document and follow
the steps below. The detailed prompts and rules live in each sub-skill's
`SKILL.md` and `references/` directories — you (the host agent) will load them
as you go.

> Source paper: Song et al., *PaperOrchestra: A Multi-Agent Framework for
> Automated AI Research Paper Writing*, arXiv:2604.05018, 2026.
> <https://arxiv.org/pdf/2604.05018>

## What this skill produces

A complete submission package `P = (paper.tex, paper.pdf)` written into
`workspace/final/`, plus a full audit trail under `workspace/` (outline,
figures, refs, drafts, refinement worklog, provenance snapshot).

## Inputs (the (I, E, T, G, F) tuple from the paper)

The workspace MUST contain:

| File | Symbol | Required | Description |
|---|---|---|---|
| `workspace/inputs/idea.md` | `I` | yes | Idea Summary (Sparse or Dense variant — see `references/io-contract.md`) |
| `workspace/inputs/experimental_log.md` | `E` | yes | Experimental Log: setup, raw numeric data, qualitative observations |
| `workspace/inputs/template.tex` | `T` | yes | LaTeX template for the target conference (with `\section{...}` commands) |
| `workspace/inputs/conference_guidelines.md` | `G` | yes | Formatting rules, page limit, mandatory sections |
| `workspace/inputs/figures/` | `F` | no | Optional pre-existing figures. If empty, the plotting agent generates everything. |

`scripts/init_workspace.py` will scaffold this layout. `scripts/validate_inputs.py`
will check it before the pipeline runs.
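
As a rough illustration of the layout in the table above, the check below confirms that the four required inputs exist and are non-empty. It is a minimal sketch, not the actual logic of `scripts/validate_inputs.py`; the function name `missing_required_inputs` and the constant `REQUIRED_INPUTS` are hypothetical.

```python
from pathlib import Path

# Required inputs from the table above (I, E, T, G). figures/ (F) is optional.
REQUIRED_INPUTS = (
    "idea.md",
    "experimental_log.md",
    "template.tex",
    "conference_guidelines.md",
)


def missing_required_inputs(workspace: str) -> list[str]:
    """Return the required input files that are absent or empty.

    Hypothetical helper for illustration only; the real validator may
    check more (for example, that the template contains section commands).
    """
    inputs_dir = Path(workspace) / "inputs"
    missing = []
    for name in REQUIRED_INPUTS:
        path = inputs_dir / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

An empty return value means the required part of the layout is in place; anything else is the list of files to report back to the user.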

## Pipeline (read `references/pipeline.md` for the full diagram)

```
Step 1: Outline           ──▶  outline.json                       (1 LLM call)
Step 2: Plotting     ─┐
                      ├──▶  figures/*.png + captions.json         (~20-30 calls)
Step 3: Lit Review   ─┘                                           (~20-30 calls)
                          intro_relwork.tex + refs.bib

Step 4: Section Writing  ──▶  drafts/paper.tex                    (1 LLM call)
Step 5: Content Refine   ──▶  final/paper.tex + final/paper.pdf   (~5-7 calls, ~3 iters)
```

Step 2 and Step 3 are independent and **MUST run in parallel** when your host
supports parallel sub-agents. If not, run Step 3 first (it has the longer wall
time due to Semantic Scholar rate limits) and Step 2 second.
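
The scheduling rule above can be sketched as follows. This is an illustration under assumptions: the two callables are placeholders for however your host delegates to the plotting and literature-review sub-agents, `run_steps_2_and_3` is a hypothetical name, and threads stand in for host-level parallel sub-agents.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_steps_2_and_3(
    run_plotting: Callable[[], object],
    run_lit_review: Callable[[], object],
    parallel_supported: bool,
) -> tuple[object, object]:
    """Run Step 2 and Step 3; return (plotting_result, lit_review_result)."""
    if parallel_supported:
        # Both steps depend only on outline.json, so they can overlap.
        with ThreadPoolExecutor(max_workers=2) as pool:
            lit_future = pool.submit(run_lit_review)
            plot_future = pool.submit(run_plotting)
            return plot_future.result(), lit_future.result()
    # Sequential fallback: Step 3 first (longer wall time), then Step 2.
    lit_result = run_lit_review()
    plot_result = run_plotting()
    return plot_result, lit_result
```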

## Critical pre-instruction (read once, apply always)

Before any LLM call that *writes* paper content (outline, intro/related work,
section writing, refinement), you MUST prepend the **Anti-Leakage Prompt** at
`references/anti-leakage-prompt.md` to your system prompt. This is verbatim
from Appendix D.4 of the paper and prevents pre-training-data leakage. The
paper applies it uniformly across all baselines for fair comparison; we apply
it for fidelity *and* to keep generated papers grounded in the user's inputs.

## Step-by-step execution

### 0. Pre-flight Checks

Before running the pipeline, run the following quality gates in order:

```bash
# 1. Scaffold the workspace
python skills/paper-orchestra/scripts/init_workspace.py --out workspace/
# user drops their inputs into workspace/inputs/

# 2. Validate required files are present and well-formed
python skills/paper-orchestra/scripts/validate_inputs.py --workspace workspace/

# 3. Check input density — idea and experimental log must meet minimum thresholds
python skills/paper-orchestra/scripts/check_idea_density.py \
    --idea workspace/inputs/idea.md \
    --log workspace/inputs/experimental_log.md

# 4. Cross-validate consistency between idea and experimental log
python skills/paper-orchestra/scripts/validate_consistency.py \
    --idea workspace/inputs/idea.md \
    --log workspace/inputs/experimental_log.md
```

If `validate_inputs.py` or `check_idea_density.py` fails (exit code 1 or 2), stop
and tell the user what's missing or below threshold. Do not proceed until it is fixed.

`validate_consistency.py` produces warnings only (exit code 1 = WARN, non-blocking);
report warnings to the user but continue.
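
The exit-code handling above can be summarized in a small mapping. This is a sketch, not part of the skill's scripts: `gate_outcome` and `WARN_ONLY_GATES` are hypothetical names, and treating any exit code other than 1 from `validate_consistency.py` as blocking is a conservative assumption the source does not state.

```python
# Gates whose exit code 1 is a warning rather than a failure, per the text above.
WARN_ONLY_GATES = {"validate_consistency.py"}


def gate_outcome(script: str, exit_code: int) -> str:
    """Map a pre-flight script's exit code to 'pass', 'warn', or 'stop'."""
    if exit_code == 0:
        return "pass"
    if script in WARN_ONLY_GATES and exit_code == 1:
        # Non-blocking: report to the user, then continue.
        return "warn"
    # Blocking gates (and, by assumption, unexpected codes) halt the pipeline.
    return "stop"
```

In practice the exit code would come from something like `subprocess.run([...]).returncode` for each of the four commands, checked in the order listed.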

**Before failing on missing inputs**, check whether aggregation can supply them:

| Inputs state | Action |
|---|---|
| `idea.md` and `experimental_log.md` both present and non-empty | Continue to Step 1. |
| Either is missing/empty, and the user mentioned a directory | Load and run `agent-research-aggregator` with that directory as `--search-roots`, then re-validate. |
| Either is missing/empty, no directory mentioned | Ask the user: "Your workspace is missing `idea.md` / `experimental_log.md`. Do you have a folder with research notes or agent history I can aggregate from? If so, tell me the path — or drop the files manually into `workspace/inputs/`." |

If validation still fails after aggregation (e.g. `template.tex` or `conference_guidelines.md` is missing), stop and tell the user exactly which files remain outstanding.
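
The decision table above can be read as a three-way branch. The sketch below is illustrative only; `preflight_action` and its return labels are hypothetical, and "non-empty" is assumed to mean a file size greater than zero.

```python
from pathlib import Path
from typing import Optional


def _present(path: Path) -> bool:
    """True if the file exists and has at least one byte (assumed meaning of non-empty)."""
    return path.is_file() and path.stat().st_size > 0


def preflight_action(workspace: str, mentioned_dir: Optional[str]) -> str:
    """Return 'continue', 'aggregate', or 'ask_user' following the table above."""
    inputs = Path(workspace) / "inputs"
    if _present(inputs / "idea.md") and _present(inputs / "experimental_log.md"):
        return "continue"
    if mentioned_dir:
        # Run agent-research-aggregator on this directory, then re-validate.
        return "aggregate"
    return "ask_user"
```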

**Also probe the TeX installation** (once per workspace, result cached):

```bash
python skills/paper-orchestra/scripts/check_tex_packages.py \
    --out workspace/tex_profile.json
```

The Section Writing Agent reads `tex_profile.json` to decide which LaTeX
patterns to use (e.g., `Figure~\ref{}` vs `\cref{}`, whether to include
`\usepackage{microtype}`, etc.). This avoids compile-time failures from
missing packages, which previously required iterative manual edits.
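
To make the idea concrete, the sketch below picks a figure-reference pattern from a profile. The schema of `tex_profile.json` is not shown in this document, so the `available_packages` key is purely an assumption, as is the helper name `figure_reference`; the only external fact relied on is that `\cref` is provided by the `cleveref` package.

```python
import json


def figure_reference(label: str, profile: dict) -> str:
    """Choose a figure-reference pattern based on an assumed profile schema."""
    # "available_packages" is a hypothetical key; adjust to the real schema.
    available = set(profile.get("available_packages", []))
    if "cleveref" in available:
        return r"\cref{" + label + "}"
    # Fallback that needs no extra package.
    return r"Figure~\ref{" + label + "}"


# Example profile without cleveref installed (made-up content).
profile = json.loads('{"available_packages": ["booktabs", "graphicx"]}')
print(figure_reference("fig:overview", profile))
```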

### 1. Outline (Step 1 — 1 LLM call)

Load `skills/outline-agent/SKILL.md` and follow it. Output: `workspace/outline.json`.
Validate with `python skills/outline-agent/scripts/validate_outline.py workspace/outline.json`.
**Halt the pipeline if validation fails** — every downstream agent depends on the schema.

### 2 ∥ 3. Plotting and Literature Review (in parallel)

Parse `outline.json`. Extract:
- `outline.plotting_plan` → drives Step 2
- `outline.intro_related_work_plan` → drives Step 3
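
A minimal sketch of that extraction, assuming both plans are top-level keys of `outline.json` (as the dotted names above suggest). The helper name `load_plans` is hypothetical, and the internal structure of each plan is left opaque because it is defined by the outline agent's schema.

```python
import json
from pathlib import Path


def load_plans(outline_path: str) -> tuple[object, object]:
    """Return (plotting_plan, intro_related_work_plan) from outline.json."""
    outline = json.loads(Path(outline_path).read_text(encoding="utf-8"))
    required = ("plotting_plan", "intro_related_work_plan")
    missing = [key for key in required if key not in outline]
    if missing:
        # Fail early: Steps 2 and 3 cannot start without their plans.
        raise KeyError(f"outline.json is missing: {', '.join(missing)}")
    return outline["plotting_plan"], outline["intro_related_work_plan"]
```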

If your host supports parallel sub-agents (Claude Code's Agent tool with multiple
concurrent calls; Cursor's parallel agents; Antigravity's worker pool), spawn
**two conc

agent-research-aggregator · Skill

Pre-pipeline aggregator that scans AI agent cache directories (.claude, .cursor, .antigravity, .openclaw) or any user-specified directory for experimentation logs, extracts insights and numeric results, and formats them as PaperOrchestra-ready inputs (idea.md + experimental_log.md). TRIGGER when the user says "aggregate my agent logs for paper writing", "extract experiments from my coding agent history", "prepare PaperOrchestra inputs from my cache", "turn my agent logs into a paper", mentions a folder or directory they want to use as the basis for a paper, or wants to run PaperOrchestra but only has scattered agent experiment histories rather than structured inputs. Run this BEFORE paper-orchestra. Also called automatically by paper-orchestra when workspace/inputs/idea.md or workspace/inputs/experimental_log.md are missing.

content-refinement-agent · Skill

Step 5 of the PaperOrchestra pipeline (arXiv:2604.05018). Iteratively refine drafts/paper.tex by simulating peer review and applying targeted revisions, with strict accept/revert halt rules. Maintains a worklog and snapshots each iteration so revert is real, not symbolic. TRIGGER when the orchestrator delegates Step 5 or when the user asks to "refine the draft", "iterate on the paper", or "run peer review on this paper".

literature-review-agent · Skill

Step 3 of the PaperOrchestra pipeline (arXiv:2604.05018). Execute the literature search strategy from outline.json — discover candidate papers via web search, verify them through Semantic Scholar (Levenshtein > 70 fuzzy title match, temporal cutoff, dedup by paperId), cross-corroborate against Crossref + OpenAlex to flag hallucinated citations, build a BibTeX file, and draft Introduction + Related Work using ≥90% of the verified pool. Runs in parallel with the plotting-agent. TRIGGER when the orchestrator delegates Step 3 or when the user asks to "find citations for my paper", "draft the related work", or "build the bibliography".
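
The fuzzy title match and `paperId` dedup mentioned above might look roughly like this. It is a sketch under stated assumptions: the similarity is normalized to a 0-100 scale as `100 * (1 - distance / longest_length)`, which may differ from the scoring the skill actually uses; the temporal cutoff is omitted; and the function names are hypothetical. Candidates are assumed to be dicts with `title` and `paperId` fields.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def title_similarity(a: str, b: str) -> float:
    """Similarity on a 0-100 scale (assumed normalization, case-insensitive)."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def verify_and_dedup(query_title: str, candidates: list[dict],
                     threshold: float = 70.0) -> list[dict]:
    """Keep candidates scoring above the threshold, one entry per paperId."""
    kept: dict[str, dict] = {}
    for cand in candidates:
        if title_similarity(query_title, cand["title"]) > threshold:
            kept.setdefault(cand["paperId"], cand)
    return list(kept.values())
```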

outline-agent · Skill

Step 1 of the PaperOrchestra pipeline (arXiv:2604.05018). Convert (idea.md, experimental_log.md, template.tex, conference_guidelines.md) into a strict JSON outline containing a plotting plan, literature search plan (Intro + Related Work), and section-level writing plan with citation hints. TRIGGER when the orchestrator delegates Step 1 or when the user asks to "outline a paper from raw materials" or "generate the paper structure".

paper-autoraters · Skill

Run the four paper-quality autoraters from PaperOrchestra (arXiv:2604.05018, App. F.3) — Citation F1 (P0/P1 partition + Precision/Recall/F1), Literature Review Quality (6-axis 0-100 with anti-inflation rules), SxS Overall Paper Quality (side-by-side), and SxS Literature Review Quality (side-by-side). TRIGGER when the user asks to "score this paper draft", "evaluate against the benchmark", "compare two papers", or "run the autoraters".

paper-writing-bench · Skill

Reverse-engineer raw materials (Sparse idea, Dense idea, experimental log) from an existing AI research paper to build a benchmark case for evaluating paper-writing pipelines. Replicates the PaperWritingBench dataset construction procedure from arXiv:2604.05018 §3 / App. C. TRIGGER when the user asks to "build a benchmark case from this paper", "reverse-engineer raw materials", or "evaluate my pipeline against PaperWritingBench".

plotting-agent · Skill

Step 2 of the PaperOrchestra pipeline (arXiv:2604.05018). Execute the visualization plan from outline.json — render plots and conceptual diagrams from experimental_log.md and idea.md, optionally refine via VLM critique loop, and produce context-aware captions. Runs in parallel with the literature-review-agent. TRIGGER when the orchestrator delegates Step 2 or when the user asks to "generate the figures for my paper" or "render the plots from this experiment log".

section-writing-agent · Skill

Step 4 of the PaperOrchestra pipeline (arXiv:2604.05018). ONE single multimodal LLM call that drafts the remaining paper sections (Abstract, Methodology, Experiments, Conclusion), extracts numeric values from experimental_log.md into LaTeX booktabs tables, splices the generated figures from Step 2, and merges everything into the template that already contains Intro + Related Work from Step 3. TRIGGER when the orchestrator delegates Step 4 or when the user asks to "write the methodology and experiments sections" or "fill in the rest of the paper".