literature-review-agent
The literature-review-agent executes Step 3 of the PaperOrchestra pipeline. It discovers candidate papers through parallel web searches, verifies them against Semantic Scholar using fuzzy title matching (Levenshtein title ratio of at least 70) and a temporal cutoff, cross-checks them against Crossref and OpenAlex to detect hallucinated citations, and generates a BibTeX file plus Introduction and Related Work sections. Use it when compiling citations for a research paper or when the orchestrator delegates the citation verification phase.
git clone --depth 1 https://github.com/Ar9av/PaperOrchestra /tmp/literature-review-agent && cp -r /tmp/literature-review-agent/skills/literature-review-agent ~/.claude/skills/literature-review-agent
# Literature Review Agent (Step 3)
Faithful implementation of the Hybrid Literature Agent from PaperOrchestra
(Song et al., 2026, arXiv:2604.05018, §4 Step 3, App. D.3, App. F.1 p.46).
**Cost: ~20–30 LLM calls.** This is one of the two longest steps (the other is
plotting). Wall-time floor is set by Semantic Scholar's 1 QPS verification
limit.
## Inputs
- `workspace/outline.json` — specifically `intro_related_work_plan` with the
Introduction search directions and the 2-4 Related Work methodology
clusters
- `workspace/inputs/conference_guidelines.md` — used to derive `cutoff_date`
- `workspace/inputs/idea.md`, `workspace/inputs/experimental_log.md` — for
framing the Intro and grounding the Related Work positioning
## Outputs
- `workspace/citation_pool.json` — verified Semantic Scholar metadata for
every paper that survived verification
- `workspace/refs.bib` — BibTeX file generated from the verified pool
- `workspace/drafts/intro_relwork.tex` — drafted Introduction and Related
Work sections, written into the template, with the rest of the template
preserved verbatim
## Two-phase pipeline (App. D.3)
```
PHASE 1 — Parallel Candidate Discovery
  For each search direction in introduction_strategy.search_directions:
  For each limitation_search_query in each related_work cluster:
    - Use the host's web search tool to discover up to ~10 candidate papers.
    - Run up to 10 discovery queries in parallel (host-permitting).
    - Collect (title, snippet, url) tuples — no verification yet.
  → PRE-DEDUP before Phase 2 (see Step 1.5 below)

PHASE 2 — Sequential Citation Verification (1 QPS, with cache)
  For each candidate (after pre-dedup), sequentially:
    0. Check s2_cache.json first (scripts/s2_cache.py --check).
       If HIT: use cached response, skip live S2 call. No throttle needed.
       If MISS: proceed with live request below.
    1. Query Semantic Scholar by title:
         GET https://api.semanticscholar.org/graph/v1/paper/search?query=<title>
             &fields=title,abstract,year,authors,venue,externalIds&limit=5
       (Public endpoint, no key. Throttle to 1 QPS for live requests only.)
    2. Store the S2 response in cache: s2_cache.py --store.
    3. Pick the top hit. Check Levenshtein title ratio against the original
       candidate title. If ratio < 70: discard.
    4. Bonus: if year and venue exactly align with hints, add a +5 point
       match-quality bonus.
    5. Require: abstract is non-empty.
    6. Require: paper.year (or month if known) strictly predates cutoff_date.
       Months default to day-1: e.g., "October 2024" → 2024-10-01.
    7. If all checks pass, add to verified pool.
  After all candidates are verified, dedup by Semantic Scholar paperId.
```
The host agent does the LLM/web work; the deterministic helpers in `scripts/`
do the math.
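Steps 3 and 4 of Phase 2 reduce to a small scoring function. The sketch below is one way to compute it with the standard library only. The exact ratio formula used by the bundled helpers is not stated in this document, so the normalization (edit distance divided by the longer title's length), the lowercasing, and the names `levenshtein`, `title_ratio`, and `match_quality` are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute = 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def title_ratio(candidate: str, hit: str) -> float:
    """Similarity on a 0-100 scale (assumed normalization, see lead-in)."""
    a = " ".join(candidate.lower().split())
    b = " ".join(hit.lower().split())
    if not a and not b:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / max(len(a), len(b)))


def match_quality(candidate_title, hit, year_hint=None, venue_hint=None):
    """Return None when the top hit is discarded, else ratio plus optional bonus."""
    ratio = title_ratio(candidate_title, hit.get("title") or "")
    if ratio < 70:
        return None  # step 3: discard weak title matches
    bonus = 0
    if year_hint is not None and venue_hint is not None:
        # step 4: +5 only when both year and venue match the hints exactly
        if hit.get("year") == year_hint and hit.get("venue") == venue_hint:
            bonus = 5
    return ratio + bonus
```

A caller would pass the discovered title and the top Semantic Scholar hit, then drop the candidate whenever the function returns `None`.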
## Step-by-step
### 0. Derive `cutoff_date`
Parse `conference_guidelines.md` for the submission deadline. The PaperOrchestra
paper aligns the research cutoff with the venue's submission deadline (App. D.1):
| Venue | Cutoff |
|---|---|
| CVPR 2025 | Nov 2024 |
| ICLR 2025 | Oct 2024 |
| Other | One month before the stated submission deadline |
Encode as `YYYY-MM-DD`. Months default to day-1 (e.g., `2024-10-01`).
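The date handling can be sketched in a few lines of Python. This is an illustration only: the function names `parse_cutoff` and `predates_cutoff` are hypothetical, and treating a paper with a known year but unknown month as January 1 of that year is an assumption, since the text above only specifies the day-1 default for months.

```python
from datetime import date

# Three-letter prefixes are unique across month names, so "Oct" and "October" both work.
MONTH_BY_PREFIX = {
    name: number
    for number, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}


def parse_cutoff(text: str) -> date:
    """Turn 'October 2024' or 'Nov 2024' into a date pinned to day 1."""
    month_word, year = text.split()
    return date(int(year), MONTH_BY_PREFIX[month_word[:3].lower()], 1)


def predates_cutoff(year, cutoff: date, month=None) -> bool:
    """Strict comparison: the paper date must be earlier than the cutoff."""
    if year is None:
        return False
    # Assumption: an unknown month is treated as January of the given year.
    return date(year, month or 1, 1) < cutoff


cutoff = parse_cutoff("October 2024")
print(cutoff.isoformat())
```

Under these assumptions a paper dated September 2024 passes an ICLR 2025 cutoff, while one dated October 2024 does not, because the comparison is strict.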
### 1. Phase 1: Parallel Candidate Discovery
From `outline.json`:
- All `introduction_strategy.search_directions` (3-5 queries)
- For each cluster in `related_work_strategy.subsections`:
  - The cluster's `sota_investigation_mission` becomes a search query
  - All `limitation_search_queries` (1-3 each)
For each query, **use your host's web search tool** (e.g., `WebSearch` in
Claude Code, `@web` in Cursor, the search tool in Antigravity). Collect the
top ~10 candidates per query: title, abstract snippet, source URL.
If your host supports parallel sub-tasks, fire up to 10 concurrent search
queries. If not, run sequentially — slower but functionally equivalent.
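For hosts that expose search as a callable, the fan-out might look like the sketch below. `search_fn` is a hypothetical stand-in for whatever web search tool the host provides, and the `discovered_for` field name is an assumption chosen to label each candidate with its originating query ID. Nothing here is taken from the bundled scripts.

```python
from concurrent.futures import ThreadPoolExecutor


def discover(queries, search_fn, max_workers=10, per_query=10):
    """Run discovery queries concurrently and tag each hit with its query ID.

    queries:   dict mapping a query ID (e.g. "intro[0]") to a query string.
    search_fn: hypothetical callable; takes a query string and returns a list
               of dicts with "title", "snippet", and "url" keys.
    """
    def run(item):
        query_id, query = item
        hits = search_fn(query)[:per_query]  # keep the top ~10 per query
        return [dict(hit, discovered_for=query_id) for hit in hits]

    # At most max_workers searches are in flight at once; map() preserves order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        batches = list(pool.map(run, queries.items()))

    # Flatten into a single working list; no verification happens here.
    return [candidate for batch in batches for candidate in batch]
```

Setting `max_workers=1` gives the sequential fallback with the same output.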
#### Optional: Exa as a Phase 1 backend
If your host has no native web search, OR you want a research-paper-focused
backend with better signal-to-noise, you can use [Exa](https://exa.ai) via
the bundled `scripts/exa_search.py` helper. It is **opt-in** and reads
`EXA_API_KEY` from the environment — the repo never commits a key.
```bash
export EXA_API_KEY="your-key-here" # get one at https://dashboard.exa.ai/
python skills/literature-review-agent/scripts/exa_search.py \
--query "Sparse attention long context transformers" \
--num-results 15 \
--discovered-for "related_work[2.1]"
```
Output is a normalized candidate list ready to merge into
`raw_candidates.json`. Phase 2 verification (Semantic Scholar fuzzy match,
cutoff, dedup) is unchanged. See `references/exa-search-cookbook.md` for
the full recipe, query patterns, cost estimates, and security notes.
Combine all discovered candidates into a single working list. Tag each with
the originating query ID so you can later attribute it to "intro" vs
"related_work[i]".
### 1.5. Pre-dedup before Phase 2
**Always run this before starting Phase 2.** Multiple search queries routinely
return the same papers (e.g., "Attention is All You Need" appears in almost
every NLP discovery query). Verifying duplicates wastes 30-40% of S2 quota
at 1 QPS.
```bash
python skills/literature-review-agent/scripts/pre_dedup_candidates.py \
--in workspace/raw_candidates.json \
--out workspace/deduped_candidates.json
# Prints: "150 candidates → 97 unique (53 duplicates removed)"
```
Use `workspace/deduped_candidates.json` as input to Phase 2.
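The matching rule inside `pre_dedup_candidates.py` is not described here. As a rough illustration of the idea, the sketch below collapses candidates whose titles are equal after lowercasing and stripping punctuation. The `title_key` normalization and the keep-first policy are assumptions, not the script's documented behavior.

```python
import re


def title_key(title: str) -> str:
    """Normalize a title: lowercase, with punctuation and extra spaces removed."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def pre_dedup(candidates):
    """Keep the first candidate seen for each normalized title."""
    seen = {}
    for candidate in candidates:
        key = title_key(candidate["title"])
        if key not in seen:
            seen[key] = candidate
    unique = list(seen.values())
    removed = len(candidates) - len(unique)
    print(f"{len(candidates)} candidates → {len(unique)} unique "
          f"({removed} duplicates removed)")
    return unique
```

Exact-key matching is deliberately conservative: near-duplicates with different wording still reach Phase 2, where the final dedup by Semantic Scholar paperId catches them.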
### 2. Phase 2: Sequential Verification via Semantic Scholar (with cache)
For each candidate in `deduped_candidates.json`, in **sequential** order:
**Step A — check cache first** (no S2 call, no throttle needed):
```bash
python skills/literature-review-agent/scripts/s2_cache.py \
--cache workspace/cache/s2_cache.json \
--check "<candidate title>"
# exit 0 + prints JSON → use cached response, skip Step B
# exit 1 → proceed to Step B
```
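The cache-then-throttle flow can also be expressed in-process. The sketch below is a simplified model, not the behavior of `s2_cache.py`: a plain dict stands in for `s2_cache.json`, the lowercase-title cache key is an assumption, and `fetch` is a hypothetical callable that performs the HTTP GET and returns the parsed response.

```python
import time
from urllib.parse import urlencode

S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"
FIELDS = "title,abstract,year,authors,venue,externalIds"


def s2_search_url(title: str, limit: int = 5) -> str:
    """Build the title-search URL with the fields listed in the pipeline."""
    params = {"query": title, "fields": FIELDS, "limit": limit}
    return f"{S2_SEARCH}?{urlencode(params)}"


class Throttle:
    """Enforce a minimum interval between live requests (1 QPS by default)."""

    def __init__(self, min_interval: float = 1.0):
        self.min_interval = min_interval
        self._last = None

    def wait(self):
        if self._last is not None:
            remaining = self.min_interval - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


def lookup(title, cache, fetch, throttle):
    """Return the cached response on a hit; otherwise throttle, fetch, store."""
    key = title.strip().lower()  # assumed cache key
    if key in cache:
        return cache[key]  # HIT: no live call, no throttle
    throttle.wait()  # MISS: respect the rate limit before the live request
    response = fetch(s2_search_url(title))
    cache[key] = response
    return response
```

Because the throttle is only touched on a miss, a warm cache lets repeated runs finish well under the 1 QPS wall-time floor.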