ClaudeWave
Skill · 575 repo stars · updated 10d ago

literature-review-agent

The literature-review-agent executes Step 3 of the PaperOrchestra pipeline. It discovers candidate papers through parallel web searches, verifies them against Semantic Scholar using fuzzy title matching (Levenshtein ratio of at least 70) and a temporal cutoff, cross-checks them against Crossref and OpenAlex to detect hallucinated citations, and generates a BibTeX file plus the Introduction and Related Work sections. Use it when compiling citations for a research paper or when the orchestrator delegates the citation verification phase.

Install in Claude Code
git clone --depth 1 https://github.com/Ar9av/PaperOrchestra /tmp/literature-review-agent && cp -r /tmp/literature-review-agent/skills/literature-review-agent ~/.claude/skills/literature-review-agent
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# Literature Review Agent (Step 3)

Faithful implementation of the Hybrid Literature Agent from PaperOrchestra
(Song et al., 2026, arXiv:2604.05018, §4 Step 3, App. D.3, App. F.1 p.46).

**Cost: ~20–30 LLM calls.** This is one of the two longest steps (the other is
plotting). Wall-time floor is set by Semantic Scholar's 1 QPS verification
limit.

## Inputs

- `workspace/outline.json` — specifically `intro_related_work_plan` with the
  Introduction search directions and the 2-4 Related Work methodology
  clusters
- `workspace/inputs/conference_guidelines.md` — used to derive `cutoff_date`
- `workspace/inputs/idea.md`, `workspace/inputs/experimental_log.md` — for
  framing the Intro and grounding the Related Work positioning

## Outputs

- `workspace/citation_pool.json` — verified Semantic Scholar metadata for
  every paper that survived verification
- `workspace/refs.bib` — BibTeX file generated from the verified pool
- `workspace/drafts/intro_relwork.tex` — drafted Introduction and Related
  Work sections, written into the template, with the rest of the template
  preserved verbatim

## Two-phase pipeline (App. D.3)

```
PHASE 1 — Parallel Candidate Discovery
   For each search direction in introduction_strategy.search_directions:
   For each limitation_search_query in each related_work cluster:
     - Use the host's web search tool to discover up to ~10 candidate papers.
     - Run up to 10 discovery queries in parallel (host-permitting).
     - Collect (title, snippet, url) tuples — no verification yet.
   → PRE-DEDUP before Phase 2 (see Step 1.5 below)

PHASE 2 — Sequential Citation Verification (1 QPS, with cache)
   For each candidate (after pre-dedup), sequentially:
     0. Check s2_cache.json first (scripts/s2_cache.py --check).
        If HIT: use cached response, skip live S2 call. No throttle needed.
        If MISS: proceed with live request below.
     1. Query Semantic Scholar by title:
          GET https://api.semanticscholar.org/graph/v1/paper/search?query=<title>
              &fields=title,abstract,year,authors,venue,externalIds&limit=5
        (Public endpoint, no key. Throttle to 1 QPS for live requests only.)
     2. Store the S2 response in cache: s2_cache.py --store.
     3. Pick the top hit. Check Levenshtein title ratio against the original
        candidate title. If ratio < 70: discard.
     4. Bonus: if year and venue exactly align with hints, add a +5 point
        match-quality bonus.
     5. Require: abstract is non-empty.
     6. Require: paper.year (or month if known) strictly predates cutoff_date.
        Months default to day-1: e.g., "October 2024" → 2024-10-01.
     7. If all checks pass, add to verified pool.
   After all candidates are verified, dedup by Semantic Scholar paperId.
```

The host agent does the LLM/web work; the deterministic helpers in `scripts/`
do the math.
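
As a rough illustration of steps 3 and 4 of Phase 2, the title check can be sketched in pure Python. The normalization (lowercasing, whitespace collapsing), the ratio formula (`100 * (1 - distance / longest_length)`), and the names `levenshtein`, `title_ratio`, and `match_score` are assumptions made for this sketch; the bundled helpers in `scripts/` may compute the ratio differently.

```python
from typing import Optional


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def title_ratio(candidate: str, hit: str) -> float:
    """Similarity on a 0-100 scale. ASSUMPTION: normalized by the longer title."""
    a = " ".join(candidate.lower().split())
    b = " ".join(hit.lower().split())
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def match_score(candidate_title: str, hit_title: str,
                year_hint: Optional[int] = None, venue_hint: Optional[str] = None,
                hit_year: Optional[int] = None, hit_venue: Optional[str] = None
                ) -> Optional[float]:
    """Return None when the hit is discarded (ratio < 70), else ratio plus bonus."""
    ratio = title_ratio(candidate_title, hit_title)
    if ratio < 70:
        return None
    hints_given = year_hint is not None and venue_hint is not None
    exact = hints_given and year_hint == hit_year and venue_hint == hit_venue
    return ratio + (5 if exact else 0)  # +5 match-quality bonus
```

Under these assumptions, a hit whose title differs from the candidate only in casing or spacing scores 100, and any hit scoring below 70 yields `None` and is dropped before the abstract and cutoff checks.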

## Step-by-step

### 0. Derive `cutoff_date`

Parse `conference_guidelines.md` for the submission deadline. The paper aligns
the research cutoff with the venue's submission deadline (App. D.1):

| Venue | Cutoff |
|---|---|
| CVPR 2025 | Nov 2024 |
| ICLR 2025 | Oct 2024 |
| Other | One month before the stated submission deadline |

Encode as `YYYY-MM-DD`. Months default to day-1 (e.g., `2024-10-01`).
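
A minimal sketch of the encoding rule and of the "strictly predates" check from Phase 2, step 6. The function names `parse_cutoff` and `predates_cutoff` are hypothetical. The handling of papers with only a year (treated here as January 1 of that year) is an assumption, since the source does not specify it.

```python
from datetime import date
from typing import Optional

# Three-letter month prefixes, so both "Oct" and "October" resolve.
MONTHS = {name: number for number, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}


def parse_cutoff(text: str) -> date:
    """Turn 'October 2024' or 'Nov 2024' into a date; the day defaults to 1."""
    month_word, year_word = text.split()
    return date(int(year_word), MONTHS[month_word[:3].lower()], 1)


def predates_cutoff(year: int, month: Optional[int], cutoff: date) -> bool:
    """True only if the paper date is strictly before the cutoff.

    ASSUMPTION: when the month is unknown, the paper is placed on
    January 1 of its year; a stricter policy may be preferable.
    """
    paper_date = date(year, month if month is not None else 1, 1)
    return paper_date < cutoff


cutoff = parse_cutoff("October 2024")
print(cutoff.isoformat())                # 2024-10-01
print(predates_cutoff(2024, 9, cutoff))  # True
print(predates_cutoff(2024, 10, cutoff)) # False: same day is not "strictly before"
```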

### 1. Phase 1: Parallel Candidate Discovery

From `outline.json`:

- All `introduction_strategy.search_directions` (3-5 queries)
- For each cluster in `related_work_strategy.subsections`:
  - The cluster's `sota_investigation_mission` becomes a search query
  - All `limitation_search_queries` (1-3 each)

For each query, **use your host's web search tool** (e.g., `WebSearch` in
Claude Code, `@web` in Cursor, the search tool in Antigravity). Collect the
top ~10 candidates per query: title, abstract snippet, source URL.

If your host supports parallel sub-tasks, fire up to 10 concurrent search
queries. If not, run sequentially — slower but functionally equivalent.

#### Optional: Exa as a Phase 1 backend

If your host has no native web search, OR you want a research-paper-focused
backend with better signal-to-noise, you can use [Exa](https://exa.ai) via
the bundled `scripts/exa_search.py` helper. It is **opt-in** and reads
`EXA_API_KEY` from the environment — the repo never commits a key.

```bash
export EXA_API_KEY="your-key-here"   # get one at https://dashboard.exa.ai/
python skills/literature-review-agent/scripts/exa_search.py \
    --query "Sparse attention long context transformers" \
    --num-results 15 \
    --discovered-for "related_work[2.1]"
```

Output is a normalized candidate list ready to merge into
`raw_candidates.json`. Phase 2 verification (Semantic Scholar fuzzy match,
cutoff, dedup) is unchanged. See `references/exa-search-cookbook.md` for
the full recipe, query patterns, cost estimates, and security notes.

Combine all discovered candidates into a single working list. Tag each with
the originating query ID so you can later attribute it to "intro" vs
"related_work[i]".
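
The discovery loop with tagging could look roughly like the sketch below. `search_fn` stands in for whatever search tool the host exposes, and the `discovered_for` field name is borrowed from the `--discovered-for` flag above; both are assumptions, as is the shape of each hit (a dict with `title`, `snippet`, `url`).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def discover(queries: dict, search_fn: Callable[[str], list],
             max_parallel: int = 10, per_query: int = 10) -> list:
    """Run up to `max_parallel` searches at once and tag every hit.

    queries:   mapping of query ID (e.g. "intro[0]") to query text
    search_fn: hypothetical callable returning a list of hit dicts
    """
    def run(item):
        query_id, text = item
        return query_id, search_fn(text)[:per_query]  # keep the top ~10

    candidates = []
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map preserves the input order of the queries.
        for query_id, hits in pool.map(run, queries.items()):
            for hit in hits:
                candidates.append({**hit, "discovered_for": query_id})
    return candidates
```

No verification happens here; the returned list is the raw working list that feeds the pre-dedup step.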

### 1.5. Pre-dedup before Phase 2

**Always run this before starting Phase 2.** Multiple search queries routinely
return the same papers (e.g., "Attention is All You Need" appears in almost
every NLP discovery query). Verifying duplicates wastes 30-40% of S2 quota
at 1 QPS.

```bash
python skills/literature-review-agent/scripts/pre_dedup_candidates.py \
    --in workspace/raw_candidates.json \
    --out workspace/deduped_candidates.json
# Prints: "150 candidates → 97 unique (53 duplicates removed)"
```

Use `workspace/deduped_candidates.json` as input to Phase 2.
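
The internals of `pre_dedup_candidates.py` are not shown here. One plausible approach, sketched below under that caveat, keys each candidate on a normalized title and keeps the first occurrence. The names `normalize_title` and `pre_dedup` are hypothetical.

```python
import re


def normalize_title(title: str) -> str:
    """Lowercase and collapse punctuation/whitespace so near-identical titles collide."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def pre_dedup(candidates: list) -> list:
    """Keep the first candidate seen for each normalized title."""
    seen = {}
    for candidate in candidates:
        key = normalize_title(candidate["title"])
        if key not in seen:
            seen[key] = candidate
    return list(seen.values())


raw = [
    {"title": "Attention Is All You Need", "discovered_for": "intro[0]"},
    {"title": "Attention is all you need.", "discovered_for": "related_work[1]"},
    {"title": "Longformer: The Long-Document Transformer", "discovered_for": "related_work[1]"},
]
unique = pre_dedup(raw)
print(f"{len(raw)} candidates -> {len(unique)} unique "
      f"({len(raw) - len(unique)} duplicates removed)")
```

Keeping only the first occurrence discards later query tags; if attribution to several queries matters, the tags could be merged instead.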

### 2. Phase 2: Sequential Verification via Semantic Scholar (with cache)

For each candidate in `deduped_candidates.json`, in **sequential** order:

**Step A — check cache first** (no S2 call, no throttle needed):
```bash
python skills/literature-review-agent/scripts/s2_cache.py \
    --cache workspace/cache/s2_cache.json \
    --check "<candidate title>"
# exit 0 + prints JSON → use cached response, skip Step B
# exit 1 → proceed to Step B
```
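
The cache-first flow from the pipeline overview (a hit skips the live call and the throttle, a miss is throttled to 1 QPS and then stored) might be sketched as follows. An in-memory dict keyed by the raw title stands in for `s2_cache.json`, and `fetch` is a hypothetical callable that performs the HTTP GET; the key scheme used by `s2_cache.py` is not shown here, so treat both as assumptions.

```python
import time
from typing import Callable
from urllib.parse import urlencode

S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"
FIELDS = "title,abstract,year,authors,venue,externalIds"


def build_search_url(title: str, limit: int = 5) -> str:
    """Build the title-search URL given in the pipeline overview."""
    return f"{S2_SEARCH}?{urlencode({'query': title, 'fields': FIELDS, 'limit': limit})}"


class Throttle:
    """Enforce a minimum interval between live requests (1 second = 1 QPS)."""

    def __init__(self, min_interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.min_interval, self.clock, self.sleep = min_interval, clock, sleep
        self._last = None

    def wait(self) -> None:
        if self._last is not None:
            remaining = self.min_interval - (self.clock() - self._last)
            if remaining > 0:
                self.sleep(remaining)
        self._last = self.clock()


def lookup_all(titles: list, cache: dict, fetch: Callable[[str], dict],
               throttle: Throttle) -> dict:
    """Return a response per title, calling `fetch` only on cache misses."""
    responses = {}
    for title in titles:
        if title not in cache:            # MISS: throttle, fetch, store
            throttle.wait()
            cache[title] = fetch(build_search_url(title))
        responses[title] = cache[title]   # HIT path needs no throttle
    return responses
```

Injecting `clock` and `sleep` keeps the throttle testable without real waiting.
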
agent-research-aggregator (Skill)

Pre-pipeline aggregator that scans AI agent cache directories (.claude, .cursor, .antigravity, .openclaw) or any user-specified directory for experimentation logs, extracts insights and numeric results, and formats them as PaperOrchestra-ready inputs (idea.md + experimental_log.md). TRIGGER when the user says "aggregate my agent logs for paper writing", "extract experiments from my coding agent history", "prepare PaperOrchestra inputs from my cache", "turn my agent logs into a paper", mentions a folder or directory they want to use as the basis for a paper, or wants to run PaperOrchestra but only has scattered agent experiment histories rather than structured inputs. Run this BEFORE paper-orchestra. Also called automatically by paper-orchestra when workspace/inputs/idea.md or workspace/inputs/experimental_log.md are missing.

content-refinement-agent (Skill)

Step 5 of the PaperOrchestra pipeline (arXiv:2604.05018). Iteratively refine drafts/paper.tex by simulating peer review and applying targeted revisions, with strict accept/revert halt rules. Maintains a worklog and snapshots each iteration so revert is real, not symbolic. TRIGGER when the orchestrator delegates Step 5 or when the user asks to "refine the draft", "iterate on the paper", or "run peer review on this paper".

outline-agent (Skill)

Step 1 of the PaperOrchestra pipeline (arXiv:2604.05018). Convert (idea.md, experimental_log.md, template.tex, conference_guidelines.md) into a strict JSON outline containing a plotting plan, literature search plan (Intro + Related Work), and section-level writing plan with citation hints. TRIGGER when the orchestrator delegates Step 1 or when the user asks to "outline a paper from raw materials" or "generate the paper structure".

paper-autoraters (Skill)

Run the four paper-quality autoraters from PaperOrchestra (arXiv:2604.05018, App. F.3) — Citation F1 (P0/P1 partition + Precision/Recall/F1), Literature Review Quality (6-axis 0-100 with anti-inflation rules), SxS Overall Paper Quality (side-by-side), and SxS Literature Review Quality (side-by-side). TRIGGER when the user asks to "score this paper draft", "evaluate against the benchmark", "compare two papers", or "run the autoraters".

paper-orchestra (Skill)

Orchestrate the full PaperOrchestra (Song et al., 2026, arXiv:2604.05018) five-agent pipeline to turn unstructured research materials (idea, experimental log, LaTeX template, conference guidelines, optional figures) into a submission-ready LaTeX manuscript and compiled PDF. TRIGGER when the user asks to "write a paper from my experiments", "turn this idea and these results into a paper", "generate a conference submission", "run paper-orchestra on X", or otherwise wants the end-to-end paper-writing pipeline. Coordinates the outline-agent, plotting-agent, literature-review-agent, section-writing-agent, and content-refinement-agent skills.

paper-writing-bench (Skill)

Reverse-engineer raw materials (Sparse idea, Dense idea, experimental log) from an existing AI research paper to build a benchmark case for evaluating paper-writing pipelines. Replicates the PaperWritingBench dataset construction procedure from arXiv:2604.05018 §3 / App. C. TRIGGER when the user asks to "build a benchmark case from this paper", "reverse-engineer raw materials", or "evaluate my pipeline against PaperWritingBench".

plotting-agent (Skill)

Step 2 of the PaperOrchestra pipeline (arXiv:2604.05018). Execute the visualization plan from outline.json — render plots and conceptual diagrams from experimental_log.md and idea.md, optionally refine via VLM critique loop, and produce context-aware captions. Runs in parallel with the literature-review-agent. TRIGGER when the orchestrator delegates Step 2 or when the user asks to "generate the figures for my paper" or "render the plots from this experiment log".

section-writing-agent (Skill)

Step 4 of the PaperOrchestra pipeline (arXiv:2604.05018). ONE single multimodal LLM call that drafts the remaining paper sections (Abstract, Methodology, Experiments, Conclusion), extracts numeric values from experimental_log.md into LaTeX booktabs tables, splices the generated figures from Step 2, and merges everything into the template that already contains Intro + Related Work from Step 3. TRIGGER when the orchestrator delegates Step 4 or when the user asks to "write the methodology and experiments sections" or "fill in the rest of the paper".