ClaudeWave
Skill · 575 repo stars · updated 10d ago

content-refinement-agent

content-refinement-agent iteratively improves research paper drafts through simulated peer-review cycles. This Claude Code skill applies targeted revisions based on structured reviews, keeps a full snapshot of every iteration plus a worklog, and halts when a revision is reverted, when no actionable weaknesses remain, or when the iteration limit is reached. Use it when implementing Step 5 of the PaperOrchestra pipeline or when explicitly asked to refine, iterate on, or peer-review a paper draft.

Install in Claude Code
git clone --depth 1 https://github.com/Ar9av/PaperOrchestra /tmp/content-refinement-agent && cp -r /tmp/content-refinement-agent/skills/content-refinement-agent ~/.claude/skills/content-refinement-agent
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# Content Refinement Agent (Step 5)

Faithful implementation of the Content Refinement Agent from PaperOrchestra
(Song et al., 2026, arXiv:2604.05018, §4 Step 5, App. F.1 pp. 49–51).

**Cost: ~5–7 LLM calls** (App. B), typically ~3 refinement iterations, each
consisting of one reviewer call and one revision call.

The paper highlights this step as one of the largest contributors to overall
quality: refinement alone accounts for +19% (CVPR) and +22% (ICLR) absolute
acceptance-rate improvement (Fig. 4). Get this step right.

## Inputs

- `workspace/drafts/paper.tex` — output of Step 4
- `workspace/inputs/conference_guidelines.md`
- `workspace/inputs/experimental_log.md` — used as ground truth for the
  hallucination check
- `workspace/citation_pool.json` / `workspace/refs.bib` — the allowed
  bibliography

## Outputs

- `workspace/refinement/iter1/`, `iter2/`, `iter3/` — per-iteration snapshots
  containing `paper.tex`, `paper.pdf`, `review.json`, `score.json`; the baseline
  `iter0/` (no `review.json`) is written in Step 0b
- `workspace/refinement/worklog.json` — append-only history of decisions
- `workspace/final/paper.tex` and `workspace/final/paper.pdf` — copy of the
  best accepted snapshot

## The refinement loop

```
prev_score = score(paper.tex)                  # baseline from the initial draft
snapshot iter0/

for N in 1..ITER_CAP (default 3):
    1. simulate_review(paper.tex) → review.json
       (system prompt: `references/reviewer-rubric.md`)

    2. apply_revision(paper.tex, review.json) → new_paper.tex
       (uses the verbatim Refinement Agent prompt at `references/prompt.md`)

    3. snapshot iter<N>/ with new_paper.tex, review.json
       latexmk -pdf new_paper.tex → iter<N>/paper.pdf

    4. score(new_paper.tex) → curr_score       # saved as iter<N>/score.json

    5. decide via score_delta.py:
       - if curr.overall > prev.overall:                        ACCEPT
       - elif curr.overall == prev.overall and net_subaxis ≥ 0: ACCEPT
       - else:                                                  REVERT

    6. append the decision to worklog.json via apply_worklog.py

    7. on ACCEPT only:
           paper.tex  ← new_paper.tex
           prev_score ← curr_score

    8. if REVERT or no actionable weaknesses or N == ITER_CAP: HALT

cp <best iter>/paper.tex, <best iter>/paper.pdf → workspace/final/
```

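The "best" snapshot at HALT is the one with the highest accepted overall
score. Because an ACCEPT never lowers the overall score, this is always the
most recently accepted iteration: on a REVERT halt, it is the iteration
immediately before the revert (`iter0` if iteration 1 is reverted).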
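
A minimal Python sketch of the decision rule and the loop follows. It assumes `score.json` holds an `overall` number plus a dict of named sub-axis scores, and that `net_subaxis` is the sum of per-sub-axis deltas. The names `Score`, `decide`, `best_snapshot`, and `refine` are hypothetical, not the API of `score_delta.py`. LLM calls are injected as plain callables, and snapshotting, compilation, and worklog writes are left out.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Score:
    """Assumed shape of score.json: an overall score plus named sub-axis scores."""
    overall: float
    subaxes: dict = field(default_factory=dict)


def decide(prev: Score, curr: Score) -> str:
    """ACCEPT/REVERT rule from the loop; net_subaxis is an assumed definition."""
    if curr.overall > prev.overall:
        return "ACCEPT"
    shared = prev.subaxes.keys() & curr.subaxes.keys()
    # Hypothetical: net_subaxis = sum of per-sub-axis deltas on shared axes.
    net_subaxis = sum(curr.subaxes[k] - prev.subaxes[k] for k in shared)
    if curr.overall == prev.overall and net_subaxis >= 0:
        return "ACCEPT"
    return "REVERT"


def best_snapshot(decisions: list) -> int:
    """Index N of the iter<N>/ snapshot to copy to workspace/final/.

    decisions[i] belongs to iteration i + 1; iter0 is the baseline. Accepted
    scores never decrease, so the last ACCEPT before any REVERT is the best.
    """
    best = 0
    for n, decision in enumerate(decisions, start=1):
        if decision != "ACCEPT":
            break
        best = n
    return best


def refine(
    paper: str,
    review: Callable[[str], dict],
    revise: Callable[[str, dict], str],
    score: Callable[[str], Score],
    iter_cap: int = 3,
) -> tuple:
    """Run the loop with injected LLM calls; return (final paper, decisions).

    Assumes review() returns a dict whose "weaknesses" list is empty when
    nothing actionable remains. Snapshots and worklog writes are omitted.
    """
    prev = score(paper)  # baseline from the initial draft
    decisions = []
    for _ in range(iter_cap):
        rev = review(paper)
        candidate = revise(paper, rev)
        curr = score(candidate)
        decision = decide(prev, curr)
        decisions.append(decision)
        if decision == "ACCEPT":  # adopt the revision only on ACCEPT
            paper, prev = candidate, curr
        if decision == "REVERT" or not rev.get("weaknesses"):
            break  # the iteration cap itself is enforced by range()
    return paper, decisions
```

Because `decide` compares floats with `==`, a real implementation may want to round scores to the rubric's granularity before testing for a tie.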

## Step-by-step

### 0. Pre-refinement integrity gate

Before snapshotting or scoring the initial draft, run the three gates below in order:

**Gate A — AI failure modes** (load `references/ai-failure-modes.md`, runs once):

That file points to `skills/shared/ai_failure_modes.md`. Run all 7 checks against
the draft and the inputs. This gate runs **once only**, before the refinement loop
begins.

- CONFIRMED failure → write a HALT entry to `worklog.json`, report to the user, and stop.
- SUSPECTED failure → add a WARNING comment to `paper.tex`, log it in `worklog.json`, and continue.
- No failures → proceed.
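
The three outcomes can be sketched as a small dispatcher. `Finding` and `gate_a` are hypothetical: the sketch assumes the 7 checks have already run and are reported as a mapping from check name to outcome, and it places the WARNING comments at the top of `paper.tex` as LaTeX `%` comments, a placement the skill does not specify.

```python
from enum import Enum


class Finding(Enum):
    NONE = "none"
    SUSPECTED = "suspected"
    CONFIRMED = "confirmed"


def gate_a(results: dict, tex: str, worklog: list) -> tuple:
    """Apply Gate A outcomes; return (proceed, possibly annotated tex).

    `results` maps each check name to a Finding (hypothetical shape).
    """
    confirmed = [name for name, f in results.items() if f is Finding.CONFIRMED]
    if confirmed:
        # Any CONFIRMED failure halts; the caller reports to the user and stops.
        worklog.append({"gate": "ai_failure_modes", "status": "HALT", "checks": confirmed})
        return False, tex
    suspected = [name for name, f in results.items() if f is Finding.SUSPECTED]
    if suspected:
        worklog.append({"gate": "ai_failure_modes", "status": "WARNING", "checks": suspected})
        # "%" comments stay in the source but do not appear in the compiled PDF.
        notes = "".join(f"% WARNING (Gate A): suspected {name}\n" for name in suspected)
        tex = notes + tex
    return True, tex
```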

**Gate B — Claim-evidence provenance** (runs once, WARN gate):

```bash
python skills/paper-orchestra/scripts/claim_evidence_gate.py \
    --paper workspace/drafts/paper.tex \
    --log   workspace/inputs/experimental_log.md \
    --out   workspace/claim_evidence_report.json
```

- Exit 0 → PASS; proceed normally.
- Exit 1 → WARN: unsupported numeric claims were found. Log this in `worklog.json` as
  `{"gate": "claim_evidence", "status": "WARN", "unsupported_count": N, "report": "workspace/claim_evidence_report.json"}`
  and pass the `unsupported` list from the report to the revision agent in Step 3 as
  an additional instruction: "The following numeric values appear in the paper but
  cannot be corroborated in experimental_log.md — verify or remove them: ..."

Do NOT halt on Gate B warnings; the revision agent will address them.
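
One way to turn the gate's exit code into the worklog entry and the revision instruction is sketched below. `handle_gate_b` is a hypothetical helper; it assumes the report's `unsupported` field is a list of values that render sensibly with `str()`. Treating any exit code other than 0 or 1 as a script error is a design choice here, not documented gate behaviour.

```python
from typing import Optional, Tuple

# Verbatim instruction text from the Gate B description.
PREFIX = (
    "The following numeric values appear in the paper but cannot be "
    "corroborated in experimental_log.md — verify or remove them: "
)


def handle_gate_b(exit_code: int, report: dict, report_path: str) -> Tuple[Optional[dict], Optional[str]]:
    """Map claim_evidence_gate.py's exit code to (worklog entry, revision note).

    Assumes report["unsupported"] is a list of values that print cleanly.
    """
    if exit_code == 0:
        return None, None  # PASS: proceed with nothing to forward
    if exit_code != 1:
        # Design choice: undocumented exit codes are treated as a script failure.
        raise RuntimeError(f"claim_evidence_gate.py exited with {exit_code}")
    unsupported = report.get("unsupported", [])
    entry = {
        "gate": "claim_evidence",
        "status": "WARN",
        "unsupported_count": len(unsupported),
        "report": report_path,
    }
    # WARN never halts: the note is appended to the revision agent's instructions.
    note = PREFIX + ", ".join(str(v) for v in unsupported)
    return entry, note
```

In practice the exit code would come from running the command above (for example via `subprocess.run(...).returncode`), and `report` from loading `workspace/claim_evidence_report.json` with `json.load`.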

**Gate C — Read research brief** (every run, no exit code):

If `workspace/research_brief.md` exists, read it before all reviewer calls.
Pass the "Sections where evidence was thin" list from §4 as additional
context to the Devil's Advocate reviewer. This surfaces the highest-risk
sections for CRITICAL scrutiny.
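
A sketch of extracting that list follows. It assumes the brief marks the list with a Markdown heading containing "Sections where evidence was thin" followed by `-` or `*` bullets; the brief's real layout may differ, and both function names are hypothetical.

```python
import re
from pathlib import Path

HEADING = re.compile(r"^#{1,6}\s+(.*)$")
BULLET = re.compile(r"^\s*[-*]\s+(.*\S)\s*$")


def thin_evidence_sections(brief_md: str) -> list:
    """Collect bullet items under a 'Sections where evidence was thin' heading."""
    items, inside = [], False
    for line in brief_md.splitlines():
        heading = HEADING.match(line)
        if heading:
            # Any new heading ends the list unless it is the target heading itself.
            inside = "sections where evidence was thin" in heading.group(1).lower()
            continue
        bullet = BULLET.match(line) if inside else None
        if bullet:
            items.append(bullet.group(1))
    return items


def gate_c_context(path: str = "workspace/research_brief.md") -> list:
    """Gate C has no exit code: a missing brief simply yields no extra context."""
    p = Path(path)
    return thin_evidence_sections(p.read_text(encoding="utf-8")) if p.exists() else []
```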

### 0b. Snapshot the initial draft

```bash
python skills/content-refinement-agent/scripts/snapshot.py \
    --src workspace/drafts/paper.tex \
    --dst workspace/refinement/iter0/
```

This creates `iter0/paper.tex`. Then compile to `iter0/paper.pdf`:

```bash
(cd workspace/refinement/iter0/ && latexmk -pdf -interaction=nonstopmode paper.tex)
```

Score it (see Step 1 below) → `iter0/score.json`.
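
For reference, here is a hypothetical minimal equivalent of the snapshot step, not the contents of `scripts/snapshot.py`. It copies the draft into `<dst>/paper.tex`, creating the directory if needed. Refusing to overwrite an existing snapshot is an added safeguard so earlier iterations stay immutable; the real script may behave differently.

```python
import shutil
from pathlib import Path


def snapshot(src: str, dst: str) -> Path:
    """Copy the draft to <dst>/paper.tex (hypothetical stand-in for snapshot.py)."""
    dst_dir = Path(dst)
    target = dst_dir / "paper.tex"
    if target.exists():
        # Safeguard: never overwrite an earlier iteration's snapshot.
        raise FileExistsError(f"snapshot already exists: {target}")
    dst_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, target)  # copy2 also preserves the file's timestamps
    return target
```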

### 1. Simulate peer review

For each iteration N starting from 1:

**Writing quality pre-check (start of every iteration):** Load
`references/writing-quality-check.md` and run the 5-category checklist
(Categories A–E) against the current draft. Note violations and add them to
the revision agenda.

**Update critique memory before the reviewer call** (iter N ≥ 2 only — skip for iter 1):

```bash
python skills/content-refinement-agent/scripts/update_critique_memory.py \
    --worklog workspace/refinement/worklog.json \
    --review  workspace/refinement/iter<N-1>/review.json \
    --iter    <N> \
    --out     workspace/refinement/critique_memory.json
```

This produces `critique_memory.json` with `focus_on` (persistent unresolved
issues) and `do_not_reflag` (already-resolved issues). Inject both lists into
the reviewer system prompt verbatim:

```
CRITIQUE MEMORY — you must honour this before reviewing:

FOCUS ON (flagged in prior iterations, not yet resolved — prioritise these):
<critique_memory.focus_on items, one per line>

DO NOT RE-FLAG (already addressed in prior iterations):
<critique_memory.do_not_reflag items, one per line>
```

This prevents the reviewer from re-discovering already-fixed issues and
from missing genuinely stuck problems.
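
Rendering the block is mechanical once `critique_memory.json` exists. The sketch below assumes `focus_on` and `do_not_reflag` are lists of strings, renders an empty list as `(none)`, and prepends the block to the rubric. That placement and both function names are assumptions, since the skill only says to inject the lists verbatim.

```python
import json
from pathlib import Path

# Template text copied verbatim from the skill description.
TEMPLATE = """CRITIQUE MEMORY — you must honour this before reviewing:

FOCUS ON (flagged in prior iterations, not yet resolved — prioritise these):
{focus_on}

DO NOT RE-FLAG (already addressed in prior iterations):
{do_not_reflag}"""


def render_critique_memory(memory: dict) -> str:
    """Fill the template from critique_memory.json, one item per line."""
    def lines(key: str) -> str:
        return "\n".join(memory.get(key, [])) or "(none)"
    return TEMPLATE.format(focus_on=lines("focus_on"), do_not_reflag=lines("do_not_reflag"))


def reviewer_system_prompt(rubric: str, memory_path: str, iteration: int) -> str:
    """Iteration 1 uses the rubric alone; from iteration 2 the memory block is prepended."""
    if iteration < 2:
        return rubric
    memory = json.loads(Path(memory_path).read_text(encoding="utf-8"))
    return render_critique_memory(memory) + "\n\n" + rubric
```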

Load `references/reviewer-rubric.md` as the system prompt for the simulated
reviewer call. The reviewer reads `iter<N-1>/paper.pdf` (or `paper.tex` if
your host LLM lacks PDF input) and produces a JSON of strengths,
weaknesses, questions, and pe

agent-research-aggregator (Skill)

Pre-pipeline aggregator that scans AI agent cache directories (.claude, .cursor, .antigravity, .openclaw) or any user-specified directory for experimentation logs, extracts insights and numeric results, and formats them as PaperOrchestra-ready inputs (idea.md + experimental_log.md). TRIGGER when the user says "aggregate my agent logs for paper writing", "extract experiments from my coding agent history", "prepare PaperOrchestra inputs from my cache", "turn my agent logs into a paper", mentions a folder or directory they want to use as the basis for a paper, or wants to run PaperOrchestra but only has scattered agent experiment histories rather than structured inputs. Run this BEFORE paper-orchestra. Also called automatically by paper-orchestra when workspace/inputs/idea.md or workspace/inputs/experimental_log.md are missing.

literature-review-agent (Skill)

Step 3 of the PaperOrchestra pipeline (arXiv:2604.05018). Execute the literature search strategy from outline.json — discover candidate papers via web search, verify them through Semantic Scholar (Levenshtein > 70 fuzzy title match, temporal cutoff, dedup by paperId), cross-corroborate against Crossref + OpenAlex to flag hallucinated citations, build a BibTeX file, and draft Introduction + Related Work using ≥90% of the verified pool. Runs in parallel with the plotting-agent. TRIGGER when the orchestrator delegates Step 3 or when the user asks to "find citations for my paper", "draft the related work", or "build the bibliography".

outline-agent (Skill)

Step 1 of the PaperOrchestra pipeline (arXiv:2604.05018). Convert (idea.md, experimental_log.md, template.tex, conference_guidelines.md) into a strict JSON outline containing a plotting plan, literature search plan (Intro + Related Work), and section-level writing plan with citation hints. TRIGGER when the orchestrator delegates Step 1 or when the user asks to "outline a paper from raw materials" or "generate the paper structure".

paper-autoraters (Skill)

Run the four paper-quality autoraters from PaperOrchestra (arXiv:2604.05018, App. F.3) — Citation F1 (P0/P1 partition + Precision/Recall/F1), Literature Review Quality (6-axis 0-100 with anti-inflation rules), SxS Overall Paper Quality (side-by-side), and SxS Literature Review Quality (side-by-side). TRIGGER when the user asks to "score this paper draft", "evaluate against the benchmark", "compare two papers", or "run the autoraters".

paper-orchestra (Skill)

Orchestrate the full PaperOrchestra (Song et al., 2026, arXiv:2604.05018) five-agent pipeline to turn unstructured research materials (idea, experimental log, LaTeX template, conference guidelines, optional figures) into a submission-ready LaTeX manuscript and compiled PDF. TRIGGER when the user asks to "write a paper from my experiments", "turn this idea and these results into a paper", "generate a conference submission", "run paper-orchestra on X", or otherwise wants the end-to-end paper-writing pipeline. Coordinates the outline-agent, plotting-agent, literature-review-agent, section-writing-agent, and content-refinement-agent skills.

paper-writing-bench (Skill)

Reverse-engineer raw materials (Sparse idea, Dense idea, experimental log) from an existing AI research paper to build a benchmark case for evaluating paper-writing pipelines. Replicates the PaperWritingBench dataset construction procedure from arXiv:2604.05018 §3 / App. C. TRIGGER when the user asks to "build a benchmark case from this paper", "reverse-engineer raw materials", or "evaluate my pipeline against PaperWritingBench".

plotting-agent (Skill)

Step 2 of the PaperOrchestra pipeline (arXiv:2604.05018). Execute the visualization plan from outline.json — render plots and conceptual diagrams from experimental_log.md and idea.md, optionally refine via VLM critique loop, and produce context-aware captions. Runs in parallel with the literature-review-agent. TRIGGER when the orchestrator delegates Step 2 or when the user asks to "generate the figures for my paper" or "render the plots from this experiment log".

section-writing-agent (Skill)

Step 4 of the PaperOrchestra pipeline (arXiv:2604.05018). ONE single multimodal LLM call that drafts the remaining paper sections (Abstract, Methodology, Experiments, Conclusion), extracts numeric values from experimental_log.md into LaTeX booktabs tables, splices the generated figures from Step 2, and merges everything into the template that already contains Intro + Related Work from Step 3. TRIGGER when the orchestrator delegates Step 4 or when the user asks to "write the methodology and experiments sections" or "fill in the rest of the paper".