PaperOrchestra: Multi-Agent AI That Writes Research Papers

This presentation explores PaperOrchestra, a groundbreaking multi-agent system that autonomously transforms raw research ideas and experimental logs into complete, submission-ready academic manuscripts. Unlike previous approaches that tightly couple writing with experimental execution, PaperOrchestra decouples these processes through five specialized agents handling outline generation, visualization, literature review, section writing, and iterative refinement. Evaluated on the new PaperWritingBench benchmark with 200 papers from top-tier conferences, the system achieves 50-68% win rates over the strongest AI baselines in literature review quality and acceptance rates approaching human-written papers at 81-84%.
Script
Writing a rigorous research paper demands deep literature synthesis, context-aware visualization, and strict adherence to submission standards. Current AI research agents stumble at this final mile, unable to transform raw experimental logs and unstructured ideas into publication-ready manuscripts. The bottleneck isn't generating results—it's authoring the paper itself.
The researchers identified three critical failures in existing systems. First, entangled architectures prevent independent manuscript generation. Second, naive citation mechanisms miss the deep literature context that distinguishes publishable work. Third, these systems collapse when handed the messy, informal materials that characterize early research phases.
PaperOrchestra attacks this problem through radical decoupling and specialization.
Each agent tackles a distinct subtask in the manuscript construction pipeline. The Outline Agent creates the blueprint. The Plotting Agent synthesizes visuals through iterative refinement with vision-language models. The Literature Agent doesn't just retrieve papers—it validates relevance and constructs verified citation pools. The Section Writing Agent integrates everything while respecting anonymization constraints. Finally, the Refinement Agent implements an automated review-feedback loop, iteratively improving clarity and rigor.
To benchmark this rigorously, the authors created PaperWritingBench. They deconstructed 200 published papers, purging all author information and existing citations. Systems receive sparse idea summaries and experimental logs—nothing more. This anti-leakage design ensures that generated manuscripts reflect genuine synthesis capability, not retrieval of memorized content.
The results are decisive. In head-to-head human evaluation, PaperOrchestra dominates across both literature review depth and overall manuscript quality. Against AI Scientist version 2, the strongest baseline, PaperOrchestra secures win rates between 50 and 68 percent for literature reviews and 14 to 38 percent for complete manuscripts. Even more striking, the system achieves acceptance rates of 81 to 84 percent under automated peer review agents—approaching the 86 to 94 percent acceptance of human-written ground truth papers. Citation coverage tells the same story: PaperOrchestra generates 45 to 48 references on average, closely tracking the 59 citations in human manuscripts, while baselines produce sparse lists that artificially inflate precision but fail to provide comprehensive context.
PaperOrchestra proves that modular, decoupled multi-agent architectures can autonomously synthesize raw research into publication-ready work. The bottleneck has shifted from authoring to ideation. Visit EmergentMind.com to explore this paper further and create your own research video presentations.