
AI Diagrams Decoded: Five-Agent Paper Visualization Tool

Google's PaperBanana uses five AI agents to auto-generate diagrams but still misses specialized icons


Google's new PaperBanana tool promises to stitch together scientific diagrams without a human hand. Five separate AI agents coordinate the process, each handling a slice of the workflow, from layout planning to caption drafting. The system can pull data from a paper, decide where a plot belongs, and even suggest color palettes, all in a single pass.

It sounds like a shortcut for researchers racing to meet conference deadlines. Yet the output still shows gaps where the agents stumble over the finer details that scholars expect. While the overall images are clean, the diagrams often lack the nuanced symbols and bespoke shapes that have become routine in today’s AI‑driven publications.

The researchers say pure image generation models fall short with complex visual elements like specialized icons or custom shapes, which are now standard in modern AI publications. Their raw output can look good, but it rarely meets academic publication standards.

How the five agents split the work

PaperBanana divides tasks among specialized AI agents.

The first searches a reference database for similar diagrams to use as templates. The second translates the paper's method description into a detailed image description. The third refines this using aesthetics guidelines the system extracted from NeurIPS publications.

The fourth agent renders the image using an image generation model. The fifth handles quality control: checking results for errors and suggesting fixes. This generation-and-criticism cycle runs three times before outputting the final diagram.
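
The paper's exact interfaces aren't published, but the division of labor described above maps naturally onto a retrieve, describe, refine, render, critique loop. Here is a minimal sketch under that assumption; every function name below is a hypothetical stand-in, not PaperBanana's actual API:

```python
# Hypothetical sketch of the five-agent cycle described above.
# None of these names come from the paper; the real system's
# prompts, models, and interfaces are not published.

def retrieve_references(method_text: str) -> list[str]:
    """Agent 1: search a reference database for similar diagrams."""
    return ["template: encoder-decoder block diagram"]  # stub

def describe_image(method_text: str, references: list[str]) -> str:
    """Agent 2: turn the method description into an image description."""
    return f"Diagram of: {method_text} (styled after {references[0]})"  # stub

def refine_description(description: str) -> str:
    """Agent 3: apply aesthetics guidelines (e.g. mined from NeurIPS papers)."""
    return description + " | clean layout, consistent palette"  # stub

def render(description: str) -> bytes:
    """Agent 4: call an image generation model."""
    return description.encode()  # stub standing in for image bytes

def critique(image: bytes, description: str) -> str | None:
    """Agent 5: check the result; return feedback, or None if acceptable."""
    return None  # stub: accept everything

def generate_diagram(method_text: str, max_rounds: int = 3) -> bytes:
    """Run the generation-and-criticism cycle up to three times."""
    description = refine_description(
        describe_image(method_text, retrieve_references(method_text))
    )
    image = render(description)
    for _ in range(max_rounds):
        feedback = critique(image, description)
        if feedback is None:
            break
        description = refine_description(description + " | fix: " + feedback)
        image = render(description)
    return image

print(generate_diagram("two-stage retrieval-augmented transformer"))
```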

For statistical plots like bar or line charts, the system takes a different route: instead of generating the graphics as images, it writes Python code for the Matplotlib library. This keeps the numbers accurate, something image generation models often get wrong.
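
The article doesn't reproduce the generated code, but the principle is simple: for a bar chart the system emits an ordinary Matplotlib script, so the plotted values are literal numbers rather than pixels a generative model might distort. A sketch of the kind of script such an agent could produce; the data here is invented purely for illustration:

```python
import matplotlib.pyplot as plt

# Example of the kind of script a code-writing agent might emit for a
# bar chart. The methods and scores below are made up for illustration.
methods = ["Baseline", "PaperBanana"]
readability = [61.3, 74.2]

fig, ax = plt.subplots(figsize=(4, 3))
ax.bar(methods, readability, color=["#999999", "#f2b134"])
ax.set_ylabel("Readability score")
ax.set_ylim(0, 100)
ax.set_title("Human-rated readability")
fig.tight_layout()
fig.savefig("readability.png", dpi=300)
```

Because the values live in code, a reviewer can audit and rerun the figure, something a rasterized output from an image model does not allow.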

Human reviewers pick AI diagrams most of the time

The researchers built their own benchmark with 292 test cases from NeurIPS 2025 publications, scoring diagrams on content fidelity, conciseness, readability, and aesthetics.

PaperBanana beat simple image generation across all categories. Conciseness saw the biggest jump at 37.2 percent. Readability improved 12.9 percent, aesthetics 6.6 percent, and content fidelity 2.8 percent.

Human reviewers preferred PaperBanana diagrams nearly 73 percent of the time.

PaperBanana shows promise, but questions linger. The authors concede the agents still stumble over the complex visual elements (specialized icons, custom shapes) that have become commonplace in scientific figures, and the paper only briefly outlines how the agents hand work off to one another or recover from errors. It is therefore unclear whether the approach can scale to papers that rely heavily on bespoke graphics.

The research underscores that collaborative AI pipelines can outperform single-model solutions, but the gap between polished output and the nuanced demands of scholarly illustration persists. Further testing across a broader range of disciplines would be needed to gauge robustness against varied visual conventions.


Common Questions Answered

How do the five AI agents in PlotGen collaborate to generate scientific visualizations?

PlotGen uses a multi-agent framework with specialized agents including a Query Planning Agent, a Code Generation Agent, and three retrieval feedback agents. These agents work together iteratively, with the feedback agents (Numeric, Lexical, and Visual) using multimodal LLMs to refine data accuracy, textual labels, and visual correctness of generated plots.
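
This summary doesn't give PlotGen's actual interfaces; as a rough sketch of the described loop, with all names below hypothetical stand-ins rather than the paper's API:

```python
# Rough sketch of PlotGen's described feedback loop; all names are
# hypothetical stand-ins, not the paper's actual API.

def generate_plot_code(plan: str) -> str:
    """Code Generation Agent: draft Matplotlib code from the plan."""
    return "plt.plot([1, 2, 3], [4, 5, 6])"  # stub

def feedback(kind: str, code: str) -> str | None:
    """Numeric / Lexical / Visual feedback agents: each inspects the
    rendered plot with a multimodal LLM and returns a fix, or None."""
    return None  # stub: no issues found

def plotgen(query: str, max_iters: int = 3) -> str:
    plan = f"plan for: {query}"  # Query Planning Agent (stub)
    code = generate_plot_code(plan)
    for _ in range(max_iters):
        fixes = [f for kind in ("numeric", "lexical", "visual")
                 if (f := feedback(kind, code))]
        if not fixes:
            break
        code = generate_plot_code(plan + " | " + "; ".join(fixes))
    return code

print(plotgen("bar chart of accuracy by model"))
```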

What performance improvements did PlotGen demonstrate on the MatPlotBench dataset?

PlotGen achieved a 4-6 percent improvement over strong baselines on the MatPlotBench dataset. The system enhanced user trust in LLM-generated visualizations and improved novice productivity by significantly reducing the debugging time needed for plot errors.

What challenges do novice users typically face when creating scientific data visualizations?

Novice users often struggle with the complexity of selecting appropriate visualization tools and mastering visualization techniques. Large Language Models (LLMs) have shown potential in code generation, but previously faced challenges with accuracy and required extensive iterative debugging.