- The paper presents LLM-native figures that integrate interactive visualization with embedded code and provenance for dynamic data analysis.
- It introduces a bidirectional mapping that links human figure manipulations to executable data transformations with high accuracy.
- The Nexus system uses multi-agent LLM coordination to support reproducible, interactive workflows for scientific discovery.
Introduction and Motivation
The convergence of LLMs and computational science is reshaping scientific workflows, yet research artifacts, and scientific figures in particular, remain static, visually oriented outputs. Despite the capabilities of LLMs, both human and AI agents typically treat figures as opaque images, disconnected from the data provenance and transformations that produced them. This paper introduces the concept of LLM-native figures: structured scientific artifacts designed to be both interpretable by humans and directly actionable by LLMs through embedded analytical provenance, executable code, and bidirectional mappings between visual and computational operations (2604.08491).
Conceptual Framework
LLM-native figures are defined as composite computational objects that encapsulate the complete lineage of a visualization: the relevant data subset, the analysis code, the visualization specification, and the rendered graphic. Their design rests on dual legibility: a figure must be intelligible to human users while remaining fully machine-interpretable, which is what allows AI agents to reason about and manipulate it.
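As a rough illustration of the composite-object idea, the Python sketch below bundles the four components into a single dataclass. The class name `LLMNativeFigure`, its field names, and the convention that the embedded code defines a `transform(rows)` function are hypothetical choices for this example, not the paper's schema; the `rendered` field is only a placeholder string.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any

Row = dict[str, Any]

@dataclass
class LLMNativeFigure:
    """Hypothetical composite object: data subset + code + spec + rendering."""
    data_subset: list[Row]            # rows the figure was built from
    analysis_code: str                # transformation source, kept as text so an LLM can read it
    vis_spec: dict[str, Any]          # declarative chart specification
    rendered: str = ""                # placeholder for the rendered graphic (e.g. a file path)
    provenance: list[str] = field(default_factory=list)  # ordered log of steps that produced it

    def reproduce(self) -> list[Row]:
        """Re-run the embedded code on the stored data subset.
        Assumes the code defines transform(rows); exec is only safe for trusted code."""
        namespace: dict[str, Any] = {}
        exec(self.analysis_code, namespace)
        return namespace["transform"](self.data_subset)

    def to_llm_context(self) -> str:
        """Serialize every machine-readable part (not the pixels) as JSON for an LLM prompt."""
        payload = asdict(self)
        payload.pop("rendered")
        return json.dumps(payload, indent=2)


fig = LLMNativeFigure(
    data_subset=[{"inventor": "a", "inventions": 3}, {"inventor": "b", "inventions": 12}],
    analysis_code="def transform(rows):\n    return [r for r in rows if r['inventions'] > 5]",
    vis_spec={"mark": "bar", "x": "inventor", "y": "inventions"},
    provenance=["loaded faculty table", "filtered inventions > 5"],
)
reproduced = fig.reproduce()
context = fig.to_llm_context()
```

Keeping the code as text rather than as a compiled function is the design choice that matters here: the same string can be shown to an LLM, diffed between versions, and re-executed for verification.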
Figure 1: The dynamic generation, coordination, and provenance-preserving nature of LLM-native figures through iterative, human-in-the-loop scientific exploration.
A core contribution is the bidirectional mapping $R_t$, defined at time $t$, which links direct manipulations of a figure (such as brushing, filtering, or selecting visual marks) to the corresponding analytical actions and data transformations, and vice versa. The framework employs a multi-agent LLM system to parse mixed-modal user input (natural language and direct manipulation) and orchestrates analytical pipelines as sequences of composable actions. Iterative exploration is formalized through "data-driven artifacts": version-controlled, navigable ledgers of exploration states, interlinked by provenance and coordination rules.
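One way to picture $R_t$ is as a pair of functions whose behavior depends on the figure's current state. The Python sketch below assumes a single quantitative x axis shown on either a linear or a log10 scale; the `FigureState` type, the function names, and the selection dictionary format are invented for this example. It illustrates why the mapping is indexed by time: once a log transform has been applied, a brush drawn in axis units must be converted back to data units before it becomes a SQL filter.

```python
import math
import sqlite3
from dataclasses import dataclass

@dataclass(frozen=True)
class FigureState:
    """Hypothetical snapshot of the bindings that R_t depends on at time t."""
    table: str                # trusted identifier chosen by the system, never raw user text
    x_field: str              # column bound to the x axis
    x_scale: str = "linear"   # "linear" or "log10"

def axis_to_data(state: FigureState, v: float) -> float:
    return 10 ** v if state.x_scale == "log10" else v

def data_to_axis(state: FigureState, v: float) -> float:
    return math.log10(v) if state.x_scale == "log10" else v

def brush_to_query(state: FigureState, lo: float, hi: float) -> tuple[str, tuple[float, float]]:
    """Visualization -> Analytical: a brush [lo, hi] in axis units becomes a parameterized query.
    Inclusive bounds (BETWEEN) are an explicit assumption of this sketch."""
    sql = (f"SELECT name FROM {state.table} "
           f"WHERE {state.x_field} BETWEEN ? AND ? ORDER BY name")
    return sql, (axis_to_data(state, lo), axis_to_data(state, hi))

def filter_to_selection(state: FigureState, lo: float, hi: float) -> dict:
    """Analytical -> Visualization: a data-space range filter becomes a highlight spec."""
    return {"type": "interval", "field": state.x_field,
            "range": [data_to_axis(state, lo), data_to_axis(state, hi)]}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE faculty (name TEXT, inventions INTEGER)")
conn.executemany("INSERT INTO faculty VALUES (?, ?)",
                 [("a", 1), ("b", 8), ("c", 40), ("d", 500)])

state_t = FigureState(table="faculty", x_field="inventions", x_scale="log10")
sql, params = brush_to_query(state_t, 0.5, 2.0)   # brush spans 10**0.5 to 10**2 in data units
selected = [name for (name,) in conn.execute(sql, params)]
highlight = filter_to_selection(state_t, 10, 1000)
```

Here the brush selects faculty `b` and `c`, and the reverse direction maps the data range 10 to 1000 onto axis positions 1 to 3. Making the inclusive-bound convention explicit in one place is one way to avoid the brush-range ambiguity that the evaluation identifies as an error source.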
Figure 2: Schematic of the multi-layered framework for LLM-native artifacts, bidirectional mappings, and compositional extension of analysis via coordinated figures.
System Implementation
The authors instantiate their framework in the Nexus system, targeting science of science (SciSci) investigations as a domain testbed that encompasses complex, heterogeneous datasets. Nexus integrates a hybrid graphical and language-based interface, underpinned by a multi-agent LLM engine that handles action planning, execution, and evaluation.
The system design supports:
- Dual interaction modalities: natural language queries and direct visual manipulation.
- Action-level planning and code generation, decomposing high-level user queries into atomic data operations, transformations, and visualizations.
- Data management using relational, vector (for retrieval-augmented generation), and artifact-oriented databases to ensure computational provenance and replayability.
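The plan, execute, evaluate cycle over atomic actions can be sketched as below. This is a minimal illustration under heavy assumptions: the planner is a hard-coded rule standing in for the planning agent (no LLM is called), the evaluator only rejects empty intermediate results, and the operation names `filter` and `count_by` are hypothetical. The returned trace records each executed action, the kind of action-level log that makes a run replayable.

```python
from collections import Counter
from typing import Any, Callable

Rows = list[dict[str, Any]]
Action = tuple[str, dict[str, Any]]   # (operation name, parameters)

def op_filter(rows: Rows, field: str, min_value: float) -> Rows:
    return [r for r in rows if r[field] >= min_value]

def op_count_by(rows: Rows, field: str) -> Rows:
    counts = Counter(r[field] for r in rows)
    return [{field: key, "count": n} for key, n in sorted(counts.items())]

# Registry of atomic operations the executor is allowed to run.
OPS: dict[str, Callable[..., Rows]] = {"filter": op_filter, "count_by": op_count_by}

def plan(query: str) -> list[Action]:
    """Stand-in for the planning agent: a fixed decomposition, not an LLM call."""
    if "by department" in query:
        return [("filter", {"field": "inventions", "min_value": 1}),
                ("count_by", {"field": "department"})]
    raise ValueError(f"toy planner cannot decompose: {query!r}")

def evaluate(rows: Rows) -> bool:
    """Stand-in for the evaluation agent: reject empty intermediate results."""
    return len(rows) > 0

def run(query: str, rows: Rows) -> tuple[Rows, list[Action]]:
    trace: list[Action] = []
    for name, params in plan(query):
        rows = OPS[name](rows, **params)
        if not evaluate(rows):
            raise RuntimeError(f"step {name!r} produced no rows")
        trace.append((name, params))
    return rows, trace


faculty = [
    {"name": "a", "department": "Bio", "inventions": 2},
    {"name": "b", "department": "CS", "inventions": 0},
    {"name": "c", "department": "CS", "inventions": 5},
    {"name": "d", "department": "Bio", "inventions": 1},
]
result, trace = run("inventors by department", faculty)
```

Restricting execution to a registry of named operations, rather than running arbitrary generated code, is one way to keep each step small enough to evaluate and replay individually.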
Figure 3: Overview of the Nexus system, including the LUI/GUI front-end, multi-agent LLM engine, and integrated data management for provenance and action granularity.
Case Study and Results
The authors conduct a detailed SciSci case study of academic innovation landscapes. Starting from exploratory queries such as the distribution of inventors by invention count and the citation of publications in patents, users iteratively refine the analysis through log transforms, selection-based groupings, and department-level breakdowns, using a mix of natural language and direct figure manipulation.
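Invention counts per inventor are usually highly skewed, which is one reason a log transform is useful in this kind of exploration. The sketch below assumes a simple logarithmic binning scheme chosen for illustration (the paper's own binning is not described here): with b bins per decade, bin k holds counts in [10^(k/b), 10^((k+1)/b)), and zero counts are dropped because log10(0) is undefined. The function name `log_binned_counts` is hypothetical.

```python
import math
from collections import Counter

def log_binned_counts(invention_counts: list[int], bins_per_decade: int = 1) -> dict[int, int]:
    """Count inventors per logarithmic bin of invention count.

    Bin k covers [10**(k / b), 10**((k + 1) / b)) with b = bins_per_decade.
    Inventors with zero inventions are skipped because log10(0) is undefined.
    """
    bins: Counter = Counter()
    for n in invention_counts:
        if n > 0:
            bins[math.floor(math.log10(n) * bins_per_decade)] += 1
    return dict(sorted(bins.items()))


counts = [0, 0, 1, 3, 9, 12, 40, 99, 150, 700]
histogram = log_binned_counts(counts)
```

For these sample counts the result is `{0: 3, 1: 3, 2: 2}`: three inventors in the 1 to 9 range, three in 10 to 99, and two in 100 to 999. Note that silently dropping zeros changes the population being plotted, so a provenance-preserving figure should record that step.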
Representative results show linked figures updating automatically after a user brushes or filters, surfacing actionable patterns (e.g., demographic asymmetries in latent innovation) and supporting rigorous, reproducible analytical chains. All analytical steps, executed code, and results are archived in a navigable artifact for later verification and sharing.
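A navigable artifact of this kind can be modeled as an append-only, content-addressed version graph in which each exploration state points to its parent. The sketch below is a hypothetical structure, not Nexus's storage format: `ArtifactLedger`, `commit`, `lineage`, and `children` are names invented for illustration. Branching falls out naturally when two states share a parent, and `lineage` returns the root-first chain of actions that a verifier would replay.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Version:
    vid: str                 # content hash identifying this exploration state
    parent: Optional[str]    # previous state; None for the root
    action: dict             # the analytical step taken (query, transform, or interaction)
    result_summary: str      # short description of the output

class ArtifactLedger:
    """Hypothetical append-only ledger; two versions with the same parent form a branch."""

    def __init__(self) -> None:
        self.versions: dict[str, Version] = {}

    def commit(self, parent: Optional[str], action: dict, result_summary: str) -> str:
        payload = json.dumps({"parent": parent, "action": action, "result": result_summary},
                             sort_keys=True)
        vid = hashlib.sha256(payload.encode()).hexdigest()[:12]
        self.versions[vid] = Version(vid, parent, action, result_summary)
        return vid

    def lineage(self, vid: str) -> list[Version]:
        """Walk parent links back to the root and return the chain root-first for replay."""
        chain: list[Version] = []
        cur: Optional[str] = vid
        while cur is not None:
            version = self.versions[cur]
            chain.append(version)
            cur = version.parent
        return chain[::-1]

    def children(self, vid: str) -> list[str]:
        return [v.vid for v in self.versions.values() if v.parent == vid]


ledger = ArtifactLedger()
root = ledger.commit(None, {"nl": "distribution of inventors by invention count"}, "histogram")
logged = ledger.commit(root, {"transform": "log10(inventions)"}, "log-scaled histogram")
brushed = ledger.commit(logged, {"brush": {"field": "inventions", "range": [0.5, 2.0]}},
                        "linked figures filtered")
by_dept = ledger.commit(root, {"group_by": "department"}, "bar chart by department")
replay = [v.action for v in ledger.lineage(brushed)]
```

Hashing the parent together with the action means an identical step taken from a different state gets a different identifier, so the graph records where each result came from and not only what was done.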
Figure 4: Example Nexus workflow for exploring faculty innovation, including dynamic figure updates, subgroup analyses, and versioned artifact management.
Computational Evaluation
To assess the fidelity of the bidirectional mapping and the artifact mechanism, the authors ran 308 structured test cases spanning multiple figure types, interaction types, and levels of analytical complexity. The evaluation focused on two axes: (1) mapping analytical instructions to figures (Analytical → Visualization) and (2) mapping figure-based operations back to data queries (Visualization → Analytical).
Key results:
- End-to-End Analytical → Visualization Accuracy: 92.7%
- Visualization → Analytical Accuracy: 79.8% for follow-up question answering, 91.0% for figure coordination
- Execution success rate for initial queries: 96.7%
Most errors were attributed to mismatches in LLM-generated SQL or logic (e.g., misinterpreting the semantics of brush ranges), underscoring the need for more robust prompts and explicitly specified interaction semantics.
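One common way to score whether a generated query matches the intended interaction is execution-based comparison: run the generated and reference SQL on the same data and compare the result sets. The sketch below assumes this scoring scheme for illustration (how the 308 cases were actually graded is not described here), and the table and queries are hypothetical. The second candidate reproduces the kind of brush-range error described above: it treats the brushed interval as exclusive, so rows on the boundary are dropped.

```python
import sqlite3

def results_match(conn: sqlite3.Connection, generated_sql: str, reference_sql: str,
                  params: tuple = ()) -> bool:
    """Execution-based check: same rows regardless of order (duplicates still count)."""
    generated = sorted(conn.execute(generated_sql, params).fetchall())
    reference = sorted(conn.execute(reference_sql, params).fetchall())
    return generated == reference

def accuracy(outcomes: list[bool]) -> float:
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE faculty (name TEXT, inventions INTEGER)")
conn.executemany("INSERT INTO faculty VALUES (?, ?)",
                 [("a", 1), ("b", 5), ("c", 10), ("d", 12)])

brush = (5, 10)  # user brushed inventions from 5 to 10, intended as inclusive
reference = "SELECT name FROM faculty WHERE inventions BETWEEN ? AND ?"
candidates = [
    "SELECT name FROM faculty WHERE inventions >= ? AND inventions <= ?",  # correct reading
    "SELECT name FROM faculty WHERE inventions > ? AND inventions < ?",    # exclusive bounds: wrong
]
outcomes = [results_match(conn, sql, reference, brush) for sql in candidates]
score = accuracy(outcomes)
```

The first candidate returns the same rows as the reference (`b` and `c`), while the exclusive version returns nothing, giving an accuracy of 0.5 over these two cases. Execution-based matching accepts syntactically different but equivalent queries; its weakness is that two wrong queries can agree by chance on a small table.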
Figure 5: Fidelity of bidirectional mappings across evaluation tasks and metrics.
Implications and Future Directions
This research recasts figures from static endpoints as bidirectionally interactive computational interfaces, enabling LLMs to reason about, manipulate, and extend visualizations in partnership with human analysts. Provenance is not a separate trace but an intrinsic property of every artifact, which improves reproducibility and provides a substrate for systematic auditing and recombination of analytical logic.
The LLM-native figure paradigm reveals new forms of human-machine complementarity: humans remain strong at visual pattern detection, while LLM agents handle systematic data-lineage tracing and transformation. The approach also invites a shift in interface design toward non-linear, artifact-oriented exploration that goes beyond text-only or linear chat-based interaction.
The framework is generalizable to any scientific domain with structured data and visualizations. Future extensions target scalable artifact versioning, more expressive interaction grammars, and multi-agent collaborative reasoning. Limitations remain in handling unstructured data, correcting errors that arise from ambiguous queries, and supporting confirmatory (as opposed to exploratory) analysis.
Conclusion
LLM-native figures, as introduced and implemented in this work, offer a robust computational framework that unifies visualization, code provenance, iterative reasoning, and AI agency in scientific discovery. They lay the groundwork for transparent, extensible, artifact-centric human-AI collaboration, backed by strong empirical mapping fidelity across figure types and interactions. The paradigm substantially improves the reproducibility, extensibility, and auditability of research in data-intensive domains.