- The paper introduces a decoupled, agentic architecture that separates high-level orchestration from low-level data analysis to improve insight extraction from VSM simulations.
- It employs iterative, multi-focal reasoning with dedicated tools for node discovery, attribute extraction, and taxonomy navigation to manage extensive KPIs and time-series data.
- Quantitative evaluations demonstrate state-of-the-art performance with top-tier LLMs while highlighting challenges in scalability and computational cost.
Agentic Insight Generation in VSM Simulations: A Decoupled Architecture for Robust Data Discovery
Context and Motivation
Extracting actionable, contextually relevant insights from Value Stream Map (VSM) simulations is a persistent challenge, especially as simulation fidelity and complexity increase. Classical VSMs provide static representations of material and information flows. When augmented with discrete event simulation, they yield large, hierarchically structured outputs, often containing hundreds of pre-calculated KPIs and multidimensional time-series data per scenario. This creates two principal obstacles to applying LLMs: very large input sizes and the need for multi-hop reasoning grounded in domain semantics.
Earlier approaches that used LLMs for simulation result analysis relied predominantly on monolithic pipelines or single-pass Retrieval-Augmented Generation (RAG). These approaches can look up basic metrics directly, but they fall short when a query requires disambiguating semantically similar KPIs in VSM outputs.
The work under review proposes an agentic architecture tailored to VSM simulation analysis that decouples high-level orchestration from low-level data analysis. The architecture injects domain expertise into the agent workflow, enabling progressive data discovery, dynamic source selection, and tightly controlled context windows to avoid information overload and context rot.
VSM Simulation Data Modeling
The VSM simulation data are modeled as attributed graphs with a strict topology, supporting realistic abstractions of manufacturing and logistics networks. Nodes represent logistical entities such as processing stations, warehouses, or customers, each with specialized attributes (e.g., safety stock limits, cycle times). Edges model material and information flows and are subject to type constraints.
A simulation run yields time-series data and a broad taxonomy of pre-calculated KPIs, which the agent must navigate. The size and complexity of these datasets rule out flat data retrieval; instead, agents must reason iteratively over both the static topology and the highly dynamic logs.
Figure 1: Example digital value stream map (VSM) as attributed graph, the core context for simulation-based reasoning tasks.
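The attributed-graph model described above can be illustrated with a minimal sketch. The class names, the attribute keys, and the `ALLOWED_LINKS` rule set are all assumptions made for illustration; the paper does not specify its schema or its actual type constraints.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    node_type: str  # e.g. "station", "warehouse", "customer"
    attributes: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    target: str
    flow_type: str  # "material" or "information"


# Assumed type constraints: allowed (source type, target type) pairs.
ALLOWED_LINKS = {
    ("warehouse", "station"),
    ("station", "station"),
    ("station", "warehouse"),
    ("warehouse", "customer"),
}


class VSMGraph:
    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, edge):
        # Reject edges that violate the assumed topology rules.
        pair = (self.nodes[edge.source].node_type, self.nodes[edge.target].node_type)
        if pair not in ALLOWED_LINKS:
            raise ValueError(f"edge type not allowed: {pair}")
        self.edges.append(edge)


graph = VSMGraph()
graph.add_node(Node("W1", "warehouse", {"safety_stock": 100}))
graph.add_node(Node("S1", "station", {"cycle_time_s": 42}))
graph.add_node(Node("C1", "customer"))
graph.add_edge(Edge("W1", "S1", "material"))
```

Validating edges at insertion time is one simple way to keep the topology strict; an edge from a customer back to a station would be rejected under these illustrative rules.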
Architecture: Decoupled Agentic Reasoning and Summarization
The central technical contribution is the two-step agentic design:
- Orchestration Agent: Handles top-level logical reasoning and task decomposition. It leverages domain knowledge to traverse the VSM simulation taxonomy, identifying promising data regions without ingesting raw data into its context. It queries the topology via dedicated tools (node discovery, attribute extraction, taxonomy navigation), progressively narrowing the information relevant to the query.
- Summarization Subworkflow: Once a target data element is identified, the orchestrator delegates analysis to this sub-agent. The subworkflow processes only the necessary data element, using both the raw data and the expert-provided metadata, to extract the explicit insight or perform aggregation as required by the task.
This iterative discovery and delegation paradigm sharply delineates planning from analysis, minimizes context window pollution, and avoids the failure modes prevalent in monolithic RAG or code-generation-based agents when handling complex VSM data.
Figure 2: Four-step walkthrough of the decoupled architecture for a sample VSM query, with marked tool invocations and information returns.
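The separation of planning from analysis can be sketched as follows. This is a toy illustration rather than the authors' implementation: both agents are LLM-driven in the paper, whereas here they are replaced by deterministic stand-ins, and `DATA_STORE`, `summarize`, and `Orchestrator` are hypothetical names. The point it shows is that raw values are visible only inside the summarization step, while the orchestrator's context holds short result strings.

```python
# Hypothetical store: raw series live here, outside the orchestrator's context.
DATA_STORE = {
    "station_a/queue_length": {
        "meta": "Queue length at station A, in pieces",
        "values": [3, 5, 4, 8],
    },
}


def summarize(element_id, task):
    """Stand-in for the summarization subworkflow.

    It reads one data element plus its metadata and returns a short string.
    """
    element = DATA_STORE[element_id]
    operations = {"max": max, "min": min, "count": len}
    result = operations[task](element["values"])
    return f"{task} of {element_id} ({element['meta']}) = {result}"


class Orchestrator:
    """Stand-in for the orchestration agent: it never touches raw values."""

    def __init__(self):
        self.context = []  # holds only short summaries

    def answer(self, element_id, task):
        summary = summarize(element_id, task)  # delegation step
        self.context.append(summary)
        return summary


agent = Orchestrator()
reply = agent.answer("station_a/queue_length", "max")
```

Because the orchestrator stores only the returned summary, its context grows with the number of delegated questions rather than with the length of the underlying time series.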
The orchestration agent is equipped with four core tools to support progressive discovery:
- Node Discovery Tool: Enumerates graph nodes optionally by class/type.
- Attribute Extraction Tool: Retrieves all attributes for a single node.
- Taxonomy Navigation Tool: Lists all available data elements in a simulation section with domain-expert metadata.
- Summarization Tool: Handles custom analyses or aggregations on individual data elements.
This modularity enforces strict information diet principles and allows the agent to remain robust to increases in simulation scale or changes in KPI taxonomy.
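A minimal sketch of the three discovery tools, assuming simple in-memory dictionaries as the backing store. The function names, the section path format, and the sample entries are hypothetical; the paper does not publish its tool signatures.

```python
# Hypothetical backing data for the sketch.
NODES = {
    "S1": {"type": "station", "attributes": {"cycle_time_s": 42}},
    "W1": {"type": "warehouse", "attributes": {"safety_stock": 100}},
    "C1": {"type": "customer", "attributes": {}},
}
TAXONOMY = {
    "kpis/S1": [
        {"id": "kpis/S1/utilization", "description": "Share of time the station is busy"},
    ],
}


def discover_nodes(node_type=None):
    """Node discovery: list node ids, optionally filtered by type."""
    return sorted(
        node_id
        for node_id, node in NODES.items()
        if node_type is None or node["type"] == node_type
    )


def extract_attributes(node_id):
    """Attribute extraction: return all attributes of a single node."""
    return dict(NODES[node_id]["attributes"])


def navigate_taxonomy(section):
    """Taxonomy navigation: list data elements in a section with their metadata."""
    return [dict(entry) for entry in TAXONOMY.get(section, [])]
```

Each tool returns identifiers and metadata only, never the underlying time series, which is what lets the orchestrator narrow its search without loading raw data.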
Datasets and Experimental Protocol
Evaluation was conducted on two curated datasets:
- Development Set: 112 triples covering 12 VSM contexts (primarily sanitized real-world models), each paired with expert-authored queries and ground-truth answers.
- Evaluation Set: 47 hand-curated triples (20 queries, three contexts), with heightened task diversity and strict separation from development data.
Ground truth was established by VSM domain experts. Agent outputs were assessed through both human expert review and an LLM-as-a-Judge methodology, validated by using multiple LLMs for rating and consensus scoring.
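One plausible way to combine ratings from several judge models is sketched below. The aggregation rule (arithmetic mean) and the disagreement threshold are assumptions; the source does not state how consensus was computed, and the judge names and ratings are placeholders.

```python
from statistics import mean


def consensus_score(ratings, max_spread=20.0):
    """Combine per-judge ratings (0-100) into one score.

    Assumed rule: average the ratings and flag the item for human review
    when the judges disagree by more than `max_spread` points.
    """
    if not ratings:
        raise ValueError("at least one judge rating is required")
    values = list(ratings.values())
    spread = max(values) - min(values)
    return {
        "score": mean(values),
        "spread": spread,
        "needs_human_review": spread > max_spread,
    }


# Placeholder ratings from three hypothetical judge models.
result = consensus_score({"judge_a": 80, "judge_b": 90, "judge_c": 85})
```

Flagging high-spread items is one way to route disputed answers back to the human experts who also reviewed the outputs.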
Quantitative Results
Empirical evaluation focused on cross-model benchmarking of the agentic architecture instantiated with various LLMs:
Top-line numerical results:
- Claude-Opus 4.6 achieved the highest mean rating (85.96/100), with SOTA open models like GLM-5 following closely (83.88/100).
- Robustness tracked model scale: smaller models (Ministral-3-14B) showed high output variance and lower scores, while the mid-sized Qwen-3-30B scored 72.66/100, indicating that performance improves with model size.
- Output variance (SD across four runs) dropped as model scale increased: Claude-Opus (SD=0.86), GLM-5 (SD=3.92), Ministral-3-14B (SD=6.64).
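The per-model statistics reported above, a mean rating and a standard deviation across four runs, can be computed as in the sketch below. The run scores are made-up placeholders, not values from the paper, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the source does not say which estimator was used.

```python
from statistics import mean, stdev


def run_statistics(run_scores):
    """Return the mean and sample standard deviation of per-run scores."""
    return {"mean": mean(run_scores), "sd": stdev(run_scores)}


# Placeholder scores for four runs of one hypothetical model.
stats = run_statistics([70.0, 72.0, 74.0, 76.0])
```

With only four runs per model the SD estimate is itself noisy, so small differences in SD between models should be read with caution.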
Crucially, while closed-source models still outperformed open-weight models, recent advances have narrowed the capability gap, particularly on well-structured, taxonomy-driven tasks.
Qualitative Insights and Failure Patterns
Persistent error cases (~15%) involved extracting values from the wrong data source rather than direct hallucination or analytical mistakes. These errors originated from navigation failures within the orchestration workflow, often induced by ambiguous or misleading simulation metadata, and not from deficiencies in time-series analysis per se. Future reliability gains may therefore be achievable through a more precise domain ontology, improved tool documentation, or additional training on VSM-specific navigation tasks.
Limitations and Future Directions
Key limitations are noted:
- Agent reliability still depends strongly on model scale. Smaller LLMs remain brittle in navigation and context allocation.
- Computational cost is significant, given reliance on long-context inference and multi-step tool invocation, impacting energy use and evaluation run-time.
- The design presumes that data elements (including long time series) fit within a manageable context window; future work must address chunking, hierarchical summarization, and scalable pre-processing.
To reduce these bottlenecks, the authors propose exploring alternative agentic patterns for more efficient data discovery, improving summarization workflows for lengthy time series, and ultimately closing the full simulation-based optimization loop (i.e., scenario generation, simulation, and analysis all driven by agentic orchestration).
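As a rough illustration of the chunking and hierarchical summarization direction mentioned above, the sketch below summarizes a long series chunk by chunk and then merges the partial summaries. It is not the authors' method; the function names and the choice of statistics are assumptions, and a real workflow would likely hand each chunk to an LLM rather than compute fixed aggregates.

```python
def split_into_chunks(values, size):
    """Split a series into consecutive chunks of at most `size` points."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def summarize_chunk(values):
    """First level: reduce one chunk to a few numbers."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
    }


def merge_summaries(summaries):
    """Second level: combine chunk summaries without revisiting raw data."""
    total = sum(s["n"] for s in summaries)
    return {
        "n": total,
        "min": min(s["min"] for s in summaries),
        "max": max(s["max"] for s in summaries),
        # Weight each chunk mean by its size to recover the overall mean.
        "mean": sum(s["mean"] * s["n"] for s in summaries) / total,
    }


series = list(range(1, 11))  # placeholder time series: 1..10
partials = [summarize_chunk(c) for c in split_into_chunks(series, 4)]
overall = merge_summaries(partials)
```

The merge step sees only the compact per-chunk summaries, so the amount of data in any single context stays bounded by the chunk size regardless of series length.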
Conclusion
This study substantiates the viability and high accuracy of a decoupled, agentic architecture for LLM-driven insight discovery in complex VSM simulation domains (2604.12421). The design demonstrably bridges the gap between classical expert-driven workflows and automated, scalable simulation analysis, yielding SOTA performance with top-tier LLMs and opening a credible path for full agentic simulation optimization. Key challenges remain in scaling robustness to lower-resource LLMs and reducing operational cost, motivating future research into robust agentic orchestration, advanced tool-use, and VSM-specific prompt engineering for industrial AI.