Agentic Insight Generation in VSM Simulations

Published 14 Apr 2026 in cs.CL | (2604.12421v1)

Abstract: Extracting actionable insights from complex value stream map simulations can be challenging, time-consuming, and error-prone. Recent advances in LLMs offer new avenues to support users with this task. While existing approaches excel at processing raw data to gain information, they are structurally ill-suited to picking up the subtle situational differences needed to distinguish similar data sources in this domain. To address this issue, we propose a decoupled, two-step agentic architecture. By separating orchestration from data analysis, the system leverages progressive data discovery infused with domain expert knowledge. This architecture allows the orchestration to intelligently select data sources and perform multi-hop reasoning across data structures while maintaining a slim internal context. Results from multiple state-of-the-art LLMs demonstrate the framework's viability, with top-tier models achieving accuracies of up to 86% and high robustness across evaluation runs.


Authors (6)

Summary

The paper introduces a decoupled, agentic architecture that separates high-level orchestration from low-level data analysis to improve insight extraction from VSM simulations.
It employs iterative, multi-focal reasoning with dedicated tools for node discovery, attribute extraction, and taxonomy navigation to manage extensive KPIs and time-series data.
Quantitative evaluations show top-tier LLMs reaching mean ratings of up to about 86/100, while also highlighting challenges in scalability and computational cost.

Agentic Insight Generation in VSM Simulations: A Decoupled Architecture for Robust Data Discovery

Context and Motivation

Extracting actionable and contextually relevant insights from Value Stream Map (VSM) simulations is a persistent challenge, especially as simulation fidelity and complexity increase. Classical VSMs provide static representations of material and information flows, but when augmented with discrete event simulation, they yield vast, intricate, and hierarchically structured outputs, often containing hundreds of pre-calculated KPIs and multidimensional time-series data per scenario. This setting poses two principal obstacles for LLMs: very large inputs and the need for nuanced, multi-hop reasoning grounded in domain semantics.

Earlier approaches employing LLMs for simulation result analysis predominantly relied on monolithic pipelines or single-pass Retrieval-Augmented Generation (RAG). These approaches can handle direct lookups of basic metrics, but they are systematically inadequate for the fine-grained disambiguation needed to tell apart ambiguous or semantically similar KPIs in VSM outputs.

The work under review proposes an agentic architecture tailored to the demands of VSM simulation analysis that separates high-level orchestration from low-level data analysis. The architecture injects domain expertise into the agent workflow, enabling progressive data discovery, dynamic source selection, and tightly controlled context windows that avoid information overload and context rot.

VSM Simulation Data Modeling

The VSM simulation data are represented as attributed graphs with a strict topology, which can model realistic manufacturing and logistics networks. Nodes represent logistical entities such as processing stations, warehouses, or customers, each with specialized attributes (e.g., safety stock limits, cycle times), and edges model material and information flows subject to type constraints.
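As a rough illustration of such an attributed graph, the sketch below stores per-node attribute dictionaries and rejects edges whose flow type is not permitted between two node classes. The node classes, the `ALLOWED_EDGES` table, and the attribute names are hypothetical; the paper's actual schema is not reproduced here.

```python
from dataclasses import dataclass, field

# Hypothetical type constraints: (source class, target class) -> permitted flow types.
ALLOWED_EDGES = {
    ("warehouse", "station"): {"material"},
    ("station", "station"): {"material"},
    ("station", "warehouse"): {"material"},
    ("warehouse", "customer"): {"material"},
    ("customer", "warehouse"): {"information"},
}


@dataclass
class Node:
    node_id: str
    node_class: str
    attributes: dict = field(default_factory=dict)  # e.g. safety stock, cycle time


@dataclass
class VSMGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, src: str, dst: str, flow: str) -> None:
        # Enforce the type constraint before the edge enters the topology.
        key = (self.nodes[src].node_class, self.nodes[dst].node_class)
        if flow not in ALLOWED_EDGES.get(key, set()):
            raise ValueError(f"{flow!r} edge not allowed between {key}")
        self.edges.append((src, dst, flow))


# Illustrative two-node value stream: a warehouse feeding a processing station.
g = VSMGraph()
g.add_node(Node("W1", "warehouse", {"safety_stock": 50}))
g.add_node(Node("S1", "station", {"cycle_time_s": 42.0}))
g.add_edge("W1", "S1", "material")
```

Checking constraints at insertion time keeps the topology valid by construction, so downstream tools never need to handle malformed flows.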

A simulation run yields time series and a broad taxonomy of pre-calculated KPIs that the agent must navigate. These datasets are too large and complex for flat data retrieval; instead, the agent must reason iteratively with multiple foci, covering both the static topology and the highly dynamic simulation logs.
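To make "multi-hop" concrete, the sketch below answers a question that needs one hop over the static topology and a second hop into the dynamic logs. The edge list, node IDs, and inventory samples are invented for illustration, and the plain (not time-weighted) mean is a simplification.

```python
from statistics import fmean

# Hypothetical static topology: material-flow edges (source, target).
EDGES = [("W1", "S1"), ("S1", "S2"), ("S2", "C1")]

# Hypothetical simulation log: inventory level per node as (time_s, level) samples.
INVENTORY_LOG = {
    "W1": [(0, 40), (60, 35), (120, 45)],
    "S1": [(0, 5), (60, 7), (120, 6)],
}


def upstream_of(node_id: str) -> list[str]:
    """Hop 1 (static topology): nodes that feed material into node_id."""
    return [src for src, dst in EDGES if dst == node_id]


def mean_inventory(node_id: str) -> float:
    """Hop 2 (dynamic logs): unweighted mean of the sampled inventory levels."""
    return fmean(level for _, level in INVENTORY_LOG[node_id])


# Query: "What is the average inventory of the buffer feeding station S1?"
feeder = upstream_of("S1")[0]
answer = mean_inventory(feeder)
```

Neither data source alone answers the query; the topology identifies *which* log to read, which is exactly the kind of chained lookup that flat retrieval tends to miss.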

Figure 1: Example digital value stream map (VSM) as attributed graph, the core context for simulation-based reasoning tasks.

Architecture: Decoupled Agentic Reasoning and Summarization

The central technical contribution is the two-step agentic design:

Orchestration Agent: Handles top-level logical reasoning and task decomposition. It leverages domain knowledge to traverse the VSM simulation taxonomy, identifying promising data regions without ingesting raw data into its context. It queries the topology via dedicated tools (node discovery, attribute extraction, taxonomy navigation), progressively narrowing the information relevant to the query.
Summarization Subworkflow: Once a target data element is identified, the orchestrator delegates analysis to this sub-agent. The subworkflow processes only the necessary data element, using both the raw data and the expert-provided metadata, to extract the explicit insight or perform aggregation as required by the task.

This iterative discovery-and-delegation paradigm sharply separates planning from analysis, minimizes context window pollution, and avoids the failure modes prevalent in monolithic RAG or code-generation-based agents when handling complex VSM data.
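The delegation pattern can be sketched without any LLM by replacing both agents with stubs. In this minimal sketch (all names, paths, and values are hypothetical), the orchestrator's `context` only ever receives element paths, expert metadata, and short summaries, while the raw values are read exclusively inside the summarization stub.

```python
from dataclasses import dataclass, field
from statistics import fmean

# Hypothetical data store: each element pairs raw values with expert-written metadata.
DATA = {
    "stations/S1/utilization": {
        "meta": "Share of simulated time S1 spends processing (0..1).",
        "values": [0.70, 0.80, 0.90, 0.80],
    },
    "stations/S1/blocked": {
        "meta": "Share of simulated time S1 is blocked by a full downstream buffer.",
        "values": [0.05, 0.10, 0.02, 0.07],
    },
}


def summarize(path: str, task: str) -> str:
    """Summarization subworkflow: sees raw values and metadata of ONE element.
    A real system would prompt an LLM here; this stub just averages the values."""
    element = DATA[path]
    return f"{path}: mean={fmean(element['values']):.2f} ({task})"


@dataclass
class Orchestrator:
    context: list = field(default_factory=list)  # everything the orchestrator has "seen"

    def run(self, query: str) -> str:
        self.context.append(f"query: {query}")
        # Discovery: only paths and metadata enter the context, never raw values.
        candidates = {path: el["meta"] for path, el in DATA.items()}
        self.context.append(f"candidates: {candidates}")
        # Selection: an LLM would reason here; a keyword rule stands in for it.
        target = next(p for p, meta in candidates.items() if "processing" in meta)
        # Delegation: only the short summary flows back into the context.
        summary = summarize(target, "average utilization")
        self.context.append(summary)
        return summary


orchestrator = Orchestrator()
answer = orchestrator.run("How busy is station S1?")
```

Note that the two candidate elements look alike by name; it is the metadata that lets the selection step tell "processing" apart from "blocked", mirroring the role the paper assigns to domain-expert knowledge.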

Figure 2: Four-step walkthrough of the decoupled architecture for a sample VSM query, with marked tool invocations and information returns.

Tooling and Workflow

The orchestration agent is equipped with four core tools to support progressive discovery:

Node Discovery Tool: Enumerates graph nodes, optionally filtered by class/type.
Attribute Extraction Tool: Retrieves all attributes for a single node.
Taxonomy Navigation Tool: Lists all available data elements in a simulation section with domain-expert metadata.
Summarization Tool: Handles custom analyses or aggregations on individual data elements.

This modularity enforces a strict information diet, in which the orchestrator receives only the data it explicitly requests, and allows the agent to remain robust to growing simulation scale or changes in the KPI taxonomy.
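One way to picture the four tools is as plain functions over a toy model, collected in a registry that an agent framework could expose for tool calling. The function names, signatures, toy nodes, and taxonomy below are hypothetical, and the summarization tool takes a Python callable where the paper's version delegates to an LLM subworkflow.

```python
from typing import Callable, Optional

# Hypothetical toy model; names and fields are illustrative only.
NODES = {
    "S1": {"class": "station", "cycle_time_s": 42.0},
    "S2": {"class": "station", "cycle_time_s": 55.0},
    "W1": {"class": "warehouse", "safety_stock": 50},
}
TAXONOMY = {
    "throughput": {
        "daily_output": {"meta": "Finished units per simulated day.", "values": [410, 395, 402]},
        "wip": {"meta": "Work-in-process units, sampled hourly.", "values": [12, 15, 11]},
    },
}


def discover_nodes(node_class: Optional[str] = None) -> list:
    """Node discovery: list node IDs, optionally filtered by class."""
    return [n for n, a in NODES.items() if node_class is None or a["class"] == node_class]


def extract_attributes(node_id: str) -> dict:
    """Attribute extraction: all attributes of exactly one node."""
    return dict(NODES[node_id])


def navigate_taxonomy(section: str) -> dict:
    """Taxonomy navigation: element names with expert metadata, but no raw values."""
    return {name: el["meta"] for name, el in TAXONOMY[section].items()}


def summarize_element(section: str, name: str, op: Callable[[list], float]) -> float:
    """Summarization: apply one analysis to one data element (stand-in for an LLM subworkflow)."""
    return op(TAXONOMY[section][name]["values"])


# Registry the orchestrator would choose from; each tool returns a small, bounded payload.
TOOLS = {
    "discover_nodes": discover_nodes,
    "extract_attributes": extract_attributes,
    "navigate_taxonomy": navigate_taxonomy,
    "summarize_element": summarize_element,
}
```

The information diet shows up in the return types: `navigate_taxonomy` returns descriptions rather than data, so raw values are touched only inside `summarize_element`.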

Datasets and Experimental Protocol

Evaluation was conducted on two curated datasets:

Development Set: 112 triples covering 12 VSM contexts (primarily sanitized real-world models), each paired with expert-authored queries and ground-truth answers.
Evaluation Set: 47 hand-curated triples (20 queries, three contexts), with heightened task diversity and strict separation from development data.

Ground truth was established by VSM domain experts, and agent outputs were assessed via both human expert review and an LLM-as-a-Judge methodology (validated using multiple LLMs for rating and consensus scoring).
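The summary does not specify how the judge ratings are combined, so the following is only a minimal sketch under stated assumptions: each answer receives a 0 to 100 rating from several judge LLMs, the consensus is their plain mean, and answers with large disagreement are flagged for human review using a hypothetical `max_spread` threshold.

```python
from statistics import fmean


def consensus_score(judge_scores: dict) -> float:
    """Assumed aggregation: plain mean of the 0-100 ratings given by several judge LLMs."""
    return fmean(judge_scores.values())


def needs_human_review(judge_scores: dict, max_spread: float = 20.0) -> bool:
    """Flag answers on which the judges disagree by more than max_spread points (hypothetical rule)."""
    return max(judge_scores.values()) - min(judge_scores.values()) > max_spread


def run_score(items: list) -> float:
    """Mean consensus score over all (context, query, ground truth) triples of one run."""
    return fmean(consensus_score(scores) for scores in items)
```

A median would be a reasonable alternative to the mean when one judge is systematically harsher or more lenient than the others.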

Quantitative Results

Empirical evaluation focused on cross-model benchmarking of the agentic architecture instantiated with various LLMs:

Top-line numerical results:

Claude-Opus 4.6 achieved the highest mean rating (85.96/100), with state-of-the-art open-weight models such as GLM-5 following closely (83.88/100).
Robustness tracked model scale: smaller models (Ministral-3-14B) showed high output variance and lower scores, while the mid-sized Qwen-3-30B reached 72.66/100, consistent with performance improving as model size grows.
Run-to-run variability (standard deviation across four runs) decreased markedly with increasing model scale: Claude-Opus (SD = 0.86), GLM-5 (SD = 3.92), Ministral-3-14B (SD = 6.64).
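The reported figures are a mean rating and a standard deviation over four evaluation runs per model. The sketch below shows that computation on made-up run scores (not the paper's per-run data); since the summary does not say whether the sample (n - 1) or population (n) formula was used, both are returned.

```python
from statistics import fmean, pstdev, stdev


def summarize_runs(run_scores: list) -> dict:
    """Mean rating and run-to-run spread for one model over repeated evaluation runs."""
    return {
        "mean": fmean(run_scores),
        "sd_sample": stdev(run_scores),       # n - 1 denominator
        "sd_population": pstdev(run_scores),  # n denominator
    }


# Illustrative scores for four runs only; NOT taken from the paper.
example = summarize_runs([80.0, 84.0, 86.0, 82.0])
```

With only four runs the two formulas differ noticeably (here about 2.58 versus 2.24), so the choice matters when comparing SDs across reports.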

Crucially, while closed-source models still outperformed open-weight models, recent advances have substantially narrowed the capability gap, particularly on well-structured, taxonomy-driven tasks.

Qualitative Insights and Failure Patterns

Persistent error cases (~15%) involved extracting values from the wrong data sources rather than direct hallucination or analytical mistakes. These errors originated from navigation failures within the orchestration workflow, often induced by ambiguous or misleading simulation metadata, not from deficiencies in time-series analysis per se. Future reliability gains may therefore be achievable through a better-specified domain ontology, improved tool documentation, or additional training on VSM-specific navigation tasks.

Limitations and Future Directions

Key limitations are noted:

Agent reliability is still strongly model scale-dependent. Smaller LLMs remain brittle in navigation and context allocation.
Computational cost is significant, given the reliance on long-context inference and multi-step tool invocation, which affects energy use and evaluation runtime.
The design presumes that individual data elements (including long time series) fit within a manageable context; future work must address chunking, hierarchical summarization, and scalable pre-processing.
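Hierarchical summarization is listed as future work, not as part of the evaluated system. One possible shape, sketched here under that caveat, splits a long series into context-sized chunks, compresses each chunk, and merges the compressed summaries pairwise so that no single step sees the full series. Mergeable statistics (count, total, min, max) stand in for the textual summaries an LLM would produce; `chunk_size` is a hypothetical parameter.

```python
from dataclasses import dataclass


@dataclass
class ChunkSummary:
    count: int
    total: float
    low: float
    high: float

    @property
    def mean(self) -> float:
        return self.total / self.count


def summarize_chunk(values: list) -> ChunkSummary:
    """Level 1: compress one context-sized window of the series into a few numbers."""
    return ChunkSummary(len(values), float(sum(values)), float(min(values)), float(max(values)))


def merge(a: ChunkSummary, b: ChunkSummary) -> ChunkSummary:
    """Higher levels: combine two summaries without revisiting the raw samples."""
    return ChunkSummary(a.count + b.count, a.total + b.total, min(a.low, b.low), max(a.high, b.high))


def hierarchical_summary(series: list, chunk_size: int) -> ChunkSummary:
    if not series:
        raise ValueError("series must not be empty")
    chunks = [series[i:i + chunk_size] for i in range(0, len(series), chunk_size)]
    level = [summarize_chunk(c) for c in chunks]
    # Merge neighbours pairwise until a single root summary remains (binary reduction tree).
    while len(level) > 1:
        level = [merge(level[i], level[i + 1]) if i + 1 < len(level) else level[i]
                 for i in range(0, len(level), 2)]
    return level[0]
```

Carrying the count and total rather than per-chunk means keeps the final mean exact even when the last chunk is shorter than the rest; free-text LLM summaries offer no such guarantee, which is part of why this remains an open problem.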

To address these bottlenecks, the authors propose exploring alternative agentic patterns for more efficient data discovery, improved summarization workflows for lengthy time series, and, ultimately, closing the full simulation-based optimization loop (i.e., enabling scenario generation, simulation, and analysis all via agentic orchestration).

Conclusion

This study substantiates the viability and high accuracy of a decoupled, agentic architecture for LLM-driven insight discovery in complex VSM simulation domains (2604.12421). The design bridges much of the gap between classical expert-driven workflows and automated, scalable simulation analysis, yielding state-of-the-art performance with top-tier LLMs and opening a credible path toward fully agentic simulation optimization. Key challenges remain in extending robustness to smaller, lower-resource LLMs and reducing operational cost, motivating future research into robust agentic orchestration, advanced tool use, and VSM-specific prompt engineering for industrial AI.
