Agentic Driving Coach: Robustness and Determinism of Agentic AI-Powered Human-in-the-Loop Cyber-Physical Systems

Published 13 Apr 2026 in cs.AI, cs.CL, cs.RO, and eess.SY | (2604.11705v1)

Abstract: Foundation models, including LLMs, are increasingly used for human-in-the-loop (HITL) cyber-physical systems (CPS) because foundation model-based AI agents can potentially interact with both the physical environments and human users. However, the unpredictable behavior of human users and AI agents, in addition to the dynamically changing physical environments, leads to uncontrollable nondeterminism. To address this urgent challenge of enabling agentic AI-powered HITL CPS, we propose a reactor-model-of-computation (MoC)-based approach, realized by the open-source Lingua Franca (LF) framework. We also carry out a concrete case study using the agentic driving coach as an application of HITL CPS. By evaluating the LF-based agentic HITL CPS, we identify practical challenges in reintroducing determinism into such agentic HITL CPS and present pathways to address them.


Authors (3)

Summary

The paper demonstrates a reactor-based design for a driving coach in HITL CPS that achieves deterministic behavior despite inherent LLM and human nondeterminism.
It integrates foundation models using the Lingua Franca framework and enforces strict deadline guarantees to mitigate latency and reliability challenges.
Experimental evaluations across stop sign, speed change, and lane change scenarios show improved safety and fewer interventions with larger LLMs.

Agentic Driving Coach: Determinism and Robustness in Agentic AI-Powered Human-in-the-Loop Cyber-Physical Systems

Introduction

The integration of agentic AI, specifically foundation models such as LLMs, into human-in-the-loop cyber-physical systems (HITL CPS) introduces significant nondeterminism due to the combined unpredictability of human operators, the physical environment, and the probabilistic nature of state-of-the-art models. The paper "Agentic Driving Coach: Robustness and Determinism of Agentic AI-Powered Human-in-the-Loop Cyber-Physical Systems" (2604.11705) systematically confronts the challenge of instantiating deterministic, robust, and analyzable agentic HITL CPS using a reactor-based model of computation. The main focus is a model-based design for an agentic driving coach, utilizing the Lingua Franca (LF) framework to guarantee deterministic system-level behaviors despite the nondeterminism injected by its subsystems.

Motivation and Challenges

Foundation models have delivered substantial improvements in autonomy, contextual reasoning, and adaptability for HITL CPS. However, their inherent lack of timing determinism and the occurrence of inference errors (e.g., hallucinations) introduce failure modes that undermine safety guarantees and repeatability. Ensuring deterministic behavior is a primary concern for certifying CPS, enabling repeatable validation, compositional reasoning, and supporting fault detection and mitigation.

Key challenges identified and targeted by the authors include:

Inference Latency Variability: LLM inference introduces unpredictable delays, which can cause instructions to be delivered too late for safe execution.
Instruction Quality Variability: LLMs may produce incorrect or suboptimal driving instructions due to hallucinations or lack of context.
Human Behavioral Variation: The human-in-the-loop operator introduces additional dynamism.
Physical Environment Uncertainties: The plant and environment components are inherently nondeterministic.
Figure 1: The principal hazards in agentic HITL CPS—delayed or incorrect instructions from the agent—are direct consequences of LLM and human nondeterminism.

System Architecture and Reactor Model

The proposed approach advocates explicit system-level modeling using a reactor-oriented paradigm, as realized with Lingua Franca. The architecture decomposes the system into modular "reactors"—atomic actors with deterministic scheduling, isolated logical state, and communication strictly via directed ports.

Component Modeling

Agentic Coach (Composite Reactor): Encapsulates two sub-reactors, LLMInference and Planner. The coach observes sensor data, invokes LLM-based inference under deadline guarantees, and issues driving instructions and supervisory actuation signals.
Driver Reactor: Abstracts the human's behavioral and perceptual feedback loop, processing instructions with modeled actuation and perception latency.
Car Reactor: Models the physical state evolution (e.g., velocity, steering) of the vehicle using discrete control-theoretic equations, receiving both human and agentic inputs.
Environment Reactor: Provides the environmental context as both external observation and input to system modeling.
Figure 2: Modular HITL CPS topology with distinct separation of the coaching agent, human driver, and physical plant, enabling deterministic orchestration via reactors.

Figure 3: Implementation of the agentic driving coach in Lingua Franca, detailing the reactor composition and key inter-component dataflow.
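The Car reactor's "discrete control-theoretic equations" are not reproduced in this summary. As a rough illustration only, the sketch below assumes a forward-Euler longitudinal update in which a non-zero coach actuation overrides the driver's input. The names CarState and step, the override rule, and the clamp at zero velocity are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class CarState:
    position_m: float
    velocity_mps: float


def step(state: CarState, driver_accel: float, coach_accel: float, dt: float) -> CarState:
    """Advance the car by one fixed time step using forward Euler.

    Assumption (not from the paper): a non-zero coach actuation
    overrides the driver's acceleration input.
    """
    accel = coach_accel if coach_accel != 0.0 else driver_accel
    # Clamp at zero so braking never makes the car reverse.
    new_velocity = max(0.0, state.velocity_mps + accel * dt)
    # Position uses the velocity at the start of the step (explicit Euler).
    new_position = state.position_m + state.velocity_mps * dt
    return CarState(new_position, new_velocity)
```

Because the update is a pure function of the previous state and the two inputs, replaying the same input sequence always yields the same trajectory, which is the property the reactor model relies on.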

A critical feature is that reactors orchestrate their reactions according to logically deterministic schedules, enforcing unique system outputs for any fixed sequence of input events (driver actions and LLM responses).
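The idea that a fixed sequence of input events yields a unique output can be sketched with a small logical-time event queue. This is not Lingua Franca's runtime; it is a simplified stand-in that assumes events carry a logical timestamp and a priority, and that ties are broken by a total order on the event tuple. The Event layout and the run function are hypothetical.

```python
import heapq
from typing import List, Tuple

# (logical_time_ms, priority, port, value); hypothetical layout for illustration
Event = Tuple[int, int, str, str]


def run(events: List[Event]) -> List[str]:
    """Process events in (logical time, priority, port, value) order.

    The output depends only on the set of tagged events, not on the
    order in which they arrived in physical time.
    """
    queue = list(events)
    heapq.heapify(queue)
    trace = []
    while queue:
        t, _priority, port, value = heapq.heappop(queue)
        trace.append(f"{t}ms {port}={value}")
    return trace
```

Feeding the same events in a different arrival order produces the same trace, since ordering is decided by the logical tags alone.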

Timing and Deadline Handling

The reactor model expresses deadlines as explicit handlers attached to reactions, with exception-like semantics: if a reaction's deadline is violated, the handler runs in place of the normal reaction body. Deadline values are set empirically per model (e.g., 186 ms for the 1B-parameter and 613 ms for the 70B-parameter Llama 3 model), and a violation triggers a fallback behavior (an emergency intervention), ensuring a safe system-level response despite variable LLM latency. Logical delays are also systematically injected to model real perception-action and actuation latencies.
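A minimal sketch of this deadline check follows, written in Python rather than Lingua Franca. It assumes a deadline counts as violated when the lag between physical time and the event's logical time exceeds the configured bound. The two deadline values are the ones quoted above; the function name react and the fallback label EMERGENCY_INTERVENTION are hypothetical.

```python
# Deadlines quoted in the text; the 8B value is not given there, so it is omitted.
DEADLINES_MS = {"1B": 186, "70B": 613}

# Hypothetical label for the fallback action.
FALLBACK = "EMERGENCY_INTERVENTION"


def react(model: str, logical_time_ms: int, physical_time_ms: int, instruction: str) -> str:
    """Return the LLM instruction if it arrived in time, else the fallback.

    Assumption: the deadline is violated when physical time lags the
    event's logical time by more than the model's deadline.
    """
    lag_ms = physical_time_ms - logical_time_ms
    if lag_ms > DEADLINES_MS[model]:
        return FALLBACK  # deadline handler runs instead of the normal reaction
    return instruction
```

For example, react("1B", 1000, 1187, "WARNING") has a lag of 187 ms, exceeds the 186 ms bound, and returns the fallback, while a lag of exactly 186 ms still delivers the instruction.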

Figure 4: The Planner reactor operates as a modal finite-state machine, triggered by control signal outputs from LLMInference, enabling explicit transitions for monitoring, warning, and agentic actuation modes.
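The modal behavior described in the Figure 4 caption can be illustrated with a small state machine. The sketch assumes one mode per protocol signal ("NONE", "WARNING", "ACTUATE") and assumes that an unrecognized signal leaves the mode unchanged. The paper's actual transition guards are not given in this summary, so the Mode enum, the TRANSITIONS table, and next_mode are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    MONITORING = "monitoring"
    WARNING = "warning"
    ACTUATING = "actuating"


# Assumed mapping from control signal to target mode.
TRANSITIONS = {
    "NONE": Mode.MONITORING,
    "WARNING": Mode.WARNING,
    "ACTUATE": Mode.ACTUATING,
}


def next_mode(current: Mode, signal: str) -> Mode:
    """Return the mode selected by the signal; stay put on unknown signals."""
    return TRANSITIONS.get(signal, current)
```

Keeping the transition table explicit makes every mode change traceable to a single signal, which is what allows the planner's behavior to be replayed and inspected.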

Prompt Engineering and Control Generation

To constrain LLM output nondeterminism and reduce the risk of miscommunication, the authors design structured, context-rich prompts that embed scenario-specific constraints and a lightweight signaling protocol (e.g., "NONE", "WARNING", "ACTUATE"), minimizing verbosity and ambiguity.

Figure 5: Example of structured and context-embedded prompts used to elicit high-quality and protocol-compliant LLM responses in the stop sign scenario.

Passing real-time environment, vehicle, and driver state as structured input gives the LLM in the loop better context awareness. Sampling randomness is reduced by setting the temperature parameter to zero ("temperature": 0).
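The following sketch shows one way such a structured prompt and reply check could look. It is not the authors' prompt: the wording, the request field names other than "temperature", and the helpers build_request and parse_signal are assumptions made for illustration. Only the three protocol signals and the zero temperature come from the text.

```python
import json
from typing import Optional

ALLOWED_SIGNALS = ("NONE", "WARNING", "ACTUATE")


def build_request(scenario: str, state: dict) -> dict:
    """Build a request with a structured prompt and temperature 0.

    sort_keys=True keeps the prompt text identical for identical state,
    regardless of the order in which the state dict was populated.
    """
    prompt = (
        f"Scenario: {scenario}\n"
        f"State: {json.dumps(state, sort_keys=True)}\n"
        f"Reply with exactly one of: {', '.join(ALLOWED_SIGNALS)}"
    )
    return {"prompt": prompt, "temperature": 0}


def parse_signal(reply: str) -> Optional[str]:
    """Return the protocol signal in the reply, or None if it is not compliant."""
    words = reply.strip().split()
    if not words:
        return None
    token = words[0].upper().strip('".,:')
    return token if token in ALLOWED_SIGNALS else None
```

Returning None for a non-compliant reply leaves the choice of safe default to the caller, for instance the planner's fallback logic.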

Evaluation Scenarios and Determinism Validation

The authors employ three canonical driving scenarios: stop sign, speed limit change, and lane change, with fixed driver behaviors to isolate the system's response determinism.

Figure 6: The three controlled evaluation scenarios—stop sign approach, speed change, and lane change—used for model validation.

Testing is conducted with quantized Llama 3 models (1B, 8B, 70B) to probe the trade-offs between inference speed, instruction quality, and deadline adherence. The deadline bound for each model is set to its measured worst-case latency.
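As a small illustration of this calibration step, the sketch below takes the deadline to be the maximum of a set of profiled latencies and then counts how many later latencies exceed it. The function names and any sample values passed to them are hypothetical, and the paper's measurement procedure is not described in this summary.

```python
from typing import Sequence


def calibrate_deadline_ms(profiled_latencies_ms: Sequence[float]) -> float:
    """Use the worst latency seen during profiling as the deadline."""
    if not profiled_latencies_ms:
        raise ValueError("need at least one latency sample")
    return max(profiled_latencies_ms)


def count_misses(runtime_latencies_ms: Sequence[float], deadline_ms: float) -> int:
    """Count runtime inferences that took longer than the deadline."""
    return sum(1 for latency in runtime_latencies_ms if latency > deadline_ms)
```

A bound obtained this way is only as good as the profiling run: a later inference can still exceed the largest profiled latency, which is why the deadline handler and its fallback remain necessary.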

Results

Figure 7: Model-level comparison in the stop sign scenario. Llama 1B fails to produce safe instructions irrespective of timely delivery, while Llama 8B and 70B provide safer guidance, albeit with some deadline misses for 70B due to longer inference latency.

For a fixed input sequence, the Lingua Franca reactor execution is deterministic, yielding identical instruction, actuation, velocity, and deadline violation timelines. This demonstrates that the reactor model delivers system-level determinism. The smallest model (1B) is empirically unsafe because it generates poor instructions, whereas the larger models (8B, 70B) substantially improve outcome quality, with 70B delivering safer, less interventionist behavior.

Figure 8: Quantitative performance comparison of Llama 8B vs. 70B in speed and lane change tasks. Both exhibit deterministic handling of timely and deadline-missed instructions. 70B achieves the desired system behavior with fewer interventions and better safety.

Deadline misses are rare but non-negligible, especially for larger models or scenarios requiring richer context (e.g., lane change needing head position monitoring). The system's fallback logic is activated as intended, providing safety envelopes despite agentic nondeterminism.

Implications and Future Directions

The reactor-based modeling approach, as shown, achieves deterministic, analyzable system behaviors for HITL CPS with agentic AI components, supporting robust evaluation and composability. This methodology advances the CPS verification landscape by enabling:

Repeatable safety assessments: Determinism assures that faults and rare events can be systematically reproduced and analyzed.
Modular certification paths: Composable, analyzable subsystems facilitate paths towards safety-critical certification.
Resilience against LLM and human nondeterminism: System-level deadlines and fallback behaviors decouple system safety from inner agent unreliability.

Potential extensions include integrating dynamic real-world sensor data, using online fine-tuning or RAG- or LoRA-based context enrichment to reduce latency for large LLMs, and expanding to broader scenario coverage. The theoretical enforcement of determinism in the presence of black-box agentic submodules has direct ramifications for autonomous, safety-critical CPS deployed at scale.

Conclusion

The presented work establishes a comprehensive framework for integrating foundation model-driven agentic AI into HITL CPS while guaranteeing determinism and robustness at the system level. By leveraging a reactor-based model of computation and Lingua Franca, the authors demonstrate that it is possible to construct complex agentic CPS with repeatable behaviors, effective safety fallback strategies, and clear, analyzable interaction patterns among human, AI, and environment subsystems. This methodology supports future developments toward safe, certifiable, agentic autonomy for critical domains including transportation and beyond.
