Steering through Time: Blending Longitudinal Data with Simulation to Rethink Human-Autonomous Vehicle Interaction

Published 1 Apr 2026 in cs.HC | (2604.00832v1)

Abstract: As semi-automated vehicles (SAVs) become more common, ensuring effective human-vehicle interaction during control handovers remains a critical safety challenge. Existing studies often rely on single-session simulator experiments or naturalistic driving datasets, which often lack temporal context on drivers' cognitive and physiological states before takeover events. This study introduces a hybrid framework combining longitudinal mobile sensing with high-fidelity driving simulation to examine driver readiness in semi-automated contexts. In a pilot study with 38 participants, we collected 7 days of wearable physiological data and daily surveys on stress, arousal, valence, and sleep quality, followed by an in-lab simulation with scripted takeover events under varying secondary task conditions. Multimodal sensing, including eye tracking, fNIRS, and physiological measures, captured real-time responses. Preliminary analysis shows the framework's feasibility and individual variability in baseline and in-task measures; for example, fixation duration and takeover control time differed by task type, and RMSSD showed high inter-individual stability. This proof-of-concept supports the development of personalized, context-aware driver monitoring by linking temporally layered data with real-time performance.


Authors (7)

Summary

The paper presents a novel hybrid framework that blends longitudinal wearable sensing with controlled driving simulation to quantify baseline and event-driven driver states.
It leverages multi-modal measures including fNIRS, eye tracking, and self-reported stress to assess cognitive workload and takeover readiness in semi-automated vehicles.
Key findings reveal high trait stability in sleep-derived RMSSD and significant effects of conversational distraction on gaze metrics and control times.

Integrative Longitudinal and Simulation-Based Sensing for Human-Autonomous Vehicle Interaction

Introduction

"Steering through Time: Blending Longitudinal Data with Simulation to Rethink Human-Autonomous Vehicle Interaction" (2604.00832) addresses a core limitation in driver monitoring systems (DMS) for semi-automated vehicles (SAVs): they rarely account for temporally layered psychophysiological state information. The proposed framework combines multi-day mobile physiological sensing, daily ecological momentary assessments, and tightly controlled driving simulation to capture both the baseline and the acute state fluctuations that modulate takeover readiness. The authors use a within-subject design to separate the influences of secondary task and takeover event on gaze, cortical, and behavioral indices of driver state, measured with a multimodal sensor suite.

Hybrid Framework: Design and Data Flow

The methodology involves a two-phase protocol. Initially, participants undergo a seven-day longitudinal monitoring period using wearables (Empatica EmbracePlus) and daily experience sampling to capture temporal dynamics in autonomic and affective indices. The in-lab simulation phase then presents controlled takeover events (unexpected pedestrian vs. static crash) under variable secondary task conditions (no task, 2-back, conversation), during which real-time measures include eye tracking, fNIRS-based prefrontal activation, and standard driving performance metrics.

Figure 1: Overview of the longitudinal data collection process, integrating mobile sensing and daily surveys before simulator exposure.

This architecture enables linkage between individual baseline variability and performance under cognitive load or distraction, aiming to support future context-aware DMS.

Multimodal Sensing and Preprocessing

Comprehensive sensing spans several modalities:

Wearable physiological sensors: EDA, PPG-derived HRV (specifically RMSSD during sleep), accelerometry, and skin temperature.
Self-reports: Stress (NRS-11), valence/arousal scales, workload, sleep quality, captured using ecological momentary assessment apps.
fNIRS imaging: Four-source, twelve-detector array over PFC for high-frequency hemodynamic monitoring, supporting robust motion/noise artifact rejection, baseline correction, and HbO/HbR quantification.
Figure 2: Placement of fNIRS sources and detectors for optimal prefrontal cortex coverage.
Eye tracking: Binocular gaze data with fixation detection (I-DT algorithm) and fixation-duration as proxy for cognitive demand.
Simulator streams and video: Detailed control and environmental context.
Figure 3: Simulator laboratory with sensing equipment, example gaze data, and in-cabin video synchronization.
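
The dispersion-threshold (I-DT) fixation detection named in the eye-tracking item can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the sample layout `(time_s, x, y)`, the function names, and the threshold values `max_dispersion` and `min_duration` are assumptions, since the summary does not report the parameters used.

```python
from typing import List, Tuple

Sample = Tuple[float, float, float]            # (time_s, x, y); layout is an assumption
Fixation = Tuple[float, float, float, float]   # (start_s, end_s, centroid_x, centroid_y)


def dispersion(window: List[Sample]) -> float:
    """I-DT dispersion: (max x - min x) + (max y - min y)."""
    xs = [s[1] for s in window]
    ys = [s[2] for s in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def idt_fixations(samples: List[Sample],
                  max_dispersion: float = 1.0,   # hypothetical, in gaze units
                  min_duration: float = 0.1      # hypothetical, in seconds
                  ) -> List[Fixation]:
    """Detect fixations with the dispersion-threshold (I-DT) scheme."""
    fixations: List[Fixation] = []
    i, n = 0, len(samples)
    while i < n:
        # Grow the initial window until it spans at least min_duration.
        j = i
        while j < n and samples[j][0] - samples[i][0] < min_duration:
            j += 1
        if j >= n:
            break  # not enough data left for another fixation
        if dispersion(samples[i:j + 1]) <= max_dispersion:
            # Extend the window while the points stay within the threshold.
            while j + 1 < n and dispersion(samples[i:j + 2]) <= max_dispersion:
                j += 1
            window = samples[i:j + 1]
            cx = sum(s[1] for s in window) / len(window)
            cy = sum(s[2] for s in window) / len(window)
            fixations.append((window[0][0], window[-1][0], cx, cy))
            i = j + 1
        else:
            i += 1  # drop the first sample and try again
    return fixations


def mean_fixation_duration(fixations: List[Fixation]) -> float:
    """Average fixation duration in seconds (0.0 if no fixation was found)."""
    if not fixations:
        return 0.0
    return sum(end - start for start, end, _, _ in fixations) / len(fixations)
```

Computing `mean_fixation_duration` separately for each secondary-task and takeover-event combination would give the kind of per-condition summary that the fixation-duration comparison relies on.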

Signal preprocessing (for fNIRS, illustrated in Figure 4) implements SCI-based quality gating, high/low-pass filtering, motion artifact correction (spline, wavelet denoising), and physiological noise reduction.

Figure 4: Schematic of HbO signal extraction pipeline from raw optical intensities.
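
As a rough illustration of the SCI-based quality gating step: the scalp coupling index is commonly computed as the correlation between the two wavelength signals of a channel after both have been band-pass filtered to the cardiac band, and channels with a low index are excluded. The sketch below assumes the filtering has already been done; the function names, the channel dictionary layout, and the 0.5 cutoff are illustrative assumptions, not values taken from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-lag Pearson correlation; returns 0.0 if either signal is flat."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("signals must have equal length >= 2")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def gate_channels(
    channels: Dict[str, Tuple[List[float], List[float]]],
    threshold: float = 0.5,  # hypothetical cutoff; not reported in the summary
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Split channels into (kept, dropped) by scalp coupling index.

    `channels` maps a channel name to its two wavelength signals, assumed
    to be already band-pass filtered to the cardiac band.
    """
    kept: Dict[str, float] = {}
    dropped: Dict[str, float] = {}
    for name, (wl1, wl2) in channels.items():
        sci = pearson(wl1, wl2)
        (kept if sci >= threshold else dropped)[name] = sci
    return kept, dropped
```

A well-coupled optode shows the heartbeat in both wavelengths, so the two filtered signals correlate strongly; a poorly coupled one does not, which is why this single number works as a gate before any hemodynamic analysis.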

Longitudinal State Characterization

This phase focuses on isolating temporal stability and inter-individual differences in baseline physiological and psychological states preceding simulator exposure. The authors aggregate the week-long data to participant-level means for indices such as stress, valence, arousal, subjective sleep quality, and RMSSD. Notably, RMSSD exhibited high inter-individual stability (ICC = 0.817), meaning that most of its variance lay between participants rather than within them, and it was more trait-like than the affective and other self-reported measures.

Figure 5: Per-participant means for stress, valence, and arousal, indicating broad interindividual variation.

Figure 6: Distribution and stability of sleep RMSSD and subjective sleep quality across individuals.

These results underscore the necessity of incorporating temporally sensitive and individualized baselines in modeling driver readiness, as opposed to static group-level assumptions.
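
The two quantities behind the stability result can be sketched in a few lines. RMSSD is the root mean square of successive differences between adjacent inter-beat intervals. The ICC form used in the paper is not stated in this summary, so the one-way random-effects ICC(1,1) below is an assumption chosen for simplicity, and the function names and balanced-data requirement are likewise illustrative.

```python
import math
from typing import List, Sequence


def rmssd(rr_ms: Sequence[float]) -> float:
    """Root mean square of successive differences of inter-beat intervals (ms)."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two inter-beat intervals")
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def icc_oneway(data: List[List[float]]) -> float:
    """One-way random-effects ICC(1,1) for balanced data.

    `data[p]` holds the k repeated values (e.g. nightly RMSSD) of
    participant p. Using this ICC form is an assumption.
    """
    n = len(data)
    k = len(data[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need >= 2 participants with the same k >= 2 values")
    grand = sum(sum(row) for row in data) / (n * k)
    means = [sum(row) / k for row in data]
    # Between-participant and within-participant mean squares.
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((x - m) ** 2 for row, m in zip(data, means) for x in row) / (n * (k - 1))
    denom = msb + (k - 1) * msw
    if denom == 0:
        raise ValueError("ICC is undefined when all values are identical")
    return (msb - msw) / denom
```

An ICC near 1 indicates that nightly values cluster tightly around each person's own mean while those means differ across people, which is the pattern that makes a measure usable as a personal baseline.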

Event-Level Neural and Behavioral Response Analysis

For in-lab sessions, task-dependent modulation is observable across neurophysiological and gaze metrics:

fNIRS: HbO amplitude in PFC channels is acutely elevated during challenging n-back intervals, with conversation tasks producing qualitatively smaller but clear increases above passive/no-task baselines.
Figure 7: Representative hemodynamic response in a participant, demarcating transitions between secondary tasks and driving states.
Gaze metrics: Fixation duration is reliably shortest during conversation (p < 0.01 for both pedestrian and crash events versus other tasks) – indicating more fragmented attention – with the n-back condition yielding extended fixations suggestive of increased cognitive workload.
Figure 8: Mean fixation durations stratified by secondary task and takeover event, with conversation tasks yielding the shortest fixations.
Takeover control time (TOC): Behavioral latency peaks under conversational distraction in dynamic (unexpected pedestrian) scenarios, highlighting strong cross-modal effects of secondary tasks and event salience on readiness.
Figure 9: TOC distributions across all experimental conditions, showing maximal delays in the conversation-pedestrian event co-occurrence.

These findings align with prior literature where naturalistic secondary task engagement degrades visual attention and increases reaction time [strayer2007cell, ZEEB2016, zhang2024non].

Implications and Limitations

The core finding is that both baseline interindividual differences (particularly in sleep-derived RMSSD) and context-dependent state (modulated by ongoing cognitive tasks and event salience) appear to shape operator readiness and attention profiles. Demonstrating that mobile sensing can be integrated with high-fidelity simulation offers a pathway to individualized, context-aware DMS algorithms, and it suggests that fixed population-level thresholds are likely to be brittle.

Practically, this paradigm motivates deploying online learning and continual context adaptation in AI-driven DMS architectures, leveraging temporally dense physiological and self-report data for dynamic risk assessment. Theoretically, it is consistent with a multi-scale, hierarchical view of cognitive readiness, in which slow-varying individual traits modulate fast, task-dependent neural and perceptual responses during critical transitions.
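
One simple way to picture per-driver baselining is a running mean and variance that is updated with each new observation and used to score later readings relative to that driver's own history. The sketch below uses Welford's online algorithm; the class name, the z-score scoring rule, and the idea of applying it to nightly RMSSD are illustrative assumptions, not a method described in the paper.

```python
import math


class PersonalBaseline:
    """Running per-driver baseline using Welford's online algorithm."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Fold one new observation (e.g. a nightly RMSSD value) into the baseline."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        """Sample standard deviation; 0.0 until two observations are available."""
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0

    def zscore(self, x: float) -> float:
        """Deviation of x from this driver's own baseline, in standard deviations."""
        s = self.std
        return (x - self.mean) / s if s > 0 else 0.0


# Hypothetical usage: a week of nightly values, then a new reading to score.
baseline = PersonalBaseline()
for nightly_value in [40.0, 42.0, 38.0, 41.0, 39.0]:
    baseline.update(nightly_value)
deviation = baseline.zscore(30.0)  # strongly negative relative to this driver's week
```

Scoring against a personal baseline rather than a population cutoff means the same raw value can be unremarkable for one driver and a marked deviation for another, which is the motivation for individualized thresholds.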

Limiting factors include sample size (N=38, skewed towards university students), non-counterbalanced event order, and incomplete sensor coverage due to adherence challenges, which preclude conclusive outcome modeling. Gaze analysis focused solely on fixation duration, omitting saccade, blink, and pupil indices.

Future Directions

Immediate research should pursue:

Larger-scale, demographically diverse sampling for external validity.
Enhanced multi-metric gaze and multimodal feature fusion.
Dynamic scenario and event randomization to resolve order/carryover effects.
Inclusion in VR-based, distributed simulation environments to improve ecological validity and scalability.
Translation of these findings toward real-time, on-road DMS deployment with continuous online baselining and adaptive risk estimation.

Conclusion

The hybrid framework systematically bridges temporally layered real-world state information and in-simulator neurobehavioral response in SAV contexts (2604.00832). The preliminary evidence points to high interindividual trait stability in sleep RMSSD, sensitivity of both gaze and takeover measures to conversational distraction, and an effect of event type on takeover readiness. The work lays out a methodological and conceptual basis for future multimodal, personalized driver monitoring, with implications for adaptive automation handover policies and the development of AI-enabled DMS.
