Uncertainty-quantified Pulse Signal Recovery from Facial Video using Regularized Stochastic Interpolants
Published 12 Apr 2026 in cs.CV | (2604.10777v1)
Abstract: Imaging Photoplethysmography (iPPG), an optical procedure which recovers a human's blood volume pulse (BVP) waveform using pixel readout from a camera, is an exciting research field with many researchers performing clinical studies of iPPG algorithms. While current algorithms to solve the iPPG task have shown outstanding performance on benchmark datasets, no state-of-the-art algorithm, to the best of our knowledge, performs test-time sampling of the solution space, precluding an uncertainty analysis that is critical for clinical applications. We address this deficiency through a new paradigm named Regularized Interpolants with Stochasticity for iPPG (RIS-iPPG). Modeling iPPG recovery as an inverse problem, we build probability paths that evolve the camera pixel distribution to the ground-truth signal distribution by predicting the instantaneous flow and score vectors of a time-dependent stochastic process; and at test-time, we sample the posterior distribution of the correct BVP waveform given the camera pixel intensity measurements by solving a stochastic differential equation. Given that physiological changes are slowly varying, we show that iPPG recovery can be improved through regularization that maximizes the correlation between the residual flow vector predictions of two adjacent time windows. Experimental results on three datasets show that RIS-iPPG provides superior reconstruction quality and uncertainty estimates of the reconstruction, a critical tool for the widespread adoption of iPPG algorithms in clinical and consumer settings.
The paper introduces RIS-iPPG, a method that quantifies uncertainty in pulse signals from facial video using regularized stochastic interpolants.
It employs a Residual Correlation Loss for enforcing temporal consistency, achieving sub-2 bpm error on benchmark datasets.
Experimental results demonstrate improved heart rate prediction and calibrated uncertainty, addressing the clinical need for transparent iPPG methods.
Introduction
This paper addresses uncertainty-quantified recovery of pulse (blood volume) waveforms from facial video, a domain broadly known as imaging photoplethysmography (iPPG). While recent iPPG methods have used deep learning to translate noisy camera pixel signals into blood volume pulse waveforms, these approaches have not provided uncertainty information to end users. The absence of sample-level predictive uncertainty hinders clinical adoption, where reliability and transparency are essential.
The primary methodological contribution is RIS-iPPG: a framework leveraging regularized stochastic interpolant diffusion models for iPPG signal recovery and credible uncertainty quantification. RIS-iPPG learns flow and score vector fields spanning the transformation from camera measurements to ground-truth physiological signals, then employs stochastic differential equation (SDE) sampling at test time to generate posterior samples and perform uncertainty analysis. A temporal regularization strategy, the Residual Correlation Loss (RCL), enforces temporal consistency of the flow model, a critical property in physiological time series.
Methodology
RIS-iPPG formulates the video-to-pulse recovery task as a distributional inverse mapping. Given preprocessed camera region signals and paired ground-truth BVP waveforms, the model learns the SDE drift field that transports the empirically observed camera signal distribution to the BVP distribution. This is accomplished by parameterizing the instantaneous flow and score at all points along the stochastic interpolation path, following the stochastic interpolant framework of Albergo et al. [albergo2023stochasticinterpolantsunifyingframework].
Figure 1: The pipeline extracts signal estimates, learns score/flow during training, and performs SDE-based posterior sampling and uncertainty quantification at test time.
Formally, for signals $x_1$ (camera) and $x_0$ (BVP), the stochastic interpolant is
$$x_t = (1-t)\,x_0 + t\,x_1 + 2t(1-t)\,z,$$
with neural networks predicting the flow $v_\theta$ and the score (denoiser) $n_\theta$ for matched time-shifted window pairs.
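As a rough illustration of the interpolant, the sketch below builds one noisy point on the path between a BVP window and a camera window, together with regression targets for the two networks. The noise coefficient is taken exactly as written in the formula above; the stochastic interpolant framework admits other coefficients that vanish at t = 0 and t = 1 (for example the square root of 2t(1−t)), so `gamma` and `dgamma_dt` should be treated as swappable. The target definitions (time derivative of the interpolant for the flow, the noise draw `z` for the denoiser) are assumptions based on the general framework, not details confirmed by the source, and the toy window values are hypothetical.

```python
import random


def gamma(t):
    """Noise coefficient as written in the text: 2 t (1 - t)."""
    return 2.0 * t * (1.0 - t)


def dgamma_dt(t):
    """Time derivative of gamma(t) = 2 t (1 - t)."""
    return 2.0 - 4.0 * t


def interpolate(x0, x1, t, z):
    """x_t = (1 - t) x0 + t x1 + gamma(t) z, elementwise over a window."""
    g = gamma(t)
    return [(1.0 - t) * a + t * b + g * n for a, b, n in zip(x0, x1, z)]


def regression_targets(x0, x1, t, z):
    """Assumed training targets at time t.

    flow target     = d/dt x_t = x1 - x0 + gamma'(t) z
    denoiser target = z
    """
    dg = dgamma_dt(t)
    flow = [b - a + dg * n for a, b, n in zip(x0, x1, z)]
    return flow, list(z)


rng = random.Random(0)
x0 = [0.0, 1.0, 0.0, -1.0]   # toy BVP window (hypothetical values)
x1 = [0.2, 0.7, 0.1, -0.6]   # toy camera window (hypothetical values)
z = [rng.gauss(0.0, 1.0) for _ in x0]

x_mid = interpolate(x0, x1, 0.5, z)
flow_target, denoiser_target = regression_targets(x0, x1, 0.5, z)
```

The noise term vanishes at both endpoints, so the path starts exactly at the BVP window and ends exactly at the camera window.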
The Residual Correlation Loss is key: Instead of regularizing raw predicted flows, the model computes the residual (prediction-target difference) for adjacent, overlapping windows and maximizes the alignment (correlation) between the residuals. This induces a structural prior reflecting the slow-varying dynamics of physiological signals across time, which improves temporal consistency and robustness to measurement noise.
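One way to read this description is as a loss of the form one minus the Pearson correlation between the residuals of two adjacent windows. The sketch below implements that reading in plain Python. The exact functional form, the choice of Pearson correlation, and the function names are assumptions made for illustration; the source states only that the correlation between residuals is maximized.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0.0 or sv == 0.0:
        return 0.0
    return cov / (su * sv)


def residual(pred, target):
    """Prediction-target difference for one window."""
    return [p - q for p, q in zip(pred, target)]


def residual_correlation_loss(pred_a, target_a, pred_b, target_b):
    """Assumed form: 1 - corr(residual_a, residual_b); 0 when perfectly aligned."""
    r_a = residual(pred_a, target_a)
    r_b = residual(pred_b, target_b)
    return 1.0 - pearson(r_a, r_b)


# Toy flow targets and predictions for two adjacent windows (hypothetical values).
target_a = [0.0, 1.0, 0.0, -1.0]
pred_a = [0.1, 1.2, 0.1, -0.8]
target_b = [1.0, 0.0, -1.0, 0.0]
pred_b_aligned = [1.2, 0.4, -0.8, 0.4]     # residual pattern matches window a
pred_b_opposed = [0.8, -0.4, -1.2, -0.4]   # residual pattern is flipped

loss_aligned = residual_correlation_loss(pred_a, target_a, pred_b_aligned, target_b)
loss_opposed = residual_correlation_loss(pred_a, target_a, pred_b_opposed, target_b)
```

Under this reading the loss ranges from 0 (residuals move together) to 2 (residuals move in opposition). In training it would presumably be added to the flow and score objectives with a tunable weight.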
At inference, the learned drift field defines the reverse-time SDE to transport the measurement $x_1$ to the reconstructed $x_0$. Repeated SDE sampling enables posterior distribution visualization and sample-level uncertainty analysis.
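The sampling step can be sketched with a basic Euler-Maruyama integrator that runs from t = 1 down to t = 0 and is repeated to collect samples. The drift used here is a hypothetical closed-form stand-in that pulls toward a known reference window; in the setting described above it would instead be assembled from the trained flow and score networks. The noise scale `eps`, the step count, and the toy signals are likewise assumptions.

```python
import math
import random
import statistics


def toy_drift(x, t, x0_ref):
    """Hypothetical stand-in for the learned drift: the velocity of a
    straight-line path that passes through x0_ref at t = 0 and x at time t."""
    return [(xi - ri) / t for xi, ri in zip(x, x0_ref)]


def sample_reverse_sde(x1, drift, eps, n_steps, rng):
    """Euler-Maruyama integration from t = 1 down to t = 0.

    Each step moves backward in time by h and adds Gaussian noise with
    standard deviation sqrt(2 * eps * h).
    """
    h = 1.0 / n_steps
    noise_scale = math.sqrt(2.0 * eps * h)
    x = list(x1)
    for k in range(n_steps):
        t = 1.0 - k * h          # t runs over 1, 1 - h, ..., h (never 0)
        b = drift(x, t)
        x = [xi - h * bi + noise_scale * rng.gauss(0.0, 1.0)
             for xi, bi in zip(x, b)]
    return x


def summarize(samples):
    """Per-time-point mean and standard deviation across samples."""
    columns = list(zip(*samples))
    means = [statistics.mean(c) for c in columns]
    stds = [statistics.stdev(c) for c in columns]
    return means, stds


rng = random.Random(0)
x0_ref = [0.0, 1.0, 0.0, -1.0]   # toy "true" BVP window (hypothetical)
x1 = [0.3, 0.6, 0.2, -0.5]       # toy camera measurement (hypothetical)


def drift(x, t):
    return toy_drift(x, t, x0_ref)


samples = [sample_reverse_sde(x1, drift, eps=0.05, n_steps=50, rng=rng)
           for _ in range(200)]
means, stds = summarize(samples)
```

The per-point standard deviation across samples is the simplest uncertainty summary; credible intervals can be read off the sorted samples in the same way.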
Experimental Results
Quantitative Evaluation
Evaluations span three benchmark datasets (MMSE-HR, PURE, UBFC-rPPG), using mean absolute error (MAE), RMSE, and Pearson’s ρ for pulse rate and full spectrum estimation. RIS-iPPG is competitive with or outperforms established baselines, especially on the PURE dataset, and achieves sub-2 bpm error on MMSE-HR and <1 bpm on UBFC-rPPG.
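For reference, the three reported metrics can be computed from paired heart-rate estimates as in the following sketch. The heart-rate values are made-up illustrative numbers, not results from the paper.

```python
import math


def mae(pred, ref):
    """Mean absolute error."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


def rmse(pred, ref):
    """Root mean squared error."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def pearson_rho(pred, ref):
    """Pearson correlation coefficient between predictions and references."""
    n = len(ref)
    mp, mr = sum(pred) / n, sum(ref) / n
    cov = sum((p - mp) * (r - mr) for p, r in zip(pred, ref))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    sr = math.sqrt(sum((r - mr) ** 2 for r in ref))
    return cov / (sp * sr)


hr_ref = [60.0, 72.0, 80.0, 95.0]    # reference heart rates in bpm (illustrative)
hr_pred = [61.0, 70.0, 80.0, 96.0]   # estimated heart rates in bpm (illustrative)

print(mae(hr_pred, hr_ref), rmse(hr_pred, hr_ref), pearson_rho(hr_pred, hr_ref))
```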
Bland-Altman plots confirm mean error close to zero and tight confidence intervals, indicative of reliable predictions across all datasets.
Figure 2: Bland-Altman plots indicate negligible mean difference and reasonable confidence intervals for RIS-iPPG across diverse datasets.
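The quantities summarized by a Bland-Altman plot are the mean difference (bias) and the 95% limits of agreement, conventionally the bias plus or minus 1.96 standard deviations of the differences. A minimal sketch, using illustrative heart-rate values rather than data from the paper:

```python
import statistics


def bland_altman(pred, ref):
    """Return (bias, lower limit, upper limit) of agreement.

    Differences are pred - ref; the limits are bias +/- 1.96 * sample SD.
    """
    diffs = [p - r for p, r in zip(pred, ref)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


hr_ref = [60.0, 72.0, 80.0, 95.0]    # reference heart rates in bpm (illustrative)
hr_pred = [61.0, 70.0, 80.0, 96.0]   # estimated heart rates in bpm (illustrative)

bias, lower, upper = bland_altman(hr_pred, hr_ref)
```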
Qualitative and Uncertainty Analysis
Power spectrum plots under sampling demonstrate that the RCL-regularized model accurately identifies the true heart rate and its modes, with credible intervals reflecting plausible alternative frequencies only where justified by data ambiguity.
Figure 3: RCL regularization yields accurate mode prediction and calibrated uncertainty intervals in power spectral estimates.
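To make the spectral analysis concrete, the sketch below estimates a heart rate from each sampled waveform by locating the peak of its power spectrum within a plausible cardiac band. The band limits (0.7 to 3.0 Hz), the sampling rate, and the synthetic "posterior samples" are assumptions for illustration; a direct DFT is used to keep the example dependency-free, whereas a practical pipeline would more likely use an FFT routine.

```python
import cmath
import math
import random


def power_spectrum(signal, fs):
    """One-sided power spectrum of a mean-removed signal via a direct DFT."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * math.pi * k * i / n)
                    for i, c in enumerate(centered))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2)
    return freqs, power


def heart_rate_bpm(signal, fs, band=(0.7, 3.0)):
    """Frequency of the largest spectral peak inside `band`, in beats per minute."""
    freqs, power = power_spectrum(signal, fs)
    in_band = [(p, f) for f, p in zip(freqs, power) if band[0] <= f <= band[1]]
    return 60.0 * max(in_band)[1]


fs = 30.0                                 # assumed camera frame rate in Hz
n = 300                                   # 10 seconds of signal
clean = [math.sin(2.0 * math.pi * 1.2 * i / fs) for i in range(n)]  # 1.2 Hz pulse

# Stand-ins for posterior samples: the clean pulse plus independent noise.
rng = random.Random(0)
samples = [[c + rng.gauss(0.0, 0.3) for c in clean] for _ in range(5)]
hr_per_sample = [heart_rate_bpm(s, fs) for s in samples]
```

Collecting `hr_per_sample` over many SDE samples gives an empirical distribution of heart rates, from which modes and credible intervals can be read.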
Detailed calibration analysis shows small miscalibration error (≤0.1) for both whole datasets and subgroups (skin tone, gender). Proper scoring rules—negative log likelihood, CRPS, check/interval scores—demonstrate improved uncertainty quantification with RCL.
Figure 4: Calibration curves stratified by skin tone show generally good alignment between predicted and empirical coverage.
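Calibration and CRPS can both be estimated directly from posterior samples. In the sketch below, empirical coverage is the fraction of cases whose true value falls inside a central interval of the samples, and miscalibration is taken to be the mean absolute gap between nominal and empirical coverage. That definition is an assumption, since the source does not specify how its miscalibration error is computed. The CRPS estimator uses the standard sample form E|X − y| − ½E|X − X′|, and the synthetic data are illustrative.

```python
import random


def central_interval(samples, level):
    """Empirical central interval of the samples at the given nominal level."""
    s = sorted(samples)
    lo_idx = int(round((0.5 - level / 2.0) * (len(s) - 1)))
    hi_idx = int(round((0.5 + level / 2.0) * (len(s) - 1)))
    return s[lo_idx], s[hi_idx]


def empirical_coverage(sample_sets, truths, level):
    """Fraction of cases whose true value lies inside the central interval."""
    hits = 0
    for samples, y in zip(sample_sets, truths):
        lo, hi = central_interval(samples, level)
        hits += lo <= y <= hi
    return hits / len(truths)


def miscalibration(sample_sets, truths, levels):
    """Assumed definition: mean |empirical coverage - nominal level|."""
    gaps = [abs(empirical_coverage(sample_sets, truths, p) - p) for p in levels]
    return sum(gaps) / len(gaps)


def crps(samples, y):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|."""
    n = len(samples)
    term1 = sum(abs(x - y) for x in samples) / n
    term2 = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return term1 - 0.5 * term2


# Synthetic, well-calibrated toy data: truth and samples share one distribution.
rng = random.Random(0)
centers = [rng.uniform(55.0, 100.0) for _ in range(500)]
truths = [rng.gauss(c, 1.0) for c in centers]
sample_sets = [[rng.gauss(c, 1.0) for _ in range(200)] for c in centers]

levels = [0.1 * k for k in range(1, 10)]
error = miscalibration(sample_sets, truths, levels)
```

Stratifying `sample_sets` and `truths` by subgroup before calling `miscalibration` yields per-subgroup calibration figures of the kind discussed above.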
RIS-iPPG establishes, for the first time, baselines for uncertainty quantification on protected subpopulations in iPPG, confirming persistent accuracy disparities for dark skin tones, consistent with prior literature.
Temporal Consistency and Ablation
Grid search on the temporal regularization weight and window overlap demonstrates optimal performance for nontrivial overlap and low-to-moderate RCL weighting, validating the benefit of residual-based correlation regularization for flow fields.
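The two ablated hyperparameters can be made concrete with a small windowing helper and a grid of candidate settings. The window length, the overlap fractions, and the RCL weights below are hypothetical values chosen for illustration; the source does not list the grid that was actually searched.

```python
import itertools


def sliding_windows(signal, length, overlap):
    """Split `signal` into windows of `length` samples.

    `overlap` in [0, 1) is the fraction of each window shared with the next.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    step = max(1, int(round(length * (1.0 - overlap))))
    return [signal[i:i + length]
            for i in range(0, len(signal) - length + 1, step)]


def adjacent_pairs(windows):
    """Pairs of neighbouring windows, as needed by a pairwise regularizer."""
    return list(zip(windows[:-1], windows[1:]))


# Hypothetical search grid over RCL weight and window overlap.
rcl_weights = [0.0, 0.1, 0.5, 1.0]
overlaps = [0.0, 0.25, 0.5, 0.75]
grid = list(itertools.product(rcl_weights, overlaps))

windows = sliding_windows(list(range(10)), length=4, overlap=0.5)
pairs = adjacent_pairs(windows)
```

Each `(weight, overlap)` entry in `grid` would correspond to one training run, with validation error compared across runs.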
Training and validation convergence curves for both flow and RCL losses demonstrate monotonic decrease and support stable SDE solution quality.
Comparative Modeling Analysis
RIS-iPPG is benchmarked against posterior sampling methodologies including conditional VAEs and Bayesian neural networks, both of which exhibit inferior heart rate estimation error and sample sharpness in this task. The diffusion SDE framework models complex, non-Gaussian, multimodal relationships and provides better uncertainty calibration and predictive performance.
Figure 5: Incorporating RCL on error residuals yields stable and rapidly declining validation loss, while naïve flow alignment stalls.
Furthermore, the system satisfies bounded-input, bounded-output (BIBO) stability due to the Lipschitz-continuous drift, with the total sampling error bounded in terms of the neural approximation error and the SDE discretization step size.
Theoretical and Practical Implications
This work substantiates the viability of SDE-driven, uncertainty-quantified inverse models for physiological signal recovery under realistic, heavily corrupted observation conditions. The integration of sample-level uncertainty is directly responsive to clinical requirements for transparent, trustworthy prediction interfaces.
RIS-iPPG's regularization formulation provides an inductive bias toward physiologically coherent solutions, exploiting the quasi-stationary nature of cardiac activity and enabling improved generalization to unseen data windows.
Within-domain performance is strong, but cross-domain transfer shows notable generalization errors, highlighting the need for future research on domain adaptation for iPPG models.
Future Directions
Open research directions include (1) extending RIS-iPPG to robustly generalize under domain shift (e.g., via domain-invariant representation learning or meta-adaptation), (2) scaling the regularized SDE paradigm to handle variable-length, multi-lead physiological measurements, (3) incorporating additional covariate information (e.g., skin reflectance priors, contextual metadata), and (4) accelerating SDE solvers for real-time deployment. Furthermore, calibration disparities across protected attributes call for model- and data-centric approaches to bias reduction.
Conclusion
RIS-iPPG represents a significant advance in the integration of calibrated uncertainty quantification into non-contact vital sign estimation from facial video. By leveraging regularized stochastic interpolant diffusion models and enforcing temporal consistency, the approach yields accurate, trustworthy sample-level inference with implications for clinical decision support and high-reliability consumer health systems. It establishes comprehensive performance and calibration baselines and points to important future work centered on domain robustness and subgroup fairness.