- The paper introduces a GS-based framework that leverages scene dynamics to disentangle illumination from material properties in dynamic scenes.
- It employs a two-stage pipeline: first modeling dynamic geometry with Gaussian splatting, then optimizing material parameters with physically-based rendering.
- Extensive evaluations demonstrate improved albedo and relighting performance with up to 23% lower LPIPS in albedo estimation compared to static baselines.
LumiMotion: Disentangling Illumination and Material in Dynamic Scenes via Gaussian Splatting
Introduction
LumiMotion addresses the problem of inverse rendering—decomposing appearance into geometry, material, and illumination—using dynamic scene content as a supervisory signal. Prior approaches using Gaussian Splatting (GS) or Neural Radiance Fields (NeRF) generally operate on static scenes, where disentangling lighting effects (e.g., shadows) from intrinsic surface properties is highly ill-posed. By exploiting scene dynamics, LumiMotion leverages the motion of objects as natural supervision: as a surface or object moves, it is observed under varying lighting contexts, providing critical cues for the inverse rendering problem.
Crucially, LumiMotion is positioned as the first GS-based framework that performs full inverse rendering on arbitrary, object-agnostic, dynamic scenes, and optimizes under unknown and unstructured illumination, without any requirement for controlled or multi-light datasets. The method further introduces a novel synthetic benchmark with static and dynamic scene variants under controlled lighting, enabling systematic quantitative evaluation.
Methodology
LumiMotion's pipeline consists of two distinct stages. In Stage 1, the method learns a dynamic 2D Gaussian Splatting representation that models both static and dynamic scene elements. In Stage 2, this geometry, along with the learned dynamics, is fixed, and optimization is performed over material parameters (albedo, roughness) and a global illumination map, using a physically-based differentiable renderer.
Figure 1: Stage 1 optimizes dynamic 2D Gaussian representations; Stage 2 fixes geometry and decomposes albedo, roughness, and illumination with ray tracing and stratified sampling.
Stage 1: Dynamic Geometry and Motion-Aware Separation
The 2D Gaussian Splatting baseline is extended to account for temporal deformation via an MLP, which predicts time- and space-conditioned updates to Gaussian position, orientation, and color. A critical innovation is the introduction of a differentiable static-dynamic assignment variable for each Gaussian, modeled with a Binary Concrete distribution. This allows the network to explicitly separate static structure from dynamic content, which is imperative for preventing relighting artifacts (e.g., “baked-in” shadows encoded as material instead of illumination). Regularization losses encourage minimal, interpretable separation.
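The differentiable static-dynamic assignment described above can be illustrated with a minimal NumPy sketch of Binary Concrete (relaxed Bernoulli) sampling. The function name, the logit values, and the temperature are illustrative assumptions rather than the paper's settings, and a real implementation would use an autodiff framework so that gradients reach the logits.

```python
import numpy as np

def sample_binary_concrete(logits, temperature, rng):
    """Draw relaxed Bernoulli samples in (0, 1), one per Gaussian."""
    # Logistic noise: log(u) - log(1 - u) with u ~ Uniform(0, 1).
    u = rng.uniform(1e-6, 1.0 - 1e-6, size=logits.shape)
    noise = np.log(u) - np.log1p(-u)
    # Sigmoid of the perturbed logits; lower temperature pushes samples toward 0 or 1.
    return 1.0 / (1.0 + np.exp(-(logits + noise) / temperature))

rng = np.random.default_rng(0)
logits = np.array([-4.0, 0.0, 4.0])  # hypothetical per-Gaussian logits
soft_mask = sample_binary_concrete(logits, temperature=0.5, rng=rng)
print(soft_mask)  # values near 1 would mark a Gaussian as dynamic (assumed convention)
```

Because the sample is a smooth function of the logits, the assignment can be optimized jointly with the deformation, and thresholding at 0.5 after training would yield a hard static/dynamic label.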
Figure 2: Across time, LumiMotion produces consistent normals, invariant static element segregation, and temporally-stable relighting in dynamic scenes.
Time-varying Gaussian color is modeled multiplicatively, which captures the shadowing and incident-light variation caused by object motion.
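A minimal sketch of one way a multiplicative color change could be written is shown below. It assumes a single scalar factor per Gaussian and time step, parameterized through an exponential to keep it positive; both choices are assumptions made for illustration, not details confirmed by the paper.

```python
import numpy as np

def time_varying_color(canonical_rgb, log_scale):
    """Scale canonical colors by a positive per-Gaussian factor for one time step."""
    scale = np.exp(log_scale)  # exp keeps the factor positive (assumed parameterization)
    return canonical_rgb * scale  # broadcasts the scalar factor over the RGB channels

canonical_rgb = np.array([[0.8, 0.4, 0.2]])  # one Gaussian, hypothetical color
shadowed = time_varying_color(canonical_rgb, np.array([[np.log(0.5)]]))
print(shadowed)  # half as bright, same hue
```

A scalar factor changes brightness while leaving the ratio between channels untouched, which is how a shadow passing over a surface behaves to first order.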
Stage 2: Inverse Rendering with Physically-Based Decomposition
In the second stage, the network infers per-Gaussian albedo and roughness, initialized from the canonical colors learned in Stage 1, and jointly optimizes the unknown environment map. The rendering equation is applied in a deferred fashion: albedo, normals, and roughness are rasterized into buffers, and shading, visibility, and indirect illumination are then computed via stratified Monte Carlo ray tracing. No additional supervision is introduced at this stage: the scene dynamics captured in Stage 1 are the only cue for separating lighting from material. Photometric losses compare the rendered output with the ground-truth images, while regularization terms keep the recovered reflectance physically plausible.
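The stratified Monte Carlo shading step can be illustrated in a much reduced form. The sketch below handles only a Lambertian (diffuse) surface in a local frame whose normal is +z, and takes `env_radiance` and `visibility` as hypothetical callables standing in for the environment map and the ray-traced occlusion test. Roughness-dependent specular shading and indirect light, which the method also models, are left out.

```python
import numpy as np

def stratified_hemisphere_dirs(n_side, rng):
    """Cosine-weighted directions about +z, one jittered sample per grid cell."""
    i, j = np.meshgrid(np.arange(n_side), np.arange(n_side), indexing="ij")
    u1 = (i + rng.uniform(size=i.shape)) / n_side  # stratified in [0, 1)
    u2 = (j + rng.uniform(size=j.shape)) / n_side
    r, phi = np.sqrt(u1), 2.0 * np.pi * u2
    x, y = r * np.cos(phi), r * np.sin(phi)
    z = np.sqrt(np.maximum(0.0, 1.0 - u1))
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

def shade_diffuse(albedo, dirs, env_radiance, visibility):
    """Lambertian estimate of outgoing radiance.

    With cosine-weighted samples the pdf is cos(theta) / pi, so each term
    (albedo / pi) * L * cos(theta) / pdf reduces to albedo * L.
    """
    incoming = env_radiance(dirs) * visibility(dirs)[:, None]
    return albedo * incoming.mean(axis=0)

rng = np.random.default_rng(0)
dirs = stratified_hemisphere_dirs(16, rng)
albedo = np.array([0.5, 0.25, 1.0])
white_env = lambda d: np.full((d.shape[0], 3), 2.0)  # constant radiance (toy input)
unoccluded = lambda d: np.ones(d.shape[0])           # no shadowing (toy input)
print(shade_diffuse(albedo, dirs, white_env, unoccluded))
```

Jittering one sample per grid cell keeps the directions evenly spread over the hemisphere, which typically lowers variance compared with independent uniform samples at the same ray budget.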
The environment illumination is predicted as a dense map, without any category priors. Indirect light is handled similarly to IRGS [irgs], so that both direct and secondary illumination effects are modeled.
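As an illustration of how a dense environment map can be queried by direction, the sketch below performs a nearest-texel lookup in a latitude-longitude image. The lat-long layout, the y-up convention, and the function name are assumptions; the summary does not specify how the map is parameterized.

```python
import numpy as np

def lookup_env(env_map, dirs):
    """Nearest-texel lookup in a lat-long map of shape (H, W, 3); +y is up (assumed)."""
    h, w, _ = env_map.shape
    d = dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)
    theta = np.arccos(np.clip(d[:, 1], -1.0, 1.0))      # polar angle from +y, in [0, pi]
    phi = np.arctan2(d[:, 0], d[:, 2]) % (2.0 * np.pi)  # azimuth, in [0, 2*pi)
    rows = np.minimum((theta / np.pi * h).astype(int), h - 1)
    cols = np.minimum((phi / (2.0 * np.pi) * w).astype(int), w - 1)
    return env_map[rows, cols]

env_map = np.zeros((4, 8, 3))
env_map[0] = 1.0  # toy map: light only from straight above
up_and_down = np.array([[0.0, 1.0, 0.0], [0.0, -1.0, 0.0]])
print(lookup_env(env_map, up_and_down))
```

A trainable version would treat the texel values as free parameters and use bilinear interpolation so that gradients flow smoothly to neighboring texels.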
Figure 3: LumiMotion outperforms baselines in novel view synthesis, relighting, intrinsic decomposition, and illumination map estimation.
Quantitative and Qualitative Evaluation
LumiMotion is evaluated on both the introduced synthetic benchmark (20 dynamic/static scenes under four HDR environments) and real multiview data. Metrics reported include PSNR, SSIM, and LPIPS for albedo and relighting. Against state-of-the-art baselines (e.g., R-3DGS, GI-GS, IRGS), LumiMotion demonstrates 23% lower LPIPS in albedo estimation and 15% lower LPIPS in relighting.
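For reference, PSNR follows directly from the mean squared error, and a "percent lower" figure is read here as a relative reduction against the baseline score (an assumption about how the reported percentages are defined). The numbers in the sketch are toy values, not results from the paper. SSIM and LPIPS require dedicated implementations and are not shown.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

def relative_reduction(baseline, ours):
    """Fractional improvement for a lower-is-better metric such as LPIPS."""
    return (baseline - ours) / baseline

target = np.zeros((4, 4, 3))
pred = np.full((4, 4, 3), 0.1)        # uniform error of 0.1 (toy input)
print(psnr(pred, target))             # MSE of 0.01 corresponds to about 20 dB
print(relative_reduction(0.20, 0.15)) # toy scores: a 25% relative reduction
```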
Figure 4: Qualitative comparison on ENeRF real-world scenes: LumiMotion accurately removes shadows from albedo, unlike IRGS.
LumiMotion’s improvements are most visible under strong direct or spatially complex illumination. The method avoids baking shadows into albedo, recovers environment maps with accurate light direction, and achieves high-fidelity relighting, even in the presence of complex human motion and occlusion. Evaluation on ENeRF scenes indicates that the method generalizes to real content and reproduces specular and shadow detail more reliably than static-only methods.
Figure 5: On real ENeRF data, LumiMotion removes dynamic shadows from albedo and achieves compelling, physically-plausible relighting.
Further, ablation studies show that omission of static-dynamic separation or temporal color modeling leads to significant degradation in relighting and material estimation, underlining the necessity of explicit motion-aware decomposition.
Analysis of Static-Dynamic Separation
Hyperparameter sweeps show that applying the separation penalty early and with sufficient weight prevents shadows from being absorbed into the wrong Gaussian set, a failure mode seen in baselines and poorly regularized variants. Ablations show that, when separation is enforced, only genuinely dynamic geometry receives temporal updates, which yields more robust relighting and more faithful albedo extraction.
Figure 6: Without separation, moving shadows are modeled as dynamic geometry, baking them into albedo; with separation enabled, shadows contribute only as color change, preventing artifacts.
Dataset Contributions and Material Recovery
The synthetic benchmark provides systematic, controllable variation in dynamics and light, enabling fair comparison unavailable in previous datasets. In material estimation (notably roughness), LumiMotion recovers detail missed by all baselines, indicating successful reflectance-illumination disentanglement.
Figure 7: Baseline methods struggle to recover roughness; LumiMotion produces accurate, detailed roughness maps, aligning with ground truth.
Qualitative results in both synthetic and real scenes show consistent performance and robustness against overfitting to training illumination, a common problem for static-only models.
Figure 8: Dynamic-scene training enables cleaner albedo, fine detail recovery, and more accurate environment maps than static baselines.
Limitations and Future Directions
LumiMotion’s fidelity depends strongly on accurate estimates of scene normals and motion deformation. Challenging motion patterns, sparse camera coverage, or imprecise pose initialization can lead to artifacts or misclassification at the static-dynamic boundary. The separation strategy is also relatively simple: misclassifications near contact regions or under fine-grained motion suggest that further advances (e.g., explicit optical flow, joint motion/material learning) may be needed to improve accuracy.
Conclusion
LumiMotion shows that scene dynamics provide critical supervision for accurate, physically motivated inverse rendering in complex environments. By exploiting the appearance variation that motion induces, the method achieves marked improvements in albedo, roughness, and environment illumination separation over the static-scene baselines it is compared against, enabling high-quality relighting and material recovery under unknown real-world illumination. The approach lays a foundation for object-agnostic, photometric scene understanding and opens avenues for advanced editing, relighting, and AR/VR content creation on in-the-wild dynamic data.