GazeGaussian: High-Fidelity Gaze Redirection with 3D Gaussian Splatting

Published 20 Nov 2024 in cs.CV | (2411.12981v1)

Abstract: Gaze estimation encounters generalization challenges when dealing with out-of-distribution data. To address this problem, recent methods use neural radiance fields (NeRF) to generate augmented data. However, existing methods based on NeRF are computationally expensive and lack facial details. 3D Gaussian Splatting (3DGS) has become the prevailing representation of neural fields. While 3DGS has been extensively examined in head avatars, it faces challenges with accurate gaze control and generalization across different subjects. In this work, we propose GazeGaussian, a high-fidelity gaze redirection method that uses a two-stream 3DGS model to represent the face and eye regions separately. By leveraging the unstructured nature of 3DGS, we develop a novel eye representation for rigid eye rotation based on the target gaze direction. To enhance synthesis generalization across various subjects, we integrate an expression-conditional module to guide the neural renderer. Comprehensive experiments show that GazeGaussian outperforms existing methods in rendering speed, gaze redirection accuracy, and facial synthesis across multiple datasets. We also demonstrate that existing gaze estimation methods can leverage GazeGaussian to improve their generalization performance. The code will be available at: https://ucwxb.github.io/GazeGaussian/.


Authors (6)

Summary

The paper introduces a two-stream 3D Gaussian Splatting model that separates face and eye regions for high-fidelity gaze redirection.
It demonstrates state-of-the-art performance with 74 FPS rendering speed and the lowest angular errors in gaze and head redirection.
The approach improves identity preservation and conditions rendering on facial expression, which makes it a candidate for real-time applications in telemedicine and human-computer interaction.

GazeGaussian: Advancing Gaze Redirection with 3D Gaussian Splatting

The paper "GazeGaussian: High-Fidelity Gaze Redirection with 3D Gaussian Splatting" introduces a gaze redirection method built on 3D Gaussian Splatting (3DGS). GazeGaussian uses a two-stream 3DGS model that represents the face and eye regions separately, which allows precise and realistic gaze manipulation. Together with an expression-conditional module, this design yields higher-fidelity facial synthesis and better generalization across subjects.

Methodology and Implementation

GazeGaussian targets two limitations of prior methods based on Neural Radiance Fields (NeRF): high computational cost and a lack of facial detail. The paper replaces NeRF with 3DGS for the gaze redirection task. Exploiting the unstructured nature of 3DGS, in which each Gaussian is an explicit primitive that can be transformed directly, the authors develop a Gaussian Eye Rotation Representation (GERR) that rotates the eye region rigidly according to the target gaze direction.
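To make the idea of rigid eye rotation concrete, the following is a minimal sketch rather than the authors' GERR implementation. It assumes the gaze is given as pitch and yaw angles in radians, that the rotation is composed as Ry(yaw) Rx(pitch), and that the eye Gaussians rotate about a known eyeball center. The function names and the angle convention are illustrative assumptions and do not come from the paper.

```python
import math


def rotation_from_pitch_yaw(pitch, yaw):
    """Return the 3x3 matrix Ry(yaw) @ Rx(pitch); angles are in radians.

    The axis convention is an assumption made for this sketch.
    """
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    # Product of Ry(yaw) and Rx(pitch), multiplied out by hand.
    return [
        [cy, sy * sp, sy * cp],
        [0.0, cp, -sp],
        [-sy, cy * sp, cy * cp],
    ]


def rotate_eye_gaussians(centers, eyeball_center, pitch, yaw):
    """Rigidly rotate Gaussian centers about a (hypothetical) eyeball center."""
    rot = rotation_from_pitch_yaw(pitch, yaw)
    rotated = []
    for point in centers:
        # Express the point relative to the pivot, rotate, then shift back.
        offset = [point[i] - eyeball_center[i] for i in range(3)]
        rotated.append([
            sum(rot[r][c] * offset[c] for c in range(3)) + eyeball_center[r]
            for r in range(3)
        ])
    return rotated
```

Because the transform is rigid, every Gaussian keeps its distance from the eyeball center. A complete 3DGS transform would also rotate each Gaussian's orientation (its covariance becomes R Σ Rᵀ); that step is omitted here for brevity.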

GazeGaussian first initializes the two-stream 3DGS model from a pre-trained neutral mesh obtained from the training dataset, dividing it into a face-only region and an eye region. Each stream then optimizes the deformation of its own Gaussians so that the rendered result matches the target gaze direction and head pose. Finally, an expression-conditional module guides the neural renderer with expression codes, which improves synthesis generalization across subjects.
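The initialization step can be pictured as partitioning the neutral mesh into two point sets, one per stream. The sketch below is only an illustration under stated assumptions: it supposes the mesh is a list of 3D vertices and that the indices of the eye-region vertices are already known. The function name and the `eye_vertex_ids` input are hypothetical and are not described in the paper summary.

```python
def split_face_and_eyes(vertices, eye_vertex_ids):
    """Partition mesh vertices into face-only and eye point sets.

    Each returned list could seed the Gaussian centers of one stream.
    `eye_vertex_ids` is a hypothetical, externally provided index list.
    """
    eye_ids = set(eye_vertex_ids)
    face_points, eye_points = [], []
    for index, vertex in enumerate(vertices):
        target = eye_points if index in eye_ids else face_points
        target.append(tuple(vertex))
    return face_points, eye_points
```

For example, calling `split_face_and_eyes(mesh_vertices, [1, 3])` would place vertices 1 and 3 in the eye stream and all remaining vertices in the face stream.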

Experimental Evaluation

Experiments on multiple datasets, including ETH-XGaze, ColumbiaGaze, MPIIFaceGaze, and GazeCapture, validate the method. GazeGaussian outperforms existing methods in rendering speed, gaze redirection accuracy, and image synthesis quality. It reports the lowest angular errors for both gaze and head redirection and superior identity preservation metrics. Its rendering speed reaches 74 FPS (about 13.5 ms per frame), which highlights its computational efficiency relative to previous methods.
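Angular error is conventionally the angle between a predicted and a target direction vector. The sketch below shows that standard computation in degrees; it is not the paper's evaluation code, the exact evaluation protocol is not given in this summary, and the function name is an illustrative choice.

```python
import math


def angular_error_deg(predicted, target):
    """Angle in degrees between two non-zero 3D direction vectors."""
    dot = sum(a * b for a, b in zip(predicted, target))
    norm = math.sqrt(sum(a * a for a in predicted)) * math.sqrt(
        sum(b * b for b in target)
    )
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))
```

The vectors do not need to be normalized in advance, since the function divides by both norms. Averaging this value over a test set gives a mean angular error of the kind usually reported for gaze and head pose.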

Implications and Future Directions

The implications of GazeGaussian are significant for applications requiring precise gaze manipulation, such as virtual meetings, telemedicine, and human-computer interaction. By reducing computational overhead and improving synthesis fidelity, this method brings practicality to real-time systems and enhances user experience by maintaining natural facial expressions during gaze correction.

Conceptually, modeling the eye and face in separate streams offers a new way to structure 3D neural representations of facial regions. Future research may extend this approach to other subtle facial dynamics, for example by integrating GazeGaussian with facial expression synthesis models to create more immersive virtual avatars.

Additionally, the success of GazeGaussian in enhancing the generalization of gaze estimators suggests its potential as a tool for data augmentation in machine learning pipelines, aiding in the development of more robust gaze estimation models. Future explorations could address integrating this framework with other modalities, such as audio-visual data, to further improve the contextual understanding of gaze behavior.

In conclusion, GazeGaussian advances 3D gaze redirection by showing that 3DGS can overcome the speed and detail limitations of earlier NeRF-based gaze synthesis. Its contributions are relevant to both practical applications and further research in computer vision.
