GART: Gaussian Articulated Template Models

Published 27 Nov 2023 in cs.CV and cs.GR | (2311.16099v1)

Abstract: We introduce Gaussian Articulated Template Model GART, an explicit, efficient, and expressive representation for non-rigid articulated subject capturing and rendering from monocular videos. GART utilizes a mixture of moving 3D Gaussians to explicitly approximate a deformable subject's geometry and appearance. It takes advantage of a categorical template model prior (SMPL, SMAL, etc.) with learnable forward skinning while further generalizing to more complex non-rigid deformations with novel latent bones. GART can be reconstructed via differentiable rendering from monocular videos in seconds or minutes and rendered in novel poses faster than 150fps.


Authors (5)

Citations (43)


Summary

The paper introduces the GART model, which integrates moving 3D Gaussians with explicit template models to robustly capture the geometry and appearance of articulated subjects.
It employs learnable forward skinning and latent bones to accurately model non-rigid deformations, including complex motions like loose clothing.
GART achieves efficient monocular reconstruction and high-speed rendering (over 150 FPS), enabling practical applications for both human and animal modeling.

Introduction

Recent work on capturing and rendering non-rigid, articulated subjects such as humans and animals falls into two broad camps. Categorical template models (SMPL, SMAL, etc.) estimate pose efficiently but struggle to capture detailed appearance and complex deformations. Implicit representations achieve high visual quality but are computationally expensive and slow to render. This leaves a need for a representation that combines the simplicity and speed of explicit models with the quality of implicit ones.

GART Model Framework

The Gaussian Articulated Template Model (GART) was proposed to address this gap. GART represents a subject as a mixture of moving 3D Gaussians that explicitly approximates its geometry and appearance. By combining classical template models with Gaussian mixture models (GMMs), it approximates the underlying radiance field without requiring a fixed topology, which lets it adapt to a wide range of articulated subjects.
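To make the "mixture of 3D Gaussians" idea concrete, here is a minimal NumPy sketch of a canonical Gaussian set. It assumes the parameterization commonly used for 3D Gaussian representations (a mean, a unit quaternion, per-axis scales, and an opacity per Gaussian, with covariance R S S^T R^T); the function names are illustrative and not taken from the GART code base.

```python
import numpy as np

def quat_to_rotmat(quats):
    """Convert quaternions (N, 4) in (w, x, y, z) order to rotation matrices (N, 3, 3)."""
    q = quats / np.linalg.norm(quats, axis=-1, keepdims=True)
    w, x, y, z = q[:, 0], q[:, 1], q[:, 2], q[:, 3]
    rows = np.stack([
        1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y),
        2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x),
        2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y),
    ], axis=-1)
    return rows.reshape(-1, 3, 3)

def gaussian_covariances(quats, scales):
    """Build per-Gaussian covariances Sigma = R S S^T R^T from rotations and axis scales."""
    R = quat_to_rotmat(quats)                  # (N, 3, 3)
    RS = R * scales[:, None, :]                # R @ diag(s): scales the columns of R
    return RS @ np.transpose(RS, (0, 2, 1))    # (N, 3, 3)

def mixture_density(x, means, covs, opacities):
    """Evaluate the unnormalized mixture sum_i o_i * exp(-0.5 * d_i^T Sigma_i^-1 d_i) at point x."""
    d = x[None, :] - means                                   # (N, 3) offsets to each mean
    sol = np.linalg.solve(covs, d[..., None])[..., 0]        # Sigma_i^-1 d_i
    maha = np.einsum("ni,ni->n", d, sol)                     # squared Mahalanobis distances
    return float(np.sum(opacities * np.exp(-0.5 * maha)))
```

Because every quantity is an explicit per-Gaussian array, nothing in this layout ties the Gaussians to a mesh connectivity, which is the sense in which the representation needs no fixed topology.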

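GART improves upon explicit representations by incorporating learnable forward skinning, the technique typically used to animate template meshes, in which each primitive is moved from a canonical pose to the target pose by a weighted blend of bone transformations. It also introduces latent bones, additional bones beyond those of the template skeleton, which help model complex non-rigid deformations such as the movement of loose clothing.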
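The forward skinning step can be sketched as linear blend skinning applied to each Gaussian. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes per-Gaussian skinning weights over all bones, treats latent bones simply as extra 4x4 transforms appended to the template bones, and leaves open how the latent transforms are obtained. All names (`forward_skin`, `skin_with_latent_bones`) are hypothetical.

```python
import numpy as np

def forward_skin(means, rots, weights, bone_transforms):
    """Move canonical Gaussians to a target pose with linear blend skinning.

    means:           (N, 3)    canonical Gaussian centers
    rots:            (N, 3, 3) canonical Gaussian orientations
    weights:         (N, B)    skinning weights, each row summing to 1
    bone_transforms: (B, 4, 4) canonical-to-posed transform of each bone
    """
    # Blend the bone transforms per Gaussian: A_n = sum_b w_nb * T_b
    A = np.einsum("nb,bij->nij", weights, bone_transforms)   # (N, 4, 4)
    A_lin, A_trans = A[:, :3, :3], A[:, :3, 3]
    posed_means = np.einsum("nij,nj->ni", A_lin, means) + A_trans
    # The blended linear part is generally not an exact rotation; it is
    # applied directly here to keep the sketch short.
    posed_rots = A_lin @ rots
    return posed_means, posed_rots

def skin_with_latent_bones(means, rots, weights, template_T, latent_T):
    """Same as forward_skin, with latent bones appended after the template bones.

    weights must have one column per template bone followed by one per latent bone.
    """
    all_T = np.concatenate([template_T, latent_T], axis=0)   # (B + L, 4, 4)
    return forward_skin(means, rots, weights, all_T)
```

In a learnable setup the `weights` array would be an optimized parameter rather than a fixed property of a template mesh, which is one way to read "learnable forward skinning".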

Reconstruction and Rendering Advantages

A key advantage of GART is its efficiency. It can be reconstructed from a monocular video within seconds to minutes and rendered in novel poses at over 150 FPS. This is a significant improvement in both training and inference efficiency over current state-of-the-art neural radiance field based methods for human rendering.

Moreover, GART includes several regularization techniques that improve reconstruction quality when the input views are sparse or the estimated poses are noisy. The prior from category-level template models also lets information from different frames be accumulated efficiently during monocular reconstruction.
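The text above does not say which regularizers are used. As one illustrative possibility for a point-based representation, a neighborhood smoothness penalty discourages nearby Gaussians from having very different attributes. The sketch below is an assumption made for illustration, not GART's documented loss; the function name and the choice of `k` are hypothetical.

```python
import numpy as np

def knn_smoothness(means, attrs, k=3):
    """Mean standard deviation of an attribute within each Gaussian's k-nearest neighborhood.

    means: (N, 3) Gaussian centers used to find neighbors
    attrs: (N, D) per-Gaussian attribute to regularize (e.g. skinning weights or colors)
    """
    # Pairwise squared distances between all centers: (N, N)
    d2 = ((means[:, None, :] - means[None, :, :]) ** 2).sum(-1)
    # Indices of the k nearest neighbors plus the Gaussian itself (distance 0)
    idx = np.argsort(d2, axis=1)[:, : k + 1]
    neigh = attrs[idx]                         # (N, k + 1, D)
    # Zero when every neighborhood is constant, larger when neighbors disagree
    return float(neigh.std(axis=1).mean())
```

A penalty of this kind is added to the rendering loss with a small weight; with few views it keeps attributes that are weakly observed from drifting away from their neighbors.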

Application and Extension

GART also serves as a general framework for reconstructing non-rigid articulated subjects other than humans. It has been applied to capture various dog breeds from in-the-wild monocular videos. It has also been extended to applications such as Text-to-GART, in which a text description is turned into an articulated 3D model, opening possibilities for user-friendly content creation.

Conclusion

GART bridges the gap between fast, explicit rendering and high-quality capture of complex articulated subjects. It is reported to deliver state-of-the-art performance in monocular human and animal reconstruction and rendering, while training in seconds to minutes and rendering in real time. This makes capturing real-world articulated subjects from ordinary video considerably more practical.
