PAC-Bayes Bounds for Gibbs Posteriors via Singular Learning Theory

Published 19 Apr 2026 in stat.ML and cs.LG | (2604.17219v1)

Abstract: We derive explicit non-asymptotic PAC-Bayes generalization bounds for Gibbs posteriors, that is, data-dependent distributions over model parameters obtained by exponentially tilting a prior with the empirical risk. Unlike classical worst-case complexity bounds based on uniform laws of large numbers, which require explicit control of the model space in terms of metric entropy (integrals), our analysis yields posterior-averaged risk bounds that can be applied to overparameterized models and adapt to the data structure and the intrinsic model complexity. The bound involves a marginal-type integral over the parameter space, which we analyze using tools from singular learning theory to obtain explicit and practically meaningful characterizations of the posterior risk. Applications to low-rank matrix completion and ReLU neural network regression and classification show that the resulting bounds are analytically tractable and substantially tighter than classical complexity-based bounds. Our results highlight the potential of PAC-Bayes analysis for precise finite-sample generalization guarantees in modern overparameterized and singular models.


Authors (2)

Summary

The paper presents a novel integration of PAC-Bayes analysis with singular learning theory to derive explicit finite-sample bounds.
It leverages the real log canonical threshold (RLCT) to adaptively measure model complexity beyond traditional parameter counting.
The framework offers practical generalization guarantees for overparameterized models such as low-rank matrix completion and deep ReLU networks.

PAC-Bayes Generalization Bounds for Gibbs Posteriors in Singular Models

Motivation and Context

The paper "PAC-Bayes Bounds for Gibbs Posteriors via Singular Learning Theory" (2604.17219) addresses the persistent challenge of deriving finite-sample generalization guarantees for modern overparameterized and singular models, where traditional uniform-convergence-based bounds fail to give tight or non-vacuous risk estimates. Classical complexity measures, such as VC dimension and covering entropy, do not align with the empirical generalization of deep learning models, particularly in regimes with parameter dimension far exceeding the sample size.

A key methodological advance is the integration of PAC-Bayes analysis, which yields bounds for posterior-averaged risk, into the singular learning theory framework. Instead of relying on global worst-case bounds, PAC-Bayes bounds control risks averaged over data-dependent distributions (notably Gibbs posteriors), which are amenable to explicit analysis even in singular, non-identifiable, or overparameterized models.

Gibbs Posterior and PAC-Bayes Analysis

The Gibbs posterior replaces the negative log-likelihood with an arbitrary loss function, exponentially tilting a prior towards parameters with small empirical risk. This loss-based update provides robustness to model misspecification and removes the dependence on a probabilistic model structure, so it applies to ERM setups beyond likelihood-based models.
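The tilting described above can be sketched on a one-dimensional parameter grid. This is a minimal illustration, assuming the standard form in which the posterior weight is proportional to the prior times $\exp\{-\omega n r_n(\theta)\}$, with $r_n$ the empirical risk; the squared loss, the Gaussian prior, the toy data, and all names are assumptions made for the example, not taken from the paper.

```python
import numpy as np

def gibbs_posterior(log_prior, emp_risk, omega, n):
    """Normalized Gibbs weights on a grid: prior * exp(-omega * n * risk)."""
    log_w = log_prior - omega * n * emp_risk
    log_w -= log_w.max()          # stabilize before exponentiating
    w = np.exp(log_w)
    return w / w.sum()

# Toy data (assumption): location model with squared loss.
rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=1.0, size=200)

theta_grid = np.linspace(-3.0, 3.0, 601)
emp_risk = np.array([np.mean((y - t) ** 2) for t in theta_grid])
log_prior = -0.5 * theta_grid ** 2   # N(0, 1) prior, up to a constant

weights = gibbs_posterior(log_prior, emp_risk, omega=1.0, n=len(y))
post_mean = float(np.sum(weights * theta_grid))
```

No likelihood appears anywhere: only the loss and the prior enter, which is the point of the loss-based update. In this toy setup the posterior mean sits close to the sample mean of `y`.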

The PAC-Bayes framework then produces non-asymptotic bounds for the posterior-averaged population risk by trading off empirical fit with a complexity penalty given by the KL divergence between the posterior and prior. The core innovation is to combine these risk and penalty contributions into a single marginal-type integral over parameter space, closely related to the partition function in statistical mechanics and Bayesian evidence. Analytically, this replaces covering entropy with intrinsic integration geometry and facilitates explicit, sharp bounds.
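The link between the risk-plus-KL trade-off and a single marginal-type integral rests on the classical Donsker-Varadhan (Gibbs) variational formula. The display below is that standard identity, written with an empirical risk $r_n$ and a learning rate $\omega$; these symbols are assumed notation, and the paper's exact statement may differ in constants.

```latex
\min_{\rho \ll \varphi}
\left\{ \int_\Theta r_n(\theta)\,\rho(d\theta)
      + \frac{\mathrm{KL}(\rho \,\|\, \varphi)}{\omega n} \right\}
= -\frac{1}{\omega n}
  \log \int_\Theta \exp\{-\omega n\, r_n(\theta)\}\,\varphi(d\theta),
\qquad
\text{attained at } \rho(d\theta) \propto \exp\{-\omega n\, r_n(\theta)\}\,\varphi(d\theta).
```

The minimizer is the Gibbs posterior, and the minimum value is a log-partition function, so empirical fit and complexity penalty collapse into one integral over the parameter space.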

Singular Learning Theory and RLCT

Singular learning theory provides tools from algebraic geometry to analyze the marginal likelihood in non-regular models. In regular models, the classical Laplace approximation yields a penalty coefficient of $d/2$ on $\log n$ (with $d$ the parameter dimension); in singular models, the leading penalty term is instead governed by the real log canonical threshold (RLCT) $\lambda$, a quantity that reflects the intrinsic geometry of the loss landscape near its minima and that can be fractional and much smaller than $d/2$. This adaptivity is crucial for modern deep architectures, mixture models, and other non-identifiable settings.
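The gap between $d/2$ and the RLCT can be seen numerically on a two-parameter toy problem. The sketch below is an illustration only: it assumes a uniform prior on $[0,1]^2$ and compares a regular function $K(a,b)=a^2+b^2$ with a singular one $K(a,b)=a^2b^2$ (whose RLCT is $1/2$), estimating how fast $-\log\int \exp\{-nK\}$ grows with $\log n$. The functions, grid size, and sample sizes are all hypothetical choices.

```python
import numpy as np

def neg_log_evidence(K, n, grid=1000):
    """-log of the integral of exp(-n K) over [0,1]^2 (uniform prior, midpoint rule)."""
    t = (np.arange(grid) + 0.5) / grid
    a, b = np.meshgrid(t, t, indexing="ij")
    return -np.log(np.exp(-n * K(a, b)).mean())

def growth_rate(K, n_lo=1e2, n_hi=1e4):
    """Slope of -log evidence against log n between two sample sizes."""
    rise = neg_log_evidence(K, n_hi) - neg_log_evidence(K, n_lo)
    return rise / (np.log(n_hi) - np.log(n_lo))

regular = lambda a, b: a ** 2 + b ** 2    # isolated minimum, d = 2
singular = lambda a, b: (a * b) ** 2      # minima along both axes

rate_regular = growth_rate(regular)
rate_singular = growth_rate(singular)
```

Both toy models have two parameters, yet the regular rate comes out close to $d/2 = 1$ while the singular rate stays below $1/2$, since the low-risk region around the axes occupies far more prior mass than a single point would.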

Most singular learning theory results are asymptotic: they describe the large-sample limit and rely on strong regularity and entropy control. The present work develops a non-asymptotic framework, leveraging a Bernstein-type sub-exponential moment assumption on the loss to directly relate empirical and population risks, thus avoiding explicit entropy arguments. This enables a two-sided PAC-Bayes control (bounding the empirical risk by the population risk and vice versa), leading to explicit finite-sample risk bounds for Gibbs posteriors in terms of the RLCT.

Main Theoretical Results

The principal result is an explicit high-probability PAC-Bayes bound stating that, under the Bernstein-type moment condition, for any $\delta\in(0,1)$ and a suitably chosen learning rate $\omega$, with probability at least $1-\delta$,

$\int_\Theta R(\theta, \theta^*)\ \Pi_n(d\theta) \le \frac{2}{\big(1-\frac{\omega L}{2}\big)\omega n} \left[ -\log \int_\Theta \exp\{-c\,n\,R(\theta, \theta^*)\}\,\varphi(d\theta) + \log\frac{2}{\delta} \right],$

where $c$ depends on $\omega$, $L$ and the loss, and $\varphi$ is the prior. This marginal-type integral is then analyzed via singular learning theory, yielding an explicit bound governed by the RLCT $\lambda$ and its multiplicity. In practice, the resulting bounds are substantially tighter than those based on parameter dimension, especially in overparameterized and singular models.
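For orientation, the classical asymptotic result from singular learning theory for integrals of this type is recalled below. It is the standard large-sample expansion, stated for a nonnegative analytic function $K$ playing the role of $c\,R(\theta,\theta^*)$ and writing $m$ for the multiplicity (both symbols are assumed notation); it is not the paper's non-asymptotic statement, whose constants are not reproduced here.

```latex
Z(n) = \int_\Theta \exp\{-n\,K(\theta)\}\,\varphi(d\theta),
\qquad
-\log Z(n) = \lambda \log n - (m-1)\log\log n + O(1)
\quad (n \to \infty).
```

Inserted into the bracket of the displayed bound and divided by $n$, this suggests a posterior risk of order $(\lambda \log n)/n$, compared with $(d/2)(\log n)/n$ in the regular case.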

Key claims:

The RLCT $\lambda$ encapsulates complexity adaptively based on local geometry, rather than global parameter count.
The derived bounds are explicit, analytically tractable, and remain tight even when classical covering-based approaches fail.
Posterior averaging in Gibbs inference induces an Occam's razor effect, penalizing models only for the size of low-risk regions, not the overall space.
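To make the comparison with parameter counting concrete, the right-hand side of the displayed bound can be evaluated for a given value of the negative log-integral. The sketch below transcribes that right-hand side and, as a labeled assumption, substitutes the classical leading term $\lambda\log n-(m-1)\log\log n$ for the negative log-integral with all constants dropped. The requirement $\omega L<2$ is read off from the prefactor, and the numerical values are hypothetical.

```python
import math

def pac_bayes_rhs(neg_log_integral, omega, L, n, delta):
    """Right-hand side of the bound: prefactor * (-log integral + log(2/delta))."""
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    shrink = 1.0 - omega * L / 2.0
    if shrink <= 0.0:
        raise ValueError("need omega * L < 2 for a positive prefactor")
    return 2.0 / (shrink * omega * n) * (neg_log_integral + math.log(2.0 / delta))

def leading_term(lam, m, n):
    """Asymptotic surrogate for the negative log-integral (assumption: O(1) terms dropped)."""
    return lam * math.log(n) - (m - 1) * math.log(math.log(n))

# Hypothetical setting: d = 1000 parameters, so d/2 = 500, against an RLCT of 5.
n, omega, L, delta = 10_000, 1.0, 1.0, 0.05
bound_dim = pac_bayes_rhs(leading_term(500.0, 1, n), omega, L, n, delta)
bound_rlct = pac_bayes_rhs(leading_term(5.0, 1, n), omega, L, n, delta)
```

With these made-up numbers the dimension-based surrogate gives a far larger value than the RLCT-based one, which is the mechanism behind the second and third claims; the actual constants in the paper's bound would change the magnitudes.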

Applications: Low-Rank Matrix Completion and Deep Neural Nets

Matrix Completion

For low-rank matrix completion, the analysis yields a PAC-Bayes risk bound in terms of the Frobenius norm error, with RLCT expressions depending intricately on the true matrix rank, the ambient dimensions, and the latent dimension. The RLCT is often much smaller than both the parameter dimension and the ambient size, especially when the rank is low or the matrix is highly unbalanced, reflecting the redundancy and non-identifiability of parameters in practical regimes.

The bound admits explicit forms in terms of these quantities. This result formalizes the reduction in statistical complexity and demonstrates how the bounds adapt automatically to the intrinsic rank structure.
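For orientation, closed-form RLCTs of this kind are known for the closely related reduced rank regression (matrix factorization) model. The sketch below encodes the case analysis as recalled from Aoyagi and Watanabe (2005), with `M` and `N` the two ambient dimensions, `H` the latent dimension, and `r` the true rank. Whether the paper's matrix completion expressions coincide with these is an assumption, and the function name and example sizes are hypothetical.

```python
from fractions import Fraction

def rrr_rlct(M, N, H, r):
    """(RLCT, multiplicity) for reduced rank regression; formula recalled, not from the paper."""
    if not 0 <= r <= min(M, N, H):
        raise ValueError("true rank must satisfy 0 <= r <= min(M, N, H)")
    if M + H < N + r:
        return Fraction(H * M - H * r + N * r, 2), 1
    if N + H < M + r:
        return Fraction(H * N - H * r + M * r, 2), 1
    if M + N < H + r:
        return Fraction(M * N, 2), 1
    # Remaining balanced case: the value depends on the parity of M + N + H + r.
    num = 2 * (H + r) * (M + N) - (M - N) ** 2 - (H + r) ** 2
    if (M + N + H + r) % 2 == 0:
        return Fraction(num, 8), 1
    return Fraction(num + 1, 8), 2

# Hypothetical example: 50 x 50 matrix, latent dimension 20, true rank 2.
M, N, H, r = 50, 50, 20, 2
lam, mult = rrr_rlct(M, N, H, r)
half_dim = Fraction(H * (M + N), 2)   # d/2 for the factorized parameterization
```

In this example the RLCT comes out at roughly half of $d/2$, and the gap widens as the latent dimension grows relative to the true rank.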

ReLU Neural Networks

For deep ReLU networks in regression and classification (with squared and logistic loss), the bound is determined by the minimal architecture needed to realize the true regression or logit function, not by the width or depth of the actual network used. Specifically, the RLCT is upper-bounded by half the parameter count of the minimal network, regardless of the size and degree of overparameterization of the fitted network. These results clarify why deep models can generalize even when their parameter count exceeds what classical sample-complexity bounds allow, and they provide explicit bounds for practitioners.
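The size of this effect is easy to see by counting parameters. The sketch below assumes fully connected layers with biases and uses two hypothetical architectures: a wide fitted network and a small network taken, for illustration, to be the minimal one realizing the target. How the paper counts parameters (for example, whether biases are included) is an assumption here.

```python
def param_count(widths):
    """Weights plus biases of a fully connected net with layer widths [in, h1, ..., out]."""
    return sum(w_in * w_out + w_out for w_in, w_out in zip(widths[:-1], widths[1:]))

fitted = [10, 512, 512, 512, 1]   # hypothetical overparameterized network
minimal = [10, 8, 1]              # hypothetical minimal network for the target

rlct_upper = param_count(minimal) / 2    # upper bound on the RLCT per the text
naive_penalty = param_count(fitted) / 2  # d/2 from plain parameter counting
```

Here the minimal network has 97 parameters against more than half a million for the fitted one, so the complexity term driving the bound is several orders of magnitude smaller than the parameter-counting penalty.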

Implications and Future Directions

Practically, the results offer rigorous, finite-sample generalization guarantees for highly overparameterized models, matrix completion, and networks exhibiting singularities. Theoretically, this framework reframes statistical complexity, prioritizing intrinsic local geometry and RLCT over parameter counting. These advancements facilitate principled uncertainty quantification and model selection, suggesting avenues for adapting Bayesian information criteria (BIC) to singular models ("singular BIC") and for direct empirical estimation of RLCT.

The approach is extensible to a variety of loss functions, noise structures, and model classes—potentially including deep generative models, structured prediction, and functional spaces. Further research may refine RLCT estimation, extend non-asymptotic bounds to stochastic optimization (e.g., SGD), and integrate fractal or data-dependent dimension estimates [Dupuis et al., 2023].

Conclusion

This paper provides a technically rigorous, explicit framework for PAC-Bayes generalization bounds in modern singular models, leveraging the RLCT from singular learning theory. Derived bounds are tight, data-adaptive, and practical for low-rank, neural network, and overparameterized settings, highlighting both the theoretical and applied benefits of posterior averaging and intrinsic complexity quantification. The results are likely to influence model assessment, uncertainty quantification, and criteria for model selection in complex, high-dimensional learning environments.
