- The paper presents a novel integration of PAC-Bayes analysis with singular learning theory to derive explicit finite-sample bounds.
- It leverages the real log canonical threshold (RLCT) to adaptively measure model complexity beyond traditional parameter counting.
- The framework offers practical generalization guarantees for overparameterized models such as low-rank matrix completion and deep ReLU networks.
PAC-Bayes Generalization Bounds for Gibbs Posteriors in Singular Models
Motivation and Context
The paper "PAC-Bayes Bounds for Gibbs Posteriors via Singular Learning Theory" (2604.17219) addresses the persistent challenge of deriving finite-sample generalization guarantees for modern overparameterized and singular models, where traditional uniform-convergence-based bounds fail to give tight or non-vacuous risk estimates. Classical complexity measures, such as VC dimension and covering entropy, do not align with the empirical generalization of deep learning models, particularly in regimes with parameter dimension far exceeding the sample size.
A key methodological advance is the integration of PAC-Bayes analysis, which yields bounds for posterior-averaged risk, into the singular learning theory framework. Instead of relying on global worst-case bounds, PAC-Bayes bounds depend on risks averaged over data-dependent distributions (notably Gibbs posteriors), which are amenable to explicit analysis even in singular, non-identifiable, or overparameterized models.
Gibbs Posterior and PAC-Bayes Analysis
The Gibbs posterior replaces the negative log-likelihood with an arbitrary loss function, exponentially tilting a prior towards parameters that minimize the empirical risk. This loss-based update provides robustness to model misspecification and removes the dependence on a fully specified probabilistic model, enabling application to empirical risk minimization (ERM) setups beyond likelihood-based models.
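The loss-based update can be sketched numerically. The example below is only an illustration under assumptions that are not taken from the paper: a scalar parameter on a grid, a uniform prior, squared loss, and a hypothetical learning rate `omega`. The posterior weights are proportional to the prior times exp(−ω n r_n(θ)), where r_n is the empirical risk.

```python
import numpy as np

def gibbs_posterior_weights(prior_weights, empirical_risk, omega, n):
    """Discrete Gibbs posterior: prior tilted by exp(-omega * n * empirical risk)."""
    log_w = np.log(prior_weights) - omega * n * empirical_risk
    log_w -= log_w.max()              # stabilise before exponentiating
    w = np.exp(log_w)
    return w / w.sum()

rng = np.random.default_rng(0)
n = 200
y = 1.5 + rng.normal(size=n)          # toy data (assumption): location 1.5, unit noise

theta_grid = np.linspace(-5.0, 5.0, 2001)
prior = np.full(theta_grid.size, 1.0 / theta_grid.size)   # uniform prior on the grid
# Empirical risk r_n(theta): mean squared error of predicting every y by theta
risk = ((y[None, :] - theta_grid[:, None]) ** 2).mean(axis=1)

post = gibbs_posterior_weights(prior, risk, omega=0.5, n=n)
posterior_mean = float((theta_grid * post).sum())
```

With squared loss the tilted prior concentrates around the sample mean of `y`; a different loss only changes the `risk` array, which is the sense in which the update does not need a likelihood.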
The PAC-Bayes framework then produces non-asymptotic bounds for the posterior-averaged population risk by trading off empirical fit with a complexity penalty given by the KL divergence between the posterior and prior. The core innovation is to combine these risk and penalty contributions into a single marginal-type integral over parameter space, closely related to the partition function in statistical mechanics and Bayesian evidence. Analytically, this replaces covering entropy with intrinsic integration geometry and facilitates explicit, sharp bounds.
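The way an "empirical fit plus KL penalty" trade-off collapses into one marginal-type integral can be seen from the standard Gibbs variational (Donsker-Varadhan) identity. The notation is assumed for illustration and may differ from the paper's: r_n is the empirical risk, φ the prior, ω the learning rate, and ρ ranges over distributions absolutely continuous with respect to φ.

```latex
\[
-\log \int_\Theta e^{-\omega n\, r_n(\theta)}\,\varphi(d\theta)
\;=\;
\min_{\rho \ll \varphi}
\Big\{ \omega n \int_\Theta r_n(\theta)\,\rho(d\theta)
       + \mathrm{KL}(\rho \,\|\, \varphi) \Big\},
\qquad
\Pi_n(d\theta) \;\propto\; e^{-\omega n\, r_n(\theta)}\,\varphi(d\theta).
\]
```

The minimum is attained at the Gibbs posterior Π_n, so the log-partition term on the left already contains both the fit and the complexity penalty.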
Singular Learning Theory and RLCT
Singular learning theory provides tools from algebraic geometry to analyze the marginal likelihood in non-regular models. In regular models, the classical Laplace approximation yields a penalty d/2 (with d the parameter dimension); in singular models, the leading penalty term is given by the real log canonical threshold (RLCT) λ, a quantity reflecting the intrinsic geometry of the loss landscape near the minima, and which can be much smaller than d/2 or fractional. This adaptivity is crucial for modern deep architectures, mixture models, and other non-identifiable settings.
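The gap between d/2 and the RLCT can be illustrated on a one-dimensional toy problem. This is a sketch under assumptions not taken from the paper: a uniform prior on [−1, 1], a deterministic "risk" K(w), and the learning coefficient read off as the slope of −log Z(n) against log n, where Z(n) = ∫ exp(−n K(w)) dw. For K(w) = w² (non-degenerate minimum) the slope is d/2 = 1/2; for K(w) = w⁴ (degenerate minimum) it is 1/4.

```python
import numpy as np

def log_partition(K, n, grid):
    """log Z(n), with Z(n) the integral of exp(-n K(w)) over the grid (trapezoid rule)."""
    f = np.exp(-n * K(grid))
    dx = grid[1] - grid[0]
    return np.log((f.sum() - 0.5 * (f[0] + f[-1])) * dx)

def learning_coefficient(K, n_small=1e3, n_large=1e4):
    """Slope of -log Z(n) against log n, a numerical stand-in for the RLCT."""
    grid = np.linspace(-1.0, 1.0, 400001)
    drop = log_partition(K, n_small, grid) - log_partition(K, n_large, grid)
    return drop / (np.log(n_large) - np.log(n_small))

lam_regular = learning_coefficient(lambda w: w ** 2)    # close to d/2 = 0.5
lam_singular = learning_coefficient(lambda w: w ** 4)   # close to 0.25
```

The flatter the risk is around its minimum, the more prior mass sits in the low-risk region and the smaller the resulting penalty.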
Most singular learning theory results are asymptotic: they describe the limit of growing sample size and rely on strong regularity and entropy control. The present work develops a non-asymptotic framework, leveraging a Bernstein-type sub-exponential moment assumption on the loss to directly relate empirical and population risks, thus avoiding explicit entropy arguments. This enables a two-sided PAC-Bayes control, bounding the empirical risk by the population risk and vice versa, which leads to explicit finite-sample risk bounds for Gibbs posteriors in terms of the RLCT.
Main Theoretical Results
The principal result is an explicit high-probability PAC-Bayes bound stating that, under the Bernstein-type moment condition, for any δ∈(0,1) and a sufficiently small learning rate ω, with probability at least 1−δ,
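\[
\int_\Theta R(\theta,\theta^*)\,\Pi_n(d\theta)
\;\le\;
\frac{2}{(1-2\omega L)\,\omega n}
\left[ -\log \int_\Theta \exp\{-c\,n\,R(\theta,\theta^*)\}\,\varphi(d\theta) + \log\frac{2}{\delta} \right],
\]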
where c depends on ω, L and the loss, and φ is the prior. This marginal-type integral is then analyzed via singular learning theory, yielding a bound governed by the RLCT λ and its multiplicity. The resulting bounds can be substantially tighter than those based on parameter dimension, especially in overparameterized and singular models.
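A rough numerical reading of the bound is sketched below. It assumes the right-hand side has the form 2/((1−2ωL)ωn) · [−log ∫ exp{−c n R(θ,θ∗)} φ(dθ) + log(2/δ)], and it estimates the prior integral by plain Monte Carlo. The function name, the toy risk, and the values of `omega`, `L`, `c` and `delta` are all hypothetical and are not taken from the paper.

```python
import numpy as np

def pac_bayes_rhs(excess_risk_at_prior_draws, n, omega, L, c, delta):
    """Monte Carlo estimate of the assumed right-hand side, from prior draws."""
    if not 0.0 < 2.0 * omega * L < 1.0:
        raise ValueError("need 0 < 2*omega*L < 1 for a positive prefactor")
    a = -c * n * np.asarray(excess_risk_at_prior_draws, dtype=float)
    # log of the prior average of exp(a), computed stably (log-mean-exp)
    log_integral = a.max() + np.log(np.mean(np.exp(a - a.max())))
    prefactor = 2.0 / ((1.0 - 2.0 * omega * L) * omega * n)
    return prefactor * (-log_integral + np.log(2.0 / delta))

rng = np.random.default_rng(1)
theta = rng.uniform(-1.0, 1.0, size=200_000)   # draws from a uniform prior (assumption)
excess_risk = theta ** 4                        # toy degenerate risk with minimum at 0

bounds = {n: pac_bayes_rhs(excess_risk, n, omega=0.25, L=1.0, c=0.1, delta=0.05)
          for n in (100, 1_000, 10_000)}
```

In this toy setting the −log integral term grows only logarithmically in n while the prefactor shrinks like 1/n, so the estimated bound decreases as the sample size grows.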
Key claims:
- The RLCT λ encapsulates complexity adaptively based on local geometry, rather than global parameter count.
- The derived bounds are explicit, analytically tractable, and remain tight even when classical covering-based approaches fail.
- Posterior averaging in Gibbs inference induces an Occam's razor effect, penalizing models only for the size of low-risk regions, not the overall space.
Applications: Low-Rank Matrix Completion and Deep Neural Nets
Matrix Completion
For low-rank matrix completion, the analysis yields a PAC-Bayes risk bound in terms of the Frobenius norm error, with RLCT expressions depending intricately on the true matrix rank, the ambient dimensions, and the latent dimension of the factorization. The RLCT is often much smaller than both the parameter dimension and the ambient size, especially when the rank is low or the matrix is highly unbalanced, reflecting the redundancy and non-identifiability of parameters in practical regimes.
The bound admits explicit forms in terms of these quantities.
This result formalizes the reduction in statistical complexity, and demonstrates how the bounds adapt automatically to the intrinsic rank structure.
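To make the size comparison concrete, the sketch below counts three quantities for a rows × cols matrix modelled as a product of two factors with a given latent dimension. It does not compute the paper's RLCT. The function, its argument names, and the example sizes are hypothetical, and the last entry uses the standard fact that the set of rank-r matrices of size M × N has dimension r(M + N − r).

```python
def low_rank_dimensions(rows, cols, latent, true_rank):
    """Three ways of counting 'size' for a rows x cols matrix modelled as A @ B."""
    if not 0 < true_rank <= min(latent, rows, cols):
        raise ValueError("expected 0 < true_rank <= min(latent, rows, cols)")
    return {
        "ambient_entries": rows * cols,
        # A is rows x latent and B is latent x cols
        "factor_parameters": latent * (rows + cols),
        # dimension of the set of matrices with rank exactly `true_rank`
        "rank_manifold_dim": true_rank * (rows + cols - true_rank),
    }

dims = low_rank_dimensions(rows=1000, cols=50, latent=20, true_rank=3)
```

For this unbalanced example the intrinsic count is far below both the factor parameter count and the number of matrix entries, which is the kind of gap an adaptive complexity measure can exploit.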
ReLU Neural Networks
For deep ReLU networks in regression and classification (with squared and logistic loss), the bound is determined by the minimal architecture needed to realize the true regression or logit function, not by the width or depth of the actual network used. Specifically, the RLCT is upper-bounded by half the parameter count of the minimal network, regardless of the size and overparameterization. These results clarify why generalization occurs in deep models exceeding the classical sample complexity, and provide explicit bounds for practitioners.
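The statement about the minimal architecture can be illustrated by simple parameter counting. The sketch assumes fully connected layers with biases counted as parameters (the paper's counting convention may differ), and both width lists are hypothetical.

```python
def mlp_parameter_count(widths):
    """Weights plus biases of a fully connected network with these layer widths."""
    return sum((fan_in + 1) * fan_out for fan_in, fan_out in zip(widths, widths[1:]))

def rlct_upper_bound(minimal_widths):
    """Half the parameter count of the minimal network realising the target."""
    return mlp_parameter_count(minimal_widths) / 2

trained_widths = [10, 512, 512, 512, 1]   # hypothetical overparameterized network
minimal_widths = [10, 4, 1]               # hypothetical minimal network for the target

naive_penalty = mlp_parameter_count(trained_widths) / 2   # d/2 for the trained network
adaptive_penalty = rlct_upper_bound(minimal_widths)       # bound from the minimal network
```

Widening or deepening `trained_widths` changes `naive_penalty` but leaves `adaptive_penalty` untouched, mirroring the claim that the bound does not depend on the size of the network actually used.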
Implications and Future Directions
Practically, the results offer rigorous, finite-sample generalization guarantees for highly overparameterized models, matrix completion, and networks exhibiting singularities. Theoretically, this framework reframes statistical complexity, prioritizing intrinsic local geometry and RLCT over parameter counting. These advancements facilitate principled uncertainty quantification and model selection, suggesting avenues for adapting Bayesian information criteria (BIC) to singular models ("singular BIC") and for direct empirical estimation of RLCT.
The approach is extensible to a variety of loss functions, noise structures, and model classes, potentially including deep generative models, structured prediction, and function spaces. Further research may refine RLCT estimation, extend non-asymptotic bounds to stochastic optimization (e.g., SGD), and integrate fractal or data-dependent dimension estimates [Dupuis et al., 2023].
Conclusion
This paper provides a technically rigorous, explicit framework for PAC-Bayes generalization bounds in modern singular models, leveraging the RLCT from singular learning theory. Derived bounds are tight, data-adaptive, and practical for low-rank, neural network, and overparameterized settings, highlighting both the theoretical and applied benefits of posterior averaging and intrinsic complexity quantification. The results are likely to influence model assessment, uncertainty quantification, and criteria for model selection in complex, high-dimensional learning environments.