Bayesian analysis for a generalised Dirichlet process prior

Published 18 Apr 2026 in math.ST | (2604.17160v1)

Abstract: A family of random probabilities is defined and studied. This family contains the Dirichlet process as a special case, corresponding to an inner point in the appropriate parameter space. The extension makes it possible to have random means with larger or smaller skewnesses as compared to skewnesses under the Dirichlet prior, and also in other ways amounts to additional modelling flexibility. The usefulness of such random probabilities for nonparametric Bayesian statistics is discussed. The posterior distribution is complicated, but inference can nevertheless be carried out via simulation, and some exact formulae are derived for the case of random means. The class of nonparametric priors provides an instructive example where the speed with which the posterior forgets its prior with increasing data sample size depends on special aspects of the prior, which is a different situation from that of parametric inference.

Authors (1)

Nils Lid Hjort

Summary

The paper extends the classical Dirichlet process by introducing a generalised prior that allows tunable skewness and kurtosis for random means.
The methodology employs a novel stick-breaking construction using arbitrary Beta distributions along with recursive moment calculations to enhance model flexibility.
The findings show that the posterior is consistent, that the speed at which it forgets the prior depends on the prior's parameters, and that inference remains feasible via simulation.

Bayesian Inference with Generalised Dirichlet Process Priors

Introduction

This paper presents a broad extension of the classical Dirichlet process (DP) prior, the generalised Dirichlet process (GD) prior. The core innovation is to retain the DP as a special (interior) point in an expanded class of random probability measures. This gives nonparametric Bayesian inference a prior whose skewness and kurtosis for random means can be tuned, something the DP does not accommodate. The work investigates the theoretical properties, inferential consequences, and computational handling of these generalisations.

The Generalised Prior Class

The classical DP emerges when the stick-breaking weights in a random discrete measure $P = \sum_{j=1}^{\infty} \gamma_j \delta_{\xi_j}$ are constructed from i.i.d. Beta($1$, $b$) random variables. The GD prior extends this to arbitrary distributions $H$ on $(0,1)$, and especially to Beta($a$, $b$) distributions with $a \neq 1$. The DP is then not a degenerate boundary case of the parameter space, a notable technical advantage over alternatives such as Pólya trees or tailfree processes.
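The stick-breaking construction can be sketched in a few lines. The following is a minimal illustration, not code from the paper: it uses the standard stick-breaking form $\gamma_j = B_j \prod_{i<j} (1 - B_i)$ with $B_j$ i.i.d. Beta($a$, $b$), and it truncates the infinite sum at a finite number of atoms. The function names, the truncation level, and the standard normal base measure are all assumptions made for the example.

```python
import random


def gd_stick_breaking_weights(a, b, n_atoms, rng):
    """Return the first n_atoms stick-breaking weights built from Beta(a, b) sticks,
    together with the mass left unassigned by the truncation."""
    weights = []
    remaining = 1.0  # mass not yet assigned: prod_{i<j} (1 - B_i)
    for _ in range(n_atoms):
        stick = rng.betavariate(a, b)  # B_j ~ Beta(a, b); a = 1 gives the DP
        weights.append(remaining * stick)
        remaining *= 1.0 - stick
    return weights, remaining


def sample_truncated_gd(a, b, n_atoms, base_sampler, rng):
    """Draw a truncated realisation of P = sum_j gamma_j * delta_{xi_j}."""
    weights, leftover = gd_stick_breaking_weights(a, b, n_atoms, rng)
    atoms = [base_sampler(rng) for _ in range(n_atoms)]  # xi_j i.i.d. from P_0
    return weights, atoms, leftover


# Illustrative choices (assumptions): Beta(2, 3) sticks, standard normal base measure.
rng = random.Random(0)
weights, atoms, leftover = sample_truncated_gd(
    2.0, 3.0, 200, lambda r: r.gauss(0.0, 1.0), rng
)
```

The returned `leftover` is the mass lost to truncation, so it gives a direct check on whether the chosen number of atoms is adequate for a given $H$.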

This extension lets prior random means exhibit positive or negative skewness and a range of kurtoses, within the limits of the construction, while the base measure and the variance are held fixed. The paper provides explicit constructions, stochastic recursions, and moment calculations supporting these claims.

Support and Nonparametric Properties

It is established that, under mild conditions ( $H$ having full support in $(0,1)$ ), the GD prior assigns positive probability to all neighbourhoods (under setwise or weak convergence) of all probability measures absolutely continuous with respect to $P_0$ . Thus, the GD class is genuinely nonparametric: it achieves large support and is not unduly restrictive in applications.

Stochastic Representation and Simulation

The GD prior admits a characterizing stochastic equation,

$$P \stackrel{d}{=} B\,\delta_{\xi} + (1 - B)\,P',$$

with $B \sim H$, $\xi \sim P_0$, and $P'$ an independent copy of $P$, the three being independent. This recursive identity underpins both theoretical results and efficient Markov chain Monte Carlo simulation schemes for the random measure $P$ and for functionals (random means) thereof.

The paper emphasizes that, for practical purposes, simulating from the law of a random mean $\int g\,\mathrm{d}P$ can be accomplished using a tailored, rapidly convergent Markov chain.
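One way such a chain could look is sketched below. It is an assumption-laden illustration rather than the paper's algorithm: it iterates the update $\theta \leftarrow B\,g(\xi) + (1 - B)\,\theta$ with fresh $B \sim$ Beta($a$, $b$) and $\xi \sim P_0$ at each step, which is the random-mean version of the identity $P \stackrel{d}{=} B\,\delta_{\xi} + (1 - B)\,P'$, so the law of the random mean is a stationary law of the chain. The function name, burn-in length, uniform base measure, and identity function $g$ are hypothetical choices.

```python
import random


def random_mean_chain(a, b, g, base_sampler, n_steps, burn_in, rng):
    """Iterate theta <- B * g(xi) + (1 - B) * theta and return post-burn-in states."""
    theta = g(base_sampler(rng))  # arbitrary starting value
    draws = []
    for step in range(n_steps + burn_in):
        stick = rng.betavariate(a, b)  # B ~ Beta(a, b)
        xi = base_sampler(rng)         # xi ~ P_0
        theta = stick * g(xi) + (1.0 - stick) * theta
        if step >= burn_in:
            draws.append(theta)
    return draws


# Illustrative run (assumptions): DP case a = 1, b = 1, uniform(0, 1) base, g(x) = x.
rng = random.Random(1)
draws = random_mean_chain(
    1.0, 1.0, lambda x: x, lambda r: r.random(), 20000, 100, rng
)
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / len(draws)
```

Successive states are correlated, so the draws describe the stationary law well in aggregate but are not an independent sample. Each step contracts the dependence on the starting value by a factor $1 - B$, which is why a short burn-in is usually enough.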

Moment Calculations and Flexibility

Detailed recursive formulae for the full sequence of central moments of random means under the GD prior are derived. The extension beyond the DP is substantial: the shape of the distribution for random means is no longer dictated by the prior variance, but can be tuned via $a$ and $b$.

Specifically, when $a < 1$ (for fixed mean/variance), the skewness and kurtosis both increase relative to the DP; when $a > 1$, both decrease. The DP case $a = 1$ becomes an interior point in the space of attainable skewness and kurtosis, rather than an edge or corner.
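The direction of the skewness effect can be checked with a short calculation. The sketch below is a derivation made for this illustration, not a formula quoted from the paper. Writing $\theta \stackrel{d}{=} B\,g(\xi) + (1 - B)\,\theta'$ and expanding the second and third central moments, the cross terms vanish by independence and centring, which gives $m_k(\theta) = m_k(g(\xi))\, E[B^k] / (1 - E[(1-B)^k])$ for $k = 2, 3$. The function names are hypothetical, and the formula is only claimed for $k \le 3$ (the fourth moment picks up an extra cross term).

```python
from fractions import Fraction


def beta_raw_moment(a, b, k):
    """E[B^k] for B ~ Beta(a, b), as an exact rational product."""
    result = Fraction(1)
    for i in range(k):
        result *= Fraction(a + i) / Fraction(a + b + i)
    return result


def central_moment_factor(a, b, k):
    """Ratio of the k-th central moment of the random mean to that of g(xi), k in {2, 3}."""
    if k not in (2, 3):
        raise ValueError("the simple ratio formula only holds for k = 2 or k = 3")
    # 1 - B ~ Beta(b, a), so E[(1 - B)^k] is the raw moment with swapped parameters.
    return beta_raw_moment(a, b, k) / (1 - beta_raw_moment(b, a, k))


def skewness_factor(a, b):
    """Skewness of the random mean divided by the skewness of g(xi)."""
    f2 = float(central_moment_factor(a, b, 2))
    f3 = float(central_moment_factor(a, b, 3))
    return f3 / f2 ** 1.5


# Hold the variance factor at 1/2 by setting b = (a + 1) / 2, then vary a.
for a in (Fraction(1, 2), Fraction(1), Fraction(2)):
    b = (a + 1) / 2
    print(a, b, central_moment_factor(a, b, 2), skewness_factor(a, b))
```

With the variance factor held at $1/2$ in this example, the skewness factor is largest for $a = 1/2$, intermediate for the DP case $a = 1$, and smallest for $a = 2$.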

Posterior Analysis and Consistency

Bayesian updating under the GD prior leads to a posterior mean estimator for $P$ that is always a convex combination of the base measure $P_0$ and the empirical distribution of the observed sample, but the coefficient $w_n$ on $P_0$ now exhibits richer asymptotics. For the DP, $w_n$ decays at rate $1/n$; for the GD it decays as $n^{-a}$, with $a$ the Beta shape parameter controlling the left tail of the stick-breaking distribution.

Key results include:

For $a > 1$, prior influence diminishes faster than in the classical DP; for $a < 1$, it persists longer.
Posterior mean and full posterior distribution are consistent estimators of the true underlying distribution, but the speed of convergence can be made arbitrarily slow or fast by tuning $a$.
The convergence rate of the posterior variance is directly linked to the decay rate of the prior weight $w_n$ on $P_0$.
For $a > 1/2$, Bayesian and frequentist inferential procedures (posteriors and confidence intervals) align asymptotically.

These findings highlight that, contrary to the parametric case (where forgetting the prior is always $O(1/n)$), the speed of "prior forgetting" under the GD prior is not universal and depends on prior hyperparameters.
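The dependence of the forgetting rate on the Beta shape parameter can be explored numerically. The sketch below rests on assumptions that go beyond the text: a diffuse base measure, $n$ distinct observations, and Beta($a$, $b$) sticks. Under those assumptions, a short calculation from the stick-breaking representation (done for this illustration, not quoted from the paper) gives the weight on the base measure as $(n + 1)\,E[B(1-B)^n] / (1 - E[(1-B)^{n+1}])$. The function names are hypothetical.

```python
import math


def log_beta(x, y):
    """Logarithm of the Beta function B(x, y)."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def prior_weight(n, a, b):
    """Weight on the base measure after n distinct observations (sketch formula).

    Assumes Beta(a, b) sticks and a diffuse base measure.
    """
    log_norm = log_beta(a, b)
    e_new = math.exp(log_beta(a + 1.0, b + n) - log_norm)    # E[B (1 - B)^n]
    e_tail = math.exp(log_beta(a, b + n + 1.0) - log_norm)   # E[(1 - B)^(n + 1)]
    return (n + 1) * e_new / (1.0 - e_tail)


# Compare the decay for a below, at, and above the DP value a = 1 (b = 2 throughout).
for a in (0.5, 1.0, 2.0):
    print(a, [prior_weight(n, a, 2.0) for n in (10, 100, 1000)])
```

For $a = 1$ the expression reduces to the familiar DP weight $b/(b + n)$, which serves as a sanity check on the formula. For other values of $a$, the numerator behaves like a constant times $n^{-a}$ for large $n$, in line with the rates described above.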

Structure of the Posterior Process

For non-Dirichlet choices of $H$, the posterior process is substantially more complicated than in the DP case. An explicit mixture representation is derived: after observing a sample, the posterior remains in the same conjugacy class, but the underlying stick-breaking distributions are updated in a non-trivial, data-dependent way. In particular, when data points are distinct, the posterior over stick-breaking proportions becomes a product of appropriately updated Beta-like distributions; the indices at which the sample atoms appear are distributed according to geometric and multinomial mixtures.

The Dirichlet process emerges uniquely as the only member of the class for which posterior stick-breaking weights increase in proportion exactly to data multiplicities.

Distributional Properties of Random Means

Exact transforms and recursive equations for the distribution of the random mean $\int g\,\mathrm{d}P$ under the GD prior are provided, though closed forms remain elusive except in special cases (e.g., Cauchy base measure, certain normal scale mixtures). The approach clarifies that in practically relevant scenarios (normal and stable bases), the induced random means remain scale mixtures with moment sequences that can be computed recursively.
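The Cauchy special case can be seen directly from the stick-breaking representation. The following is a short worked sketch, assuming a standard Cauchy base measure, $g(x) = x$, weights $\gamma_j$ independent of the atoms $\xi_j$, and $\sum_j \gamma_j = 1$ almost surely. Conditioning on the weights and using the Cauchy characteristic function $E\,e^{is\xi} = e^{-|s|}$:

$$
E\,e^{it\theta}
= E\Big[\prod_{j \ge 1} E\big(e^{it\gamma_j \xi_j} \,\big|\, \gamma\big)\Big]
= E\Big[\prod_{j \ge 1} e^{-\gamma_j |t|}\Big]
= E\,e^{-|t| \sum_{j} \gamma_j}
= e^{-|t|},
\qquad \theta = \sum_{j \ge 1} \gamma_j \xi_j .
$$

Under these assumptions the random mean is again standard Cauchy, whatever the stick-breaking distribution $H$.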

Additional Results and Extensions

The GD prior enables informative Bayesian robustness analyses by comparing inferences under the DP with those under nearby GD priors.
The construction can be extended to allow the stick-breaking weights' distributions to vary with the index $j$, resulting in a vast class of conjugate nonparametric priors, subsuming many previous proposals.
Treatment of tied data and marginalisation is discussed; the non-Dirichlet cases yield less tractable, non-uniform empirical weighting.

Implications and Future Directions

The GD prior class decouples prior variance from higher moments of the random means, affording controlled flexibility in Bayesian nonparametric modeling. This has direct impact on Bayesian robustness and can be leveraged in practical settings to mitigate sensitivity to base measure mis-specification without sacrificing the appealing support properties of the DP.

The richer class of priors will likely enable more faithful modeling of uncertainty in functionals and forecasts, and may substantially improve empirical performance in applications where the DP's lack of skewness/kurtosis control is limiting.

Further work is expected on computational strategies for posterior simulation and marginalisation, extensions to the hierarchical modeling setting, and systematic study of finite-sample and asymptotic optimality in regularized inference and credible intervals.

Conclusion

This work establishes the generalised Dirichlet process as a foundational extension to the Dirichlet process for Bayesian nonparametric inference. It thoroughly develops the mathematical properties, inferential formulas, and computational tools for the class, clarifying both the technical subtleties and practical gains of moving beyond the DP's limitations. The results suggest substantial promise for more flexible, robust, and interpretable Bayesian modeling.
