- The paper extends the classical Dirichlet process by introducing a generalised prior that allows tunable skewness and kurtosis for random means.
- The methodology employs a novel stick-breaking construction using arbitrary Beta distributions along with recursive moment calculations to enhance model flexibility.
- The findings demonstrate improved posterior consistency and flexibility, with implications for robust Bayesian inference and efficient simulation techniques.
Bayesian Inference with Generalised Dirichlet Process Priors
Introduction
This paper presents a broad extension of the classical Dirichlet process (DP) prior, introducing the generalised Dirichlet process (GD) prior. The core innovation is to retain the DP as a special (interior) point in an expanded class of random probability measures, thereby allowing tunable skewness and higher moments in the prior over distributions, features that the DP cannot adjust independently of its variance. The GD prior provides substantially increased modeling flexibility for nonparametric Bayesian inference, notably enabling control over prior skewness and kurtosis of random means. The work rigorously investigates the theoretical properties, inferential ramifications, and computational handling of these generalisations.
The Generalised Prior Class
The classical DP emerges when the stick-breaking weights in a random discrete measure $P = \sum_{j=1}^{\infty} \gamma_j \delta_{\xi_j}$ are constructed from i.i.d. Beta(1, b) random variables. The GD prior extends this to arbitrary distributions $H$ on $(0,1)$, and especially to Beta(a, b) distributions with $a \neq 1$, without degeneracy at the boundary of the parameter space, which is a notable technical advantage over alternatives such as Pólya trees or tailfree processes.
This extension enables prior random means to exhibit arbitrary (within the constructional limits) positive or negative skewness and a range of kurtoses, holding base measure and variance constant. The paper provides explicit constructions, stochastic recursions, and moment calculations supporting these claims.
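The stick-breaking construction can be sketched in a few lines. The following is a minimal illustration, not the paper's code: it assumes the usual stick-breaking convention (each weight is a Beta(a, b) fraction of the mass left over by the earlier sticks), truncates the infinite sum at a hypothetical `n_atoms`, and uses a standard normal base measure purely as an example. The function name and arguments are illustrative.

```python
import numpy as np

def sample_gd_stick_breaking(a, b, base_sampler, n_atoms=2000, rng=None):
    """Draw a truncated realisation of P = sum_j gamma_j * delta_{xi_j}.

    Assumption: sticks B_j are i.i.d. Beta(a, b); a = 1 gives the DP case.
    """
    rng = np.random.default_rng() if rng is None else rng
    sticks = rng.beta(a, b, size=n_atoms)              # B_j ~ Beta(a, b)
    # Mass remaining before stick j is broken: prod_{k<j} (1 - B_k), 1 for j = 1.
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - sticks)[:-1]))
    weights = sticks * remaining                        # gamma_j
    atoms = base_sampler(rng, n_atoms)                  # xi_j ~ P_0, i.i.d.
    return weights, atoms

rng = np.random.default_rng(0)
weights, atoms = sample_gd_stick_breaking(
    a=2.0, b=3.0,
    base_sampler=lambda r, n: r.standard_normal(n),     # example base measure
    rng=rng,
)
random_mean = float(np.sum(weights * atoms))            # one draw of the random mean
```

The truncation error is the product of all `1 - B_j`, which shrinks geometrically in `n_atoms`, so a few thousand atoms are typically more than enough for moderate values of a and b.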
Support and Nonparametric Properties
It is established that, under mild conditions ($H$ having full support in $(0,1)$), the GD prior assigns positive probability to all neighbourhoods (under setwise or weak convergence) of all probability measures absolutely continuous with respect to $P_0$. Thus, the GD class is genuinely nonparametric: it achieves large support and is not unduly restrictive in applications.
Stochastic Representation and Simulation
The GD prior admits a characterizing stochastic equation,
$$P \overset{d}{=} B\,\delta_{\xi} + (1 - B)\,P',$$
with $B \sim H$, $\xi \sim P_0$, and $P'$ an independent copy of $P$. This recursive identity underpins both theoretical results and efficient Markov chain Monte Carlo simulation schemes for the random measure $P$ and for functionals (random means) thereof.
The paper emphasizes that, for practical purposes, generating the law of random means $\int g\,dP$ can be accomplished using a tailored, rapidly convergent Markov chain.
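One natural chain of this kind iterates the distributional identity directly. The sketch below is an assumption about what such a scheme could look like, not the paper's tailored algorithm: it takes the random mean theta of P (with g the identity) to satisfy theta = B * xi + (1 - B) * theta' in distribution, with B ~ Beta(a, b), xi ~ P_0 and theta' an independent copy, and runs the update theta_{t+1} = B_t * xi_t + (1 - B_t) * theta_t. The names `random_mean_chain`, `n_steps` and `burn_in` are illustrative.

```python
import numpy as np

def random_mean_chain(a, b, base_sampler, n_steps=200_000, burn_in=1_000, seed=0):
    """Markov chain whose stationary law is that of the random mean (assumed identity)."""
    rng = np.random.default_rng(seed)
    total = n_steps + burn_in
    sticks = rng.beta(a, b, size=total)      # B_t ~ Beta(a, b)
    atoms = base_sampler(rng, total)         # xi_t ~ P_0
    theta = 0.0                              # arbitrary starting value
    draws = np.empty(n_steps)
    for t in range(total):
        # One step of theta <- B * xi + (1 - B) * theta.
        theta = sticks[t] * atoms[t] + (1.0 - sticks[t]) * theta
        if t >= burn_in:
            draws[t - burn_in] = theta
    return draws

# DP special case a = 1 with b = 1 and a standard normal base measure:
# the variance of the random mean should be close to 1 / (b + 1) = 0.5.
draws = random_mean_chain(1.0, 1.0, lambda r, n: r.standard_normal(n))
```

The influence of the starting value is multiplied by a fresh factor `1 - B_t` at every step, so the chain forgets its initial state geometrically fast; successive draws are still correlated and should not be treated as independent.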
Moment Calculations and Flexibility
Detailed recursive formulae for the full sequence of central moments of random means under the GD prior are derived. The extension beyond the DP is substantial: the shape of the distribution for random means is no longer dictated by the prior variance, but can be tuned via $a$ and $b$.
Specifically, when $a < 1$ (for fixed mean/variance), the skewness and kurtosis both increase relative to the DP; when $a > 1$, both decrease. The DP case $a = 1$ becomes an interior point in the space of attainable skewness and kurtosis, rather than an edge or corner.
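The first few moments can be worked out by hand from the stochastic identity, which gives a quick numerical check of this behaviour. The sketch below is a derivation under stated assumptions rather than the paper's formulae: it assumes the random mean theta satisfies theta = B * xi + (1 - B) * theta' in distribution with B ~ Beta(a, b), and that the base measure has finite fourth moment. Expanding powers of the centred identity and dropping the cross terms with zero expectation gives m2 = var0 * E[B^2] / (1 - E[(1-B)^2]), m3 = mu3_0 * E[B^3] / (1 - E[(1-B)^3]), and m4 = (mu4_0 * E[B^4] + 6 * E[B^2 (1-B)^2] * var0 * m2) / (1 - E[(1-B)^4]). Function names are illustrative.

```python
from math import exp, lgamma

def beta_mixed_moment(a, b, p, q):
    """E[B^p (1 - B)^q] for B ~ Beta(a, b), computed via log-gamma."""
    log_num = lgamma(a + p) + lgamma(b + q) - lgamma(a + b + p + q)
    log_den = lgamma(a) + lgamma(b) - lgamma(a + b)
    return exp(log_num - log_den)

def random_mean_shape(a, b, var0, mu3_0, mu4_0):
    """Variance, skewness and kurtosis of the random mean.

    var0, mu3_0, mu4_0 are the 2nd, 3rd and 4th central moments of the base measure.
    """
    m = lambda p, q: beta_mixed_moment(a, b, p, q)
    m2 = var0 * m(2, 0) / (1.0 - m(0, 2))
    m3 = mu3_0 * m(3, 0) / (1.0 - m(0, 3))
    m4 = (mu4_0 * m(4, 0) + 6.0 * m(2, 2) * var0 * m2) / (1.0 - m(0, 4))
    return m2, m3 / m2 ** 1.5, m4 / m2 ** 2

# Example base measure: Exponential(1), central moments 1, 2, 9.
# The three (a, b) pairs all satisfy b = (a + 1) / 2, which keeps the variance at 0.5.
shapes = {
    (a, b): random_mean_shape(a, b, var0=1.0, mu3_0=2.0, mu4_0=9.0)
    for a, b in [(0.5, 0.75), (1.0, 1.0), (3.0, 2.0)]
}
for (a, b), (var, skew, kurt) in shapes.items():
    print(f"a={a}, b={b}: var={var:.3f}, skew={skew:.3f}, kurt={kurt:.3f}")
```

Under these formulae the variance reduces to `var0 * (a + 1) / (a + 2b + 1)`, so holding `b = k (a + 1) / 2` fixes the variance at its DP value `var0 / (k + 1)` while a moves the skewness and kurtosis on either side of the DP values.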
Posterior Analysis and Consistency
Bayesian updating under the GD prior leads to a posterior mean estimator for $P$ that is always a convex combination of the base measure $P_0$ and the empirical distribution of the observed sample, but the coefficient $w_n$ on $P_0$ now exhibits richer asymptotics. For the DP, $w_n$ decays at rate $1/n$; for the GD it decays as $n^{-a}$, with $a$ the Beta shape parameter controlling the left tail of the stick-breaking distribution.
Key results include:
- For $a > 1$, prior influence diminishes faster than in the classical DP; for $a < 1$, it persists longer.
- Posterior mean and full posterior distribution are consistent estimators of the true underlying distribution, but the speed of convergence can be made arbitrarily slow or fast by tuning $a$.
- The convergence rate of the posterior variance is directly linked to the decay rate of the weight placed on the base measure.
- For (0,1)4, Bayesian and frequentist inferential procedures (posteriors and confidence intervals) align asymptotically.
These findings highlight that, contrary to the parametric case (where forgetting the prior is always $O(1/n)$), the speed of "prior forgetting" under the GD prior is not universal and depends on prior hyperparameters.
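The convex-combination form of the posterior mean is easy to make concrete. In the sketch below, the DP weight b / (b + n) is the standard result for a DP with total mass b; the GD weight is not given in closed form in this summary, so `power_law_weight` is a purely hypothetical stand-in of the form c * n^(-a), with an arbitrary constant `c`, used only to compare decay speeds. All function names are illustrative.

```python
import numpy as np

def dp_prior_weight(n, b):
    """Weight on the base measure after n observations under a DP with total mass b."""
    return b / (b + n)

def power_law_weight(n, a, c=1.0):
    """Hypothetical stand-in for the GD weight: c * n**(-a), capped at 1."""
    return min(1.0, c * n ** (-a))

def posterior_mean_cdf(x, data, base_cdf, prior_weight):
    """Posterior mean of P(-inf, x] as a mix of base CDF and empirical CDF."""
    empirical = float(np.mean(np.asarray(data) <= x))
    return prior_weight * base_cdf(x) + (1.0 - prior_weight) * empirical

# Example: uniform base measure on [0, 1], three observations, DP with b = 2.
data = [0.2, 0.5, 0.9]
uniform_cdf = lambda x: float(np.clip(x, 0.0, 1.0))
est = posterior_mean_cdf(0.6, data, uniform_cdf, dp_prior_weight(len(data), b=2.0))

# Decay comparison at n = 100: a = 2 forgets the prior faster than the DP, a = 0.5 slower.
fast, dp, slow = power_law_weight(100, 2.0), dp_prior_weight(100, 1.0), power_law_weight(100, 0.5)
```

Passing the weight in as an argument keeps the estimator itself independent of the prior, so the same function can be reused once an exact expression for the GD weight is available.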
Structure of the Posterior Process
For non-Dirichlet choices of $H$, the posterior process is substantially more complicated than in the DP case. An explicit mixture representation is derived: after observing a sample, the posterior remains in the same conjugacy class, but the underlying stick-breaking distributions are updated in a non-trivial, data-dependent way. In particular, when data points are distinct, the posterior over stick-breaking proportions becomes a product of appropriately updated Beta-like distributions; the indices at which the sample atoms appear are distributed according to geometric and multinomial mixtures.
The Dirichlet process is the only member of the class for which posterior stick-breaking weights increase exactly in proportion to data multiplicities.
Distributional Properties of Random Means
Exact transforms and recursive equations for the distribution of the random mean $\int g\,dP$ under the GD prior are provided, though closed forms remain elusive except in special cases (e.g., Cauchy base measure, certain normal scale mixtures). The approach clarifies that in practically relevant scenarios (normal and stable bases), the induced random means remain scale mixtures with moment sequences that can be computed recursively.
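The Cauchy special case can be checked with a short characteristic-function argument. The sketch assumes atoms $\xi_j$ that are i.i.d. standard Cauchy and independent of the weights $\gamma_j$, which are non-negative and sum to one; the notation is illustrative and not taken from the paper. Conditioning on the weights,

$$
E\!\left[\exp\!\Big(it \sum_{j} \gamma_j \xi_j\Big) \,\Big|\, \gamma\right]
= \prod_{j} e^{-\gamma_j |t|}
= \exp\!\Big(-|t| \sum_{j} \gamma_j\Big)
= e^{-|t|},
$$

so the random mean is again standard Cauchy, whatever distribution generates the stick-breaking weights. This is why the Cauchy base measure yields a closed form even when the general case does not.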
Additional Results and Extensions
- The GD prior enables informative Bayesian robustness analyses by comparing inferences under the DP with those under nearby GD priors.
- The construction can be extended to allow the stick-breaking weights' distributions to vary with the index $j$, resulting in a vast class of conjugate nonparametric priors, subsuming many previous proposals.
- Treatment of tied data and marginalisation is discussed; the non-Dirichlet cases yield less tractable, non-uniform empirical weighting.
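The index-dependent extension mentioned above changes only one line of a stick-breaking sampler. The sketch below assumes stick j is drawn from Beta(a_j, b_j) and, as a familiar example that this summary does not itself name, uses the two-parameter (Pitman-Yor) choice a_j = 1 - d, b_j = s + j * d; the variable names `discount` and `strength` and the truncation at 500 atoms are illustrative.

```python
import numpy as np

def stick_breaking_varying(a_params, b_params, rng):
    """Stick-breaking weights with stick j drawn from Beta(a_params[j], b_params[j])."""
    a_params = np.asarray(a_params, dtype=float)
    b_params = np.asarray(b_params, dtype=float)
    sticks = rng.beta(a_params, b_params)               # one draw per index j
    # Mass remaining before stick j is broken.
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - sticks)[:-1]))
    return sticks * remaining

rng = np.random.default_rng(1)
j = np.arange(1, 501)
discount, strength = 0.3, 1.0                           # example values only
py_weights = stick_breaking_varying(
    np.full(j.size, 1.0 - discount),                    # a_j = 1 - d
    strength + j * discount,                            # b_j = s + j * d
    rng,
)
```

Setting both parameter arrays to constants recovers the i.i.d. Beta(a, b) construction, so the same function covers the homogeneous GD case.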
Implications and Future Directions
The GD prior class decouples prior variance from higher moments of the random means, affording controlled flexibility in Bayesian nonparametric modeling. This has direct impact on Bayesian robustness and can be leveraged in practical settings to mitigate sensitivity to base measure mis-specification without sacrificing the appealing support properties of the DP.
The richer class of priors will likely enable more faithful modeling of uncertainty in functionals and forecasts, and may substantially improve empirical performance in applications where the DP's lack of skewness/kurtosis control is limiting.
Further work is expected on computational strategies for posterior simulation and marginalisation, extensions to the hierarchical modeling setting, and systematic study of finite-sample and asymptotic optimality in regularized inference and credible intervals.
Conclusion
This work establishes the generalised Dirichlet process as a foundational extension to the Dirichlet process for Bayesian nonparametric inference. It thoroughly develops the mathematical properties, inferential formulas, and computational tools for the class, clarifying both the technical subtleties and practical gains of moving beyond the DP's limitations. The results suggest substantial promise for more flexible, robust, and interpretable Bayesian modeling.