Papers
Topics
Authors
Recent
Search
2000 character limit reached

Sample Average Approximation for Distributionally Robust Optimization with $φ$-divergences

Published 12 Apr 2026 in math.OC and math.ST | (2604.10855v1)

Abstract: It is well known that estimating the expectation of any given bounded random variable with values in $[-B, B]$ has a sample complexity of $\mathcal{O}(B2/ε2)$ that is independent of the underlying probability measure. We show that this property can no longer hold when evaluating the worst-case expectation of the random variable, when the probability measures defining the expectation belong to the $φ$-divergence ball centered at some nominal measure $P$. Specifically, the sample complexity and its dependence on the nominal measure can be completely characterized by the growth of the divergence function. When the divergence function $φ$ exhibits superlinear growth, then a $P$-independent sample complexity of $\mathcal{O}(M_{φ, τ}(ε) / ε2)$ can be obtained by sample average approximation. Here $M_{φ, τ}(\cdot) $ is a function that only depends on the growth of $φ$ and the radius $τ$ of the divergence ball. On the other hand, when superlinear growth does not hold for $φ$, we show that for any estimation method, evaluating the worst-case expectation has a $P$-dependent sample complexity lower bound that can be made arbitrarily large by changing $P$.

Authors (1)
  1. Yan Li 

Summary

  • The paper identifies a dichotomy in sample complexity based on φ’s growth, with superlinear divergences providing P-independent bounds.
  • It employs a refined SAA method with truncation and dual regularization to control concentration errors across empirical measures.
  • The study informs practitioners that superlinear divergences, such as KL-divergence, yield robust and scalable risk estimates in high-dimensional settings.

Sample Average Approximation for Distributionally Robust Optimization with φφ-divergences: Sample Complexity Characterization

Introduction and Problem Formulation

This paper investigates the sample complexity of Sample Average Approximation (SAA) for the estimation of worst-case expectations under a class of distributionally robust optimization (DRO) models defined via φφ-divergence ambiguity sets. Specifically, for a bounded measurable function X:Ω[B,B]X: \Omega \to [-B,B] and a nominal measure PP over (Ω,F)(\Omega, \mathcal{F}), the DRO functional is

R(X)=supζD,Ωϕ(ζ)dPτΩX(ω)ζ(ω)dP(ω)\mathcal{R}(X) = \sup_{\zeta \in \mathfrak{D}, \int_\Omega \phi(\zeta) dP \le \tau} \int_\Omega X(\omega)\zeta(\omega)dP(\omega)

where ϕ\phi is a convex divergence function with ϕ(1)=0\phi(1) = 0 and domain restriction to nonnegative arguments, and D\mathfrak{D} is the set of probability densities with respect to PP. The SAA is defined analogously using the empirical distribution φφ0 based on φφ1 IID samples.

Whereas in the risk-neutral expectation case the sample complexity is always φφ2 and independent of φφ3, the authors show the situation is fundamentally altered in the DRO context. The main contributions are a sharp characterization—dependent only on the growth rate of φφ4—of when the sample complexity remains nominal-measure-independent, and explicit lower and upper bounds in the respective regimes.

Main Results

Sublinear Versus Superlinear Growth Dichotomy

The analysis hinges on the asymptotic growth property of the divergence function φφ5:

  • Sublinear Growth: φφ6 has sublinear growth if φφ7.
  • Superlinear Growth: φφ8 has superlinear growth if φφ9.

Sublinear Growth: Necessarily Nominal-Dependent and Unbounded Sample Complexity

For divergence functions with sublinear growth (e.g., Total Variation), the authors construct instances where, for arbitrary small probability mass assigned by X:Ω[B,B]X: \Omega \to [-B,B]0 to regions where X:Ω[B,B]X: \Omega \to [-B,B]1 is non-zero, the worst-case expectation X:Ω[B,B]X: \Omega \to [-B,B]2 remains bounded below, but the SAA estimator can be zero with substantial probability if the support is not sampled. This implies a fundamental lower bound:

  • For any estimator, the sample complexity for X:Ω[B,B]X: \Omega \to [-B,B]3-accuracy depends on the “rarity” of events under X:Ω[B,B]X: \Omega \to [-B,B]4 and can be made arbitrarily large, scaling inversely with the nominal probability of sampling these events. This is formalized both for SAA and information-theoretic estimators via Le Cam’s method.

Thus, any estimator for the DRO risk X:Ω[B,B]X: \Omega \to [-B,B]5 cannot, in general, achieve nominal-independence if X:Ω[B,B]X: \Omega \to [-B,B]6 is sublinear.

Superlinear Growth: P-independent Upper Bounds

When X:Ω[B,B]X: \Omega \to [-B,B]7 exhibits superlinear growth (e.g., KL-divergence, X:Ω[B,B]X: \Omega \to [-B,B]8-divergence for X:Ω[B,B]X: \Omega \to [-B,B]9), the authors establish strong P-independent sample complexity upper bounds. Technically, this property ensures that excessively large likelihood ratios are penalized in the ambiguity set, which, in turn, prevents singular concentration to rare events under SAA.

Key elements of the analysis:

  • The DRO problem is regularized by truncating PP0 to a bounded domain, after which both primal and dual formulations are controlled.
  • Explicit Lipschitz modulus bounds for the dual functional are established, enabling uniform concentration results with respect to the empirical measure.
  • For a superlinear divergence PP1, the sample complexity bound for SAA is PP2, where PP3 depends only on the growth of PP4 and the ambiguity radius PP5, independent of PP6.

This result recovers and generalizes prior bounds for specific divergences but provides a comprehensive growth-rate-based criterion.

Technical Approach and Concentration Analysis

  • The approach includes a delicately justified truncation of the divergence function, yielding a uniformly tight surrogate DRO functional.
  • Lipschitz continuity of the dual form over bounded regions is derived, which enables the deployment of uniform high-probability deviations for empirical processes.
  • By carefully balancing truncation error, uniform covering number bounds, and empirical deviations, the overall sample complexity is minimized and specified precisely.
  • Lower bounds for the sublinear case are shown using explicit two-point constructions.

Theoretical and Practical Implications

These results fully resolve, for general PP7-divergences, the question of when robust risk optimization can be estimated from data with fixed accuracy and confidence independently of the underlying measure. The dichotomy identifies exactly for which classes of divergences this is possible and quantifies the role of tail penalization properties of PP8.

Practically, this means that superlinearly growing divergences are preferable in DRO settings where the underlying support or tail behavior of PP9 cannot be well controlled or is expected to be high-dimensional. For example, KL-robust DRO and square divergence (when (Ω,F)(\Omega, \mathcal{F})0) are sample-efficient even in large-data regimes and continuous support, while total variation DRO is not.

The methods proposed—truncation and dual regularization—suggest algorithmically implementable SAA approaches, with explicit guidance for parameter selection to balance estimation error and computational tractability.

Directions for Future Work

The paper outlines several avenues:

  • Extension to optimization where the risk envelope is parameterized by decision variables (i.e., stochastic programming or DRO-MDPs), necessitating uniformity over policy classes.
  • Development of instance-dependent bounds (incorporating (Ω,F)(\Omega, \mathcal{F})1 when available).
  • Analysis of asymptotic behavior and normality of the SAA estimators, comparing to the risk-neutral regime.

Such work could lead to more scalable and theoretically principled algorithms for large-scale robust control and learning problems, as well as better understanding of the statistical tradeoffs in model risk quantification.

Conclusion

This research provides a rigorous and fine-grained analysis of the sample complexity of SAA for (Ω,F)(\Omega, \mathcal{F})2-divergence-based DRO, revealing an exact growth-rate-based dichotomy for divergence functions. For superlinearly growing divergences, SAA achieves (Ω,F)(\Omega, \mathcal{F})3-independent polynomial sample complexity, while for sublinear divergences, estimation is always fundamentally nominal-dependent. These insights have significant implications for both the theoretical understanding of DRO and the practice of robust data-driven optimization and learning.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Collections

Sign up for free to add this paper to one or more collections.