- The paper identifies a dichotomy in sample complexity based on φ’s growth, with superlinear divergences providing P-independent bounds.
- It employs a refined SAA method with truncation and dual regularization to control concentration errors across empirical measures.
- The study informs practitioners that superlinear divergences, such as KL-divergence, yield robust and scalable risk estimates in high-dimensional settings.
Sample Average Approximation for Distributionally Robust Optimization with φ-divergences: Sample Complexity Characterization
This paper investigates the sample complexity of Sample Average Approximation (SAA) for the estimation of worst-case expectations under a class of distributionally robust optimization (DRO) models defined via φ-divergence ambiguity sets. Specifically, for a bounded measurable function X:Ω→[−B,B] and a nominal measure P over (Ω,F), the DRO functional is
R(X)=ζ∈D,∫Ωϕ(ζ)dP≤τsup∫ΩX(ω)ζ(ω)dP(ω)
where ϕ is a convex divergence function with ϕ(1)=0 and domain restriction to nonnegative arguments, and D is the set of probability densities with respect to P. The SAA is defined analogously using the empirical distribution φ0 based on φ1 IID samples.
Whereas in the risk-neutral expectation case the sample complexity is always φ2 and independent of φ3, the authors show the situation is fundamentally altered in the DRO context. The main contributions are a sharp characterization—dependent only on the growth rate of φ4—of when the sample complexity remains nominal-measure-independent, and explicit lower and upper bounds in the respective regimes.
Main Results
Sublinear Versus Superlinear Growth Dichotomy
The analysis hinges on the asymptotic growth property of the divergence function φ5:
- Sublinear Growth: φ6 has sublinear growth if φ7.
- Superlinear Growth: φ8 has superlinear growth if φ9.
Sublinear Growth: Necessarily Nominal-Dependent and Unbounded Sample Complexity
For divergence functions with sublinear growth (e.g., Total Variation), the authors construct instances where, for arbitrary small probability mass assigned by X:Ω→[−B,B]0 to regions where X:Ω→[−B,B]1 is non-zero, the worst-case expectation X:Ω→[−B,B]2 remains bounded below, but the SAA estimator can be zero with substantial probability if the support is not sampled. This implies a fundamental lower bound:
- For any estimator, the sample complexity for X:Ω→[−B,B]3-accuracy depends on the “rarity” of events under X:Ω→[−B,B]4 and can be made arbitrarily large, scaling inversely with the nominal probability of sampling these events. This is formalized both for SAA and information-theoretic estimators via Le Cam’s method.
Thus, any estimator for the DRO risk X:Ω→[−B,B]5 cannot, in general, achieve nominal-independence if X:Ω→[−B,B]6 is sublinear.
Superlinear Growth: P-independent Upper Bounds
When X:Ω→[−B,B]7 exhibits superlinear growth (e.g., KL-divergence, X:Ω→[−B,B]8-divergence for X:Ω→[−B,B]9), the authors establish strong P-independent sample complexity upper bounds. Technically, this property ensures that excessively large likelihood ratios are penalized in the ambiguity set, which, in turn, prevents singular concentration to rare events under SAA.
Key elements of the analysis:
- The DRO problem is regularized by truncating P0 to a bounded domain, after which both primal and dual formulations are controlled.
- Explicit Lipschitz modulus bounds for the dual functional are established, enabling uniform concentration results with respect to the empirical measure.
- For a superlinear divergence P1, the sample complexity bound for SAA is P2, where P3 depends only on the growth of P4 and the ambiguity radius P5, independent of P6.
This result recovers and generalizes prior bounds for specific divergences but provides a comprehensive growth-rate-based criterion.
Technical Approach and Concentration Analysis
- The approach includes a delicately justified truncation of the divergence function, yielding a uniformly tight surrogate DRO functional.
- Lipschitz continuity of the dual form over bounded regions is derived, which enables the deployment of uniform high-probability deviations for empirical processes.
- By carefully balancing truncation error, uniform covering number bounds, and empirical deviations, the overall sample complexity is minimized and specified precisely.
- Lower bounds for the sublinear case are shown using explicit two-point constructions.
Theoretical and Practical Implications
These results fully resolve, for general P7-divergences, the question of when robust risk optimization can be estimated from data with fixed accuracy and confidence independently of the underlying measure. The dichotomy identifies exactly for which classes of divergences this is possible and quantifies the role of tail penalization properties of P8.
Practically, this means that superlinearly growing divergences are preferable in DRO settings where the underlying support or tail behavior of P9 cannot be well controlled or is expected to be high-dimensional. For example, KL-robust DRO and square divergence (when (Ω,F)0) are sample-efficient even in large-data regimes and continuous support, while total variation DRO is not.
The methods proposed—truncation and dual regularization—suggest algorithmically implementable SAA approaches, with explicit guidance for parameter selection to balance estimation error and computational tractability.
Directions for Future Work
The paper outlines several avenues:
- Extension to optimization where the risk envelope is parameterized by decision variables (i.e., stochastic programming or DRO-MDPs), necessitating uniformity over policy classes.
- Development of instance-dependent bounds (incorporating (Ω,F)1 when available).
- Analysis of asymptotic behavior and normality of the SAA estimators, comparing to the risk-neutral regime.
Such work could lead to more scalable and theoretically principled algorithms for large-scale robust control and learning problems, as well as better understanding of the statistical tradeoffs in model risk quantification.
Conclusion
This research provides a rigorous and fine-grained analysis of the sample complexity of SAA for (Ω,F)2-divergence-based DRO, revealing an exact growth-rate-based dichotomy for divergence functions. For superlinearly growing divergences, SAA achieves (Ω,F)3-independent polynomial sample complexity, while for sublinear divergences, estimation is always fundamentally nominal-dependent. These insights have significant implications for both the theoretical understanding of DRO and the practice of robust data-driven optimization and learning.