Papers
Topics
Authors
Recent
Search
2000 character limit reached

Complexity Guarantees for Zeroth-order Methods via Exponentially-shifted Gaussian Smoothing: Mitigating Dimension-dependence and Incorporating Decision-dependence

Published 16 Apr 2026 in math.OC and math.PR | (2604.15525v1)

Abstract: In this paper, we consider two distinct challenges in the resolution of nonsmooth stochastic optimization. Of these, the first pertains to the pronounced dependence of dimension in Gaussian smoothing-enabled zeroth-order schemes, impeding applications to large-scale settings. Second, no unified analysis {exists} for smoothing-enabled stochastic zeroth-order methods, allowing for capturing standard and decision-dependent stochastic optimization. To contend with the first challenge, we introduce a new exponentially-shifted Gaussian smoothing {\bf esGs} estimator whose moment bounds enjoy a linear dependence on dimension (rather than a quadratic dependence as in standard Gaussian smoothing estimators). Second, we show that such an estimator can be extended in two distinct ways to address decision-dependent regimes where the underlying densities are either available in closed form or not. Notably, the resulting gradient estimators continue to display a linear dependence in dimension. We then develop an {\bf esGs}-enabled stochastic zeroth-order method applicable to nonsmooth strongly convex, convex, and {non-}convex regimes. Importantly, our guarantees provide improved dimension-dependence in iteration complexities (by a factor of problem dimension $n$) while maintaing similar oracle complexities. In addition, we also provide novel high probability guarantees and almost sure sublinear rate guarantees in convex settings, while a new subsequential almost sure convergence guarantee is provided in nonconvex regimes. Preliminary numerics support our theoretical findings show that the proposed schemes display improved computational times and more refined empirical accuracies.

Summary

  • The paper introduces the esGs estimator that provides unbiased gradient estimates with improved moment bounds and linear dimension dependence.
  • The paper demonstrates that esGs reduces iteration complexity from O(n²ε⁻²) to O(nε⁻²) for nonsmooth convex functions, enhancing scalability in high dimensions.
  • Empirical evaluations show that the esGs-based method outperforms classical Gaussian smoothing across diverse stochastic and decision-dependent optimization scenarios.

Complexity Guarantees for Zeroth-order Methods via Exponentially-shifted Gaussian Smoothing: Summary and Expert Analysis

This paper presents a novel approach to improving the dimension dependence in the complexity guarantees of zeroth-order (ZO) stochastic optimization methods. By introducing the exponentially-shifted Gaussian smoothing (esGs) estimator, the authors address both the standard and decision-dependent regimes in stochastic optimization, offering new moment bounds, complexity improvements, and convergence guarantees.


Background and Motivation

Zeroth-order optimization (also called derivative-free optimization) is central in settings where gradients are unavailable or expensive to compute, such as simulation-based or black-box optimization problems. Classical random Gaussian smoothing (GS) methods estimate gradients using Gaussian perturbations, but these admit variance and bias bounds that scale quadratically in the problem dimension nn. This leads to iteration and sample complexity that is prohibitive for high-dimensional problems, especially with nonsmooth objectives.

Decision-dependent stochastic optimization (“performative prediction”) complicates the picture further: the data distribution D(x)D(x) depends on the decision variable xx, so the classical SA estimators become biased, and convergence guarantees are generally absent for zeroth-order methods in this regime.

The esGs estimator is proposed to address (1) the dimension-dependence bottleneck and (2) the challenges in the ZO analysis of decision-dependent problems.


The esGs Estimator: Construction and Theoretical Properties

The esGs estimator is derived from a change-of-variable argument where, for each coordinate, the function is perturbed not just by a Gaussian random vector but by an exponential scaling in the chosen coordinate, while retaining Gaussian smoothing elsewhere. For the ii-th coordinate, the estimator uses

gηi(x,V,Z)=1η2π[f(xi+η2V,xiZi)f(xiη2V,xiZi)]g_\eta^i(x, V, Z) = \frac{1}{\eta \sqrt{2\pi}} \left[ f(x_i + \eta \sqrt{2V}, x^{-i} - Z^{-i}) - f(x_i - \eta \sqrt{2V}, x^{-i} - Z^{-i}) \right]

where VExp(1)V \sim \mathrm{Exp}(1) and ZNn(0,η2I)Z \sim \mathcal{N}_n(0, \eta^2 I).

Key theoretical properties:

  • Unbiasedness: The estimator is unbiased for the gradient of a mollified version of the original function, fη(x)f_\eta(x).
  • Moment bounds: The second moment of the estimator scales as O(n)\mathcal{O}(n), in contrast to O(n2)\mathcal{O}(n^2) for standard GS estimators.
  • Uniform Lipschitz continuity: The moment and bias bounds are established uniformly for both standard and decision-dependent problems.
  • Decision-dependent extensions: The estimator admits modifications accommodating situations where the response distribution D(x)D(x)0 is either known or available only via simulation/oracle access.

This improvement is critical for high-dimensional settings, effectively mitigating a key limitation of existing GS-based ZO approaches.


Algorithmic Framework and Complexity Guarantees

Standard and Decision-dependent Settings

The paper develops a stochastic ZO gradient method employing the esGs estimator, framed to apply to:

  • Standard setting: D(x)D(x)1, with D(x)D(x)2 independent of D(x)D(x)3.
  • Decision-dependent setting: D(x)D(x)4, with various assumptions on either access to D(x)D(x)5 or oracle-based generation of D(x)D(x)6 pairs.

Both frameworks are treated by leveraging mollification and the esGs estimator, ensuring that moment bounds and complexity results remain robust.

Improved Complexity Bounds

For nonsmooth convex objectives, the esGs approach reduces the iteration complexity from D(x)D(x)7 (for GS) to D(x)D(x)8 for achieving D(x)D(x)9-accuracy:

  • Table (Paper): Iteration and oracle complexity summary (matching up to logarithmic factors):
Method Iterate Complexity Oracle Complexity Moment Bound
Gaussian Smoothing (GS) xx0 xx1 xx2
Simultaneous/Coordinate schemes xx3 xx4 xx5
esGs (this work) xx6 xx7 xx8
  • For nonsmooth nonconvex objectives, similar improvements are shown in stationarity residuals with matching or improved xx9-dependence in the iteration and sample complexities.
  • The favorable ii0 scaling is preserved in decision-dependent settings, even when ii1 is available only implicitly, under mild technical conditions.

High-Probability and Almost Sure Convergence Guarantees

The paper establishes:

  • High-probability bounds for function suboptimality and (in convex regimes) convergence in almost sure sense, with explicit rates.
  • Subsequential almost sure convergence in nonconvex cases: limit points of the sequence generated by the algorithm are Clarke stationary points of the original function.

These finer-grained guarantees fill a theoretical gap in prior ZO work, which has typically limited itself to expected-value convergence analyses, especially in nonsmooth and decision-dependent regimes.


Empirical Illustration

Preliminary numerical experiments demonstrate that esGs-based schemes:

  • Exhibit dramatically slower growth in error and runtime with increasing ii2 relative to standard GS and coordinate-perturbation ZO methods, consistent with theory.
  • Perform robustly in the presence of noise (both in standard and decision-dependent scenarios).
  • Achieve strong empirical performance on nonsmooth, convex quadratic, piecewise-linear, and nonconvex objectives, even at moderate to high dimension (ii3 or more). Figure 1

Figure 1

Figure 1: Empirical convergence of the esGs estimator in a high-dimensional, nonsmooth nonconvex stochastic optimization problem demonstrates improved performance over classical GS methods as reflected in faster error decay and reduced runtime.


Practical and Theoretical Implications

  • Scalability: The new estimator improves the feasibility of ZO methods for high-dimensional nonsmooth stochastic optimization, where projection costs and function evaluation budgets dominate.
  • Generality: The approach admits a unified analysis for both classic and performative (decision-dependent) risk settings, encompassing a range of real-world ML and engineering scenarios.
  • Robustness: Empirical robustness under noise and adversarial or data-driven environments is supported by both numerical and theoretical evidence.

Speculation on Future Developments

  • Integration with Quasi-Newton Schemes: Could potentially be combined with Quasi-Newton approximations for Hessian-vector estimation under ZO settings.
  • Application in Bandit and RL-like Settings: May enable efficient ZO optimization in online and bandit feedback models with delayed or nonstationary data, especially where the feedback may be affected by the agent’s prior actions.
  • Further Dimension-Independence: Spherical smoothing and adaptive randomization might be further leveraged, but the esGs estimator provides a new class of smoothing estimators to investigate.

Conclusion

The exponentially-shifted Gaussian smoothing (esGs) estimator constitutes a substantive advance in stochastic ZO optimization, enabling improved complexity guarantees by achieving a strictly linear dependence on the problem dimension in the critical variance terms. The unified theory, encompassing both standard and decision-dependent regimes, and strong probabilistic convergence results, together with empirical evidence, make the esGs approach a compelling option for high-dimensional, nonsmooth, and black-box optimization scenarios, as well as for advanced applications in performative and adaptive data settings.


References

See (2604.15525) for the full list of citations and detailed empirical data.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 1 tweet with 4 likes about this paper.