Complexity Guarantees for Zeroth-order Methods via Exponentially-shifted Gaussian Smoothing: Mitigating Dimension-dependence and Incorporating Decision-dependence

Published 16 Apr 2026 in math.OC and math.PR | (2604.15525v1)

Abstract: In this paper, we consider two distinct challenges in the resolution of nonsmooth stochastic optimization. Of these, the first pertains to the pronounced dependence of dimension in Gaussian smoothing-enabled zeroth-order schemes, impeding applications to large-scale settings. Second, no unified analysis {exists} for smoothing-enabled stochastic zeroth-order methods, allowing for capturing standard and decision-dependent stochastic optimization. To contend with the first challenge, we introduce a new exponentially-shifted Gaussian smoothing {\bf esGs} estimator whose moment bounds enjoy a linear dependence on dimension (rather than a quadratic dependence as in standard Gaussian smoothing estimators). Second, we show that such an estimator can be extended in two distinct ways to address decision-dependent regimes where the underlying densities are either available in closed form or not. Notably, the resulting gradient estimators continue to display a linear dependence in dimension. We then develop an {\bf esGs}-enabled stochastic zeroth-order method applicable to nonsmooth strongly convex, convex, and {non-}convex regimes. Importantly, our guarantees provide improved dimension-dependence in iteration complexities (by a factor of problem dimension $n$) while maintaing similar oracle complexities. In addition, we also provide novel high probability guarantees and almost sure sublinear rate guarantees in convex settings, while a new subsequential almost sure convergence guarantee is provided in nonconvex regimes. Preliminary numerics support our theoretical findings show that the proposed schemes display improved computational times and more refined empirical accuracies.

Abstract PDF Upgrade to Chat

Authors (3)

Summary

The paper introduces the esGs estimator that provides unbiased gradient estimates with improved moment bounds and linear dimension dependence.
The paper demonstrates that esGs reduces iteration complexity from O(n²ε⁻²) to O(nε⁻²) for nonsmooth convex functions, enhancing scalability in high dimensions.
Empirical evaluations show that the esGs-based method outperforms classical Gaussian smoothing across diverse stochastic and decision-dependent optimization scenarios.

Complexity Guarantees for Zeroth-order Methods via Exponentially-shifted Gaussian Smoothing: Summary and Expert Analysis

This paper presents a novel approach to improving the dimension dependence in the complexity guarantees of zeroth-order (ZO) stochastic optimization methods. By introducing the exponentially-shifted Gaussian smoothing (esGs) estimator, the authors address both the standard and decision-dependent regimes in stochastic optimization, offering new moment bounds, complexity improvements, and convergence guarantees.

Background and Motivation

Zeroth-order optimization (also called derivative-free optimization) is central in settings where gradients are unavailable or expensive to compute, such as simulation-based or black-box optimization problems. Classical random Gaussian smoothing (GS) methods estimate gradients using Gaussian perturbations, but these admit variance and bias bounds that scale quadratically in the problem dimension $n$ . This leads to iteration and sample complexity that is prohibitive for high-dimensional problems, especially with nonsmooth objectives.

Decision-dependent stochastic optimization (“performative prediction”) complicates the picture further: the data distribution $D(x)$ depends on the decision variable $x$ , so the classical SA estimators become biased, and convergence guarantees are generally absent for zeroth-order methods in this regime.

The esGs estimator is proposed to address (1) the dimension-dependence bottleneck and (2) the challenges in the ZO analysis of decision-dependent problems.

The esGs Estimator: Construction and Theoretical Properties

The esGs estimator is derived from a change-of-variable argument where, for each coordinate, the function is perturbed not just by a Gaussian random vector but by an exponential scaling in the chosen coordinate, while retaining Gaussian smoothing elsewhere. For the $i$ -th coordinate, the estimator uses

$g_\eta^i(x, V, Z) = \frac{1}{\eta \sqrt{2\pi}} \left[ f(x_i + \eta \sqrt{2V}, x^{-i} - Z^{-i}) - f(x_i - \eta \sqrt{2V}, x^{-i} - Z^{-i}) \right]$

where $V \sim \mathrm{Exp}(1)$ and $Z \sim \mathcal{N}_n(0, \eta^2 I)$ .

Key theoretical properties:

Unbiasedness: The estimator is unbiased for the gradient of a mollified version of the original function, $f_\eta(x)$ .
Moment bounds: The second moment of the estimator scales as $\mathcal{O}(n)$ , in contrast to $\mathcal{O}(n^2)$ for standard GS estimators.
Uniform Lipschitz continuity: The moment and bias bounds are established uniformly for both standard and decision-dependent problems.
Decision-dependent extensions: The estimator admits modifications accommodating situations where the response distribution $D(x)$ 0 is either known or available only via simulation/oracle access.

This improvement is critical for high-dimensional settings, effectively mitigating a key limitation of existing GS-based ZO approaches.

Algorithmic Framework and Complexity Guarantees

Standard and Decision-dependent Settings

The paper develops a stochastic ZO gradient method employing the esGs estimator, framed to apply to:

Standard setting: $D(x)$ 1, with $D(x)$ 2 independent of $D(x)$ 3.
Decision-dependent setting: $D(x)$ 4, with various assumptions on either access to $D(x)$ 5 or oracle-based generation of $D(x)$ 6 pairs.

Both frameworks are treated by leveraging mollification and the esGs estimator, ensuring that moment bounds and complexity results remain robust.

Improved Complexity Bounds

For nonsmooth convex objectives, the esGs approach reduces the iteration complexity from $D(x)$ 7 (for GS) to $D(x)$ 8 for achieving $D(x)$ 9-accuracy:

Table (Paper): Iteration and oracle complexity summary (matching up to logarithmic factors):

Method	Iterate Complexity	Oracle Complexity	Moment Bound
Gaussian Smoothing (GS)	$x$ 0	$x$ 1	$x$ 2
Simultaneous/Coordinate schemes	$x$ 3	$x$ 4	$x$ 5
esGs (this work)	$x$ 6	$x$ 7	$x$ 8

For nonsmooth nonconvex objectives, similar improvements are shown in stationarity residuals with matching or improved $x$ 9-dependence in the iteration and sample complexities.
The favorable $i$ 0 scaling is preserved in decision-dependent settings, even when $i$ 1 is available only implicitly, under mild technical conditions.

High-Probability and Almost Sure Convergence Guarantees

The paper establishes:

High-probability bounds for function suboptimality and (in convex regimes) convergence in almost sure sense, with explicit rates.
Subsequential almost sure convergence in nonconvex cases: limit points of the sequence generated by the algorithm are Clarke stationary points of the original function.

These finer-grained guarantees fill a theoretical gap in prior ZO work, which has typically limited itself to expected-value convergence analyses, especially in nonsmooth and decision-dependent regimes.

Empirical Illustration

Preliminary numerical experiments demonstrate that esGs-based schemes:

Exhibit dramatically slower growth in error and runtime with increasing $i$ 2 relative to standard GS and coordinate-perturbation ZO methods, consistent with theory.
Perform robustly in the presence of noise (both in standard and decision-dependent scenarios).
Achieve strong empirical performance on nonsmooth, convex quadratic, piecewise-linear, and nonconvex objectives, even at moderate to high dimension ( $i$ 3 or more).

Figure 1: Empirical convergence of the esGs estimator in a high-dimensional, nonsmooth nonconvex stochastic optimization problem demonstrates improved performance over classical GS methods as reflected in faster error decay and reduced runtime.

Practical and Theoretical Implications

Scalability: The new estimator improves the feasibility of ZO methods for high-dimensional nonsmooth stochastic optimization, where projection costs and function evaluation budgets dominate.
Generality: The approach admits a unified analysis for both classic and performative (decision-dependent) risk settings, encompassing a range of real-world ML and engineering scenarios.
Robustness: Empirical robustness under noise and adversarial or data-driven environments is supported by both numerical and theoretical evidence.

Speculation on Future Developments

Integration with Quasi-Newton Schemes: Could potentially be combined with Quasi-Newton approximations for Hessian-vector estimation under ZO settings.
Application in Bandit and RL-like Settings: May enable efficient ZO optimization in online and bandit feedback models with delayed or nonstationary data, especially where the feedback may be affected by the agent’s prior actions.
Further Dimension-Independence: Spherical smoothing and adaptive randomization might be further leveraged, but the esGs estimator provides a new class of smoothing estimators to investigate.

Conclusion

The exponentially-shifted Gaussian smoothing (esGs) estimator constitutes a substantive advance in stochastic ZO optimization, enabling improved complexity guarantees by achieving a strictly linear dependence on the problem dimension in the critical variance terms. The unified theory, encompassing both standard and decision-dependent regimes, and strong probabilistic convergence results, together with empirical evidence, make the esGs approach a compelling option for high-dimensional, nonsmooth, and black-box optimization scenarios, as well as for advanced applications in performative and adaptive data settings.