Shuffling-Aware Optimization for Private Vector Mean Estimation

Published 30 Apr 2026 in cs.LG | (2604.28032v1)

Abstract: We study $d$-dimensional unbiased mean estimation in the single-message shuffle model, where each user sends a single privatized message and the analyzer only observes the shuffled multiset of reports. While minimax-optimal mechanisms are well understood in the local differential privacy setting, the corresponding notion of optimality after shuffling has remained largely unexplored. To address this gap, we introduce the recently proposed shuffle index and use it to formulate the post-shuffling mechanism design problem as an explicit optimization problem. We then establish a minimax lower bound on the achievable mean squared error in terms of the shuffle index, which implies that mechanisms that are optimal under LDP can become suboptimal once shuffling is applied. Finally, we construct an asymptotically minimax optimal mechanism in the high privacy regime, which as a consequence achieves a privacy-utility trade-off nearly identical to that of the central Gaussian mechanism.

Abstract PDF Upgrade to Chat

Authors (2)

Summary

The paper presents the shuffle index as a central tool to capture privacy amplification and enables a tractable minimax optimization for vector mean estimation.
It establishes sharp lower bounds on the MSE and reveals that mechanisms optimal under LDP can be strictly suboptimal in multi-dimensional settings.
A blanket-mixed Gaussian mechanism is designed that converges to the central Gaussian performance in high privacy regimes, validating both theoretical and empirical claims.

Shuffling-Aware Minimax Optimization for Private Vector Mean Estimation

Introduction and Motivation

This paper addresses the central question of optimal mean estimation in the single-message shuffle model of differential privacy (DP), focusing on the $d$ -dimensional setting with inputs in the unit $\ell_2$ ball. The shuffle model operates between the canonical central and local DP regimes: users employ local randomizers to privatize their data and transmit a single message, which is subsequently shuffled uniformly. While minimax-optimal mechanisms for local DP (LDP) mean estimation are well understood [asiOptimalAlgorithmsMean2022], optimality after shuffling remains underexplored, with prior work generally assuming that LDP-optimal mechanisms remain optimal post-shuffling. This assumption is challenged in the present work.

The authors introduce the shuffle index—a scalar quantity summarizing the privacy amplification induced by shuffling—for formulating the post-shuffling minimax MSE optimization as a tractable problem. They derive sharp lower bounds, demonstrate that LDP-optimal mechanisms can be strictly suboptimal after shuffling, and construct an explicit mechanism that achieves asymptotically optimal performance, matching the Gaussian mechanism of the central model in the high privacy regime.

Shuffle Index and the Mechanism Design Problem

The shuffle index [takagiAnalysisShufflingPure2026] quantifies the privacy amplification effect intrinsic to shuffling and depends solely on the randomizer family, independent of the global population size. In the high privacy regime ( $\epsilon_n = O(\sqrt{\log n/n})$ ), the privacy loss of a shuffled protocol at $(\epsilon_n,\delta_n)$ is sharply governed (in an asymptotic sense) by the shuffle index $\chi$ . The central notion is:

Optimization Objective: Minimize worst-case squared error (MSE) among unbiased single-message protocols, under a shuffle-index-based privacy constraint.

The authors formalize the mechanism design problem using this relaxation: among all local randomizers $\mathcal{R}$ in a broad admissible class, find the mechanism minimizing MSE subject to a lower bound on the shuffle index.

Main Results: Minimax Lower and Upper Bounds

Lower Bound

A minimax-style lower bound is established: any unbiased protocol using a local randomizer with shuffle index $\chi$ must incur

$\mathrm{MSE} \geq d \chi^2$

for mean estimation in $d$ dimensions. This bound is sharp, matching (up to vanishing relative error) the performance of the central Gaussian mechanism in the aforementioned high privacy regime. Critically, mechanisms that are optimal under LDP (e.g., PrivUnit) can be strictly suboptimal after shuffling, especially in high dimensions, incurring a multiplicative constant gap (as large as $\pi/2$ for PrivUnit in the $\ell_2$ 0 limit).

Upper Bound: Blanket-Mixed Gaussian Mechanism

To attain the lower bound, an explicit blanket-mixed Gaussian local randomizer is constructed:

With probability $\ell_2$ 1, output a sample from $\ell_2$ 2 (the "privacy blanket").
With probability $\ell_2$ 3, output a sample from $\ell_2$ 4, for input $\ell_2$ 5.

An optimal tunable estimator is provided. In the limit $\ell_2$ 6, the MSE satisfies

$\ell_2$ 7

demonstrating asymptotic minimax optimality. The blanket-mixed Gaussian matches the Gaussian noise mechanism in the central model, up to vanishing correction.

Gaussian Limit Correspondence

The Gaussian limit correspondence theorem is established: in the regime $\ell_2$ 8 and $\ell_2$ 9, the privacy-utility trade-off of the optimal single-message shuffled mechanism converges (in MSE and privacy profile) to that of the central Gaussian mechanism with appropriately calibrated noise, matching both leading order and constants. This result resolves previously conjectured behavior, bridging empirical and theoretical observations [chenPrivacyAmplificationCompression2023].

Numerical Instantiation

A set of empirical results validate the theoretical claims, showcasing the convergence of the blanked-mixed Gaussian shuffling protocol to the central Gaussian baseline. Figure 1 exhibits the utility—measured in RMSE—across $\epsilon_n = O(\sqrt{\log n/n})$ 0 for fixed $\epsilon_n = O(\sqrt{\log n/n})$ 1 and $\epsilon_n = O(\sqrt{\log n/n})$ 2.

Figure 1: Utility comparison at fixed $\epsilon_n = O(\sqrt{\log n/n})$ 3 reveals the blanket-mixed Gaussian mechanism approaches central Gaussian utility in the high privacy regime.

This supports the claim: shuffling can essentially erase the local-to-central utility gap for mean estimation as the privacy requirement becomes stringent.

Suboptimality of LDP Mechanisms and Implications

A strong assertion is made and proved: existing LDP-minimax-optimal primitives (e.g., PrivUnit) are provably not minimax optimal after shuffling in $\epsilon_n = O(\sqrt{\log n/n})$ 4. In high dimensions, the asymptotic MSE lower bound cannot be attained by these mechanisms, regardless of parameter tuning. Only for $\epsilon_n = O(\sqrt{\log n/n})$ 5—where randomized response is optimal under both LDP and shuffling—does the LDP mechanism achieve minimaxity.

Theoretical and Practical Impact

Theoretical implications: The shuffle index provides a sharp, mechanism-centric lens for analyzing privacy amplification and mechanism design in the shuffle model, supplanting prior, looser uniform lower bounds. The reduction to a single-user shuffle index-constrained minimax optimization enables recasting post-shuffling design as a concrete, tractable problem and generalizes to a broad class of input domains and local randomizers, including non-LDP mechanisms.

Practical implications: In federated or decentralized settings where trust in a central curator is infeasible but local DP's accuracy cost is unacceptable, shuffle-based mechanisms can be designed to match central-model accuracy in the high privacy regime. Furthermore, the demonstrated constant-factor suboptimality of standard LDP randomizers post-shuffling compels practitioners to explicitly optimize randomizers for the shuffle context.

Future Directions in AI and Differential Privacy

Future research could aim to:

Tighten the vanishing error term $\epsilon_n = O(\sqrt{\log n/n})$ 6 or verify conjectured sharper bounds (potentially matching $\epsilon_n = O(\sqrt{\log n/n})$ 7 in the pure Gaussian case).
Extend shuffle-index-based approaches and minimax characterizations to more complex statistical or learning tasks (e.g., regression, learning over general convex bodies).
Refine composition analyses for multi-round shuffle protocols, leveraging the Gaussian correspondence to approach central DP composition tightness.
Explore the impact of allowing biased estimators rather than enforcing unbiasedness, which may further close the utility gap in practical settings.

Conclusion

This work establishes the shuffle index as a fundamental privacy-utility parameter for single-message shuffle-model mean estimation and provides a complete minimax analysis in the high privacy regime. The construction and analysis of the blanket-mixed Gaussian randomizer yield protocols that---contrary to prior assumptions---render LDP-minimax mechanisms suboptimal, and they rigorously confirm the privacy-utility equivalence to the central Gaussian mechanism when privacy requirements are severe. The results provide clarity for both theoretical investigation and practical deployment of privacy-preserving primitives in distributed and federated environments (2604.28032).

Markdown Report Issue