The Optimal Sample Complexity of Multiclass and List Learning

Published 27 Apr 2026 in cs.LG and stat.ML | (2604.24749v1)

Abstract: While the optimal sample complexity of binary classification in terms of the VC dimension is well-established, determining the optimal sample complexity of multiclass classification has remained open. The appropriate complexity parameter for multiclass classification is the DS dimension, and despite significant efforts, a gap of $\sqrt{\text{DS}}$ has persisted between the upper and lower bounds on sample complexity. Recent work by Hanneke et al. (2026) shows a novel algebraic characterization of multiclass hypothesis classes in terms of their DS dimension. Building up on this, we show that the maximum hypergraph density of any multiclass hypothesis class is upper-bounded by its DS dimension. This proves a longstanding conjecture of Daniely and Shalev-Shwartz (2014). As a consequence, we determine the optimal dependence of the sample complexity on the DS dimension for multiclass as well as list learning.

Abstract PDF Upgrade to Chat

Authors (1)

Chirag Pabbaraju

Summary

The paper proves that maximum hypergraph density is bounded by the DS dimension, resolving a longstanding conjecture in multiclass learning.
It demonstrates optimal sample complexity bounds for multiclass and list learning, eliminating previous sqrt and log-factor gaps.
The algebraic approach employed paves the way for designing learning algorithms that are both theoretically tight and practically efficient.

Essay: The Optimal Sample Complexity of Multiclass and List Learning (2604.24749)

Introduction and Motivation

This paper addresses a central problem in statistical machine learning: the characterization of the optimal sample complexity for multiclass and list classification. While the VC dimension provides a precise measure for sample complexity in binary classification, determining the corresponding parameter and its tight relationship for multiclass classification has been a longstanding open problem. The relevant complexity parameter for multiclass settings is the DS (Daniely-Shalev-Shwartz) dimension. However, the gap between upper and lower bounds on sample complexity has persisted, specifically an unavoidable $\sqrt{d_{\mathrm{DS}}}$ factor in prevailing upper bounds. The paper resolves this key conjecture and extends the results to the broader setting of list learning.

Technical Framework

The paper operates within the PAC (Probably Approximately Correct) learning framework. For multiclass classification ( $k > 2$ ), the hypothesis class $\mathcal{H} \subseteq [k]^\mathcal{X}$ is characterized by the DS dimension $d_{\mathrm{DS}}$ , and for the list-learning setting (prediction of lists of size $\ell$ ), the $\ell$ -DS dimension $\mathit{d}^{\ell}_{\mathrm{DS}}$ is introduced.

The primary structural object is the one-inclusion graph of the hypothesis class. The maximum density function $\mu_\mathcal{H}(n)$ , introduced by Aden-Ali et al. [aden2023optimal], provides an upper bound for PAC error in terms of a combinatorial property of $\mathcal{H}$ , independent of logarithmic factors in $k$ or $k > 2$ 0.

A longstanding conjecture by Daniely and Shalev-Shwartz [daniely2014optimal] posits that $k > 2$ 1 is tightly bounded by $k > 2$ 2, giving rise to optimal sample complexity bounds analogous to those for binary classification. Previous results ([brukhim2022characterization], [hanneke2024improved]) established upper bounds of $k > 2$ 3, but no tight characterization was proven.

Main Results

The principal contribution is the proof that the maximum hypergraph density of any multiclass hypothesis class is upper-bounded by its DS dimension. More formally, for every $k > 2$ 4, $k > 2$ 5 holds. The proof leverages a recent algebraic characterization by Hanneke et al. [hanneke2026optimal], applying linear algebraic spanning arguments to monomial function sets representing multiclass hypothesis classes.

This result yields several corollaries:

Optimal Sample Complexity for Multiclass Learning: For realizable distributions, the optimal sample complexity is $k > 2$ 6, matching the previously conjectured lower bound and removing the extraneous $k > 2$ 7 and log factors [brukhim2022characterization, hanneke2024improved].
Agnostic Multiclass Learning: The optimal sample complexity, up to log factors, is $k > 2$ 8, with $k > 2$ 9 the Natarajan dimension [cohen2025sample].
List Learning: For lists of size $\mathcal{H} \subseteq [k]^\mathcal{X}$ 0, the realizable sample complexity is $\mathcal{H} \subseteq [k]^\mathcal{X}$ 1 [charikar2023characterization, hanneke2026optimal], and in the agnostic setting, $\mathcal{H} \subseteq [k]^\mathcal{X}$ 2, closing the gaps in dependence on $\mathcal{H} \subseteq [k]^\mathcal{X}$ 3.

The results extend to infinite label spaces ( $\mathcal{H} \subseteq [k]^\mathcal{X}$ 4), showing robust applicability in general settings.

Numerical and Structural Implications

The analysis confirms the tightness of the sample complexity bound, including the constant. The algebraic arguments provide a constructive proof where earlier combinatorial and inductive approaches failed due to pathologies in multiclass shifting operations.

The removal of the $\mathcal{H} \subseteq [k]^\mathcal{X}$ 5 constraint and log factors for multiclass and list learning establishes a concrete foundation for algorithm design: optimal learning algorithms can now be derived directly in terms of DS and Natarajan dimensions without extraneous combinatorial artifacts.

In multiclass settings, learning theory practitioners can now benchmark algorithms against optimal rates, both in realizable and agnostic regimes. For list learning, the characterization elucidates the extent to which outputting multiple predictions per instance improves sample efficiency, clarifying the role of list size $\mathcal{H} \subseteq [k]^\mathcal{X}$ 6 and the corresponding combinatorial parameters.

Practical and Theoretical Implications

Practically, these results impact multiclass prediction models, boosting, and structured output prediction, especially in applications with high label cardinality or need for uncertainty quantification via lists. The extension to infinite label spaces is relevant to high-dimensional prediction and open-world classification scenarios.

Theoretically, the findings:

Validate the DS dimension as the canonical complexity parameter for multiclass PAC learning, analogous to the VC dimension for binary tasks
Resolve a major conjecture linking density-based combinatorial quantities to algebraic structure
Highlight the power of algebraic techniques over traditional combinatorial tools, especially in settings where invariance under shifting breaks
Tighten the gap between PAC bounds and combinatorial quantities in both classification and list learning

Future directions include pinning down the optimal dependence on list size $\mathcal{H} \subseteq [k]^\mathcal{X}$ 7 (current bounds are tight up to polynomial factors in $\mathcal{H} \subseteq [k]^\mathcal{X}$ 8), and extending algebraic characterization to regression and more general output spaces [pmlr-v272-pabbaraju25a]. Additional research could probe the use of monomial-based spanning properties for new combinatorial dimensions and optimization of sample compression schemes.

Conclusion

The paper solves the open problem of optimal sample complexity for multiclass and list learning, conclusively relating the DS dimension to maximum hypergraph density. The algebraic proof unlocks optimal bounds, provides tight sample complexity characterizations in both realizable and agnostic cases, and eliminates lingering multiplicative gaps. The theoretical and practical consequences are significant, guiding algorithm construction and analysis for multiclass and list learning tasks, and laying foundation for further exploration of algebraic methods in learning theory.

Markdown Report Issue