$K-$means with leraned metrics

Published 15 Mar 2026 in math.ST, cs.LG, and math.PR | (2603.14601v1)

Abstract: We study the Fréchet {\it k-}means of a metric measure space when both the measure and the distance are unknown and have to be estimated. We prove a general result that states that the {\it k-}means are continuous with respect to the measured Gromov-Hausdorff topology. In this situation, we also prove a stability result for the Voronoi clusters they determine. We do not assume uniqueness of the set of {\it k-}means, but when it is unique, the results are stronger. {This framework provides a unified approach to proving consistency for a wide range of metric learning procedures. As concrete applications, we obtain new consistency results for several important estimators that were previously unestablished, even when $k=1$. These include {\it k-}means based on: (i) Isomap and Fermat geodesic distances on manifolds, (ii) difussion distances, (iii) Wasserstein distances computed with respect to learned ground metrics. Finally, we consider applications beyond the statistical inference paradigm like (iv) first passage percolation and (v) discrete approximations of length spaces.}

Abstract PDF Upgrade to Chat

Authors (4)

Summary

The paper establishes a general consistency theorem for the k-means estimator under learned metrics using measured Gromov–Hausdorff convergence.
It rigorously demonstrates the convergence of both cluster centers and Voronoi clusters in diverse metric measure spaces.
Applications span manifold learning, diffusion maps, and Wasserstein spaces, providing robust frameworks for data-dependent clustering.

$k$ -Means with Learned Metrics: A General Consistency Theory

Introduction and Motivation

The paper " $k$ -Means with Learned Metrics" (2603.14601) rigorously investigates the stability and consistency of $k$ -means clustering procedures when the underlying metric is itself learned from data, rather than being a priori fixed. This problem arises naturally in manifold learning, metric learning, and distributional clustering, where the geometric structure of data must be inferred. The authors operate in the general setting of metric measure spaces and provide a unified abstract theory that encompasses numerous practical procedures, such as clustering with Isomap, Fermat, diffusion-based, or Wasserstein learned distances.

Key references are provided throughout, but notably the analysis is grounded in the topology of measured Gromov–Hausdorff (mGH) convergence, a framework from metric geometry that enables comparisons between metric measure spaces where both the underlying space and the measure may vary. The main contribution is a general consistency theorem for the $k$ -means estimator—extending to both cluster centers (barycenters) and clusters themselves—when the metric is empirically estimated or otherwise perturbed. Stronger statements are given when the $k$ -means set is unique. Applications are detailed for a broad range of data-analytic and statistical settings important in modern machine learning.

Main Theoretical Results

General Consistency under mGH Perturbations

The principal result demonstrates that, under mild compactness assumptions, if a sequence of measured metric spaces $(\mathcal{X}_n = (\mathbb{X}_n, d_n, \mu_n))$ converges to a limit $(\mathcal{X} = (\mathbb{X}, d, \mu))$ in the measured Gromov–Hausdorff sense, then the empirical $k$ -means solutions computed with respect to the learned metrics also converge (in Hausdorff distance) to the limiting $k$ -means set. This holds for all $k \geq 1$ and for any power $p \ge 1$ in the Fréchet functional.

Notably, uniqueness of the limiting $k$ -means set is not assumed; in the case of uniqueness, convergence can be strengthened to two-sided Hausdorff convergence of solution sets. This result unifies and generalizes previous partial results that assumed fixed metrics or restricted to $k=1$ [JaffeKmeans, Cuevas]. Further, a stability theorem is established for the associated Voronoi clusters, showing their convergence in the appropriate sense under mGH perturbations.

Technique and Topological Foundations

The proof leverages the continuity properties of the integral-of-powered-distance functional with respect to the Gromov–Hausdorff and weak-probability topologies, as outlined in [JaffeKmeans, Sturm]. The authors establish that any sequence of empirical minimizers, when appropriately transported via $\varepsilon$ -isometries between spaces, will in the limit approximate true minimizers for the population metric measure space. This is a general statement about the functional stability of Fréchet $k$ -means in varying metric structures.

Implications and Applications

Clustering with Empirically Estimated Metrics

The general theorem is specialized to settings where the metric is itself learned or estimated:

Isomap and Fermat Distances: Under standard sampling and scaling assumptions, the empirical shortest-path distances along random geometric graphs (Isomap) or more general power-weighted graphs (Fermat) converge uniformly to underlying manifold-dependent geodesic metrics [groisman2022nonhomogeneous, isomapproofs]. The authors prove strong consistency of $k$ -means computed with these learned distances, addressing an open question in manifold clustering.
Diffusion Distances: For graph-Laplacian-based spectral embeddings (Laplacian eigenmaps, diffusion maps), the convergence of diffusion distances (based on spectral decomposition) to their continuous manifold counterparts is leveraged [belkin2003laplacian, coifman2006diffusion, von2008consistency]. The paper establishes that $k$ -means clustering on the resulting Euclidean or diffusion-induced distances remains universally consistent.
Wasserstein Spaces with Learned Ground Metrics: The paper extends to the case where one learns the ground metric for distributions and then forms Wasserstein distances between measures with respect to those learned metrics. By applying the functoriality of Wasserstein spaces under isometries [villani2008optimal], the authors show consistency of empirical Wasserstein $k$ -means and barycenters, including cases where the underlying pointwise metric is unknown and approximated from sampled data—a setting relevant to distributional and functional data analysis.
Non-Statistical Scenarios (First-Passage Percolation, Discretized Length Spaces): Consistency is shown for barycenter estimation in random metric spaces arising from first-passage percolation [auffinger201750] and for discrete graph approximations of abstract length spaces [Burago]. In both cases, convergence in mGH implies that $k$ -means centers and clusters estimated empirically follow their population analogues.

Uniqueness and Stability

A criterion for uniqueness of the Fréchet mean in the learned metric is provided, via nonpositive curvature (Hadamard) conditions on Fermat-type metrics. This connects the problem of $k$ -means uniqueness and clustering stability to the geometry induced by density-weighted and data-adapted metrics. The analysis relates to classical results for uniqueness and stability of Fréchet means in spaces of nonpositive curvature [sturm2003probability, Bacak2014].

Broader Theoretical Implications

These results firmly situate $k$ -means applied to learned metrics within a robust statistical and geometric framework, closing a gap in the literature regarding consistency guarantees for these widely used, empirically motivated procedures. The generality of the mGH formalism allows for broad applicability, including to situations where the underlying metric, measure, or both are unknown, estimated empirically, or subject to algorithmic or stochastic perturbations.

Numerical and Contradictory Claims

The paper provides strong uniform convergence results for $k$ -means centers and clusters, matching the rate at which metrics converge in mGH distance. This is a universal guarantee, contingent only on the appropriate convergence of sequences of metric measure spaces—not on restrictive or model-specific properties. No contradictory claims are present; the theoretical architecture is rigorous and conservatively stated with respect to known counterexamples in weaker topologies.

Future Directions

Possible extensions include:

Quantitative rates of convergence for $k$ -means under empirical metrics, particularly when metrics are themselves subject to statistical estimation error (e.g., in high-dimensional or sparse regimes).
Extension to non-compact or non-normalized measure settings, possibly requiring additional control on doubling properties or curvature.
Analysis of algorithmic stability and computational tractability for $k$ -means procedures under complex data-dependent metrics.
Investigation of applications to more general clustering functionals (beyond $k$ -means) and further connections to optimal transport–based learning paradigms.

Conclusion

This work provides a comprehensive and theoretically grounded answer to the question of when $k$ -means procedures, implemented with learned or random metrics, remain consistent and stable estimators of population cluster centers and clusters. By locating the problem within the measured Gromov–Hausdorff topological setting, the results simultaneously unify and generalize much of the prior work on clustering in non-Euclidean and data-dependent geometric settings, and provide immediate implications for a wide range of modern data analytic methodologies (2603.14601).

Markdown Report Issue

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Paper Prompts

Top Community Prompts

Explain it Like I'm 14

off on

Knowledge Gaps

off on

Practical Applications

off on

Glossary

off on

Conceptual Simplification

off on

Explain it Like I'm 14

Overview

This paper studies what happens to k‑means clustering when the way you measure distance between points isn’t fixed ahead of time but is learned from data. The authors prove that if your learned distances and your estimate of how the data are distributed keep improving, then the k‑means centers you compute (and the clusters they create) also improve and settle down to the “right” ones. They also show this for several popular learned distances used in practice, like Isomap, Fermat, diffusion, and Wasserstein-based distances.

Think of it like planning where to build k hospitals in a city: you need a map (distances) and you need to know where people live (distribution). If both your map and your population counts get more accurate, the best hospital locations you pick will get closer to the truly best spots. That’s the core idea.

What questions does the paper ask?

If the distance between data points is learned (not fixed), will k‑means still find centers and clusters that reflect the true structure, as we get more and better data?
How sensitive (stable) are the k‑means centers and the resulting Voronoi clusters to small changes in the “world” (the space, its distances, and how points are distributed)?
Can we give general conditions that guarantee consistency (convergence to the truth) for many different learned distances used in machine learning?

How did the authors approach the problem?

To make “distance learning” precise, the authors use the idea of a metric measure space: a set with

a metric (a “ruler” telling you how far apart two points are), and
a measure (a “weight scale” telling you how the data are spread out).

They compare whole worlds (spaces + distances + distributions) using a notion called measured Gromov–Hausdorff (mGH) convergence. In simple terms, two worlds are close if you can line them up with a near-perfect map that:

barely distorts distances (like having almost the same ruler), and
carries the data distribution from one world to nearly match the other (like having almost the same population layout).

Then they analyze the k‑means cost function (the average of the p‑th power of the distance to the nearest center) and prove it behaves continuously under this kind of convergence. That means small changes in the world lead to small changes in the best k centers.

They also study Voronoi clusters—the regions of points closest to each center—and show these regions are stable: if the centers change a little, the clusters change only a little.

What did they find, and why is it important?

The main findings are:

Continuity and stability of k‑means:
- If a sequence of metric measure spaces gets closer and closer to a limit space in the mGH sense, then any set of k‑means centers in the approximating spaces maps to centers that get arbitrarily close (in Hausdorff distance) to some set of k‑means centers in the limit space.
- The associated Voronoi clusters are also stable under this convergence.
- These results hold for any number of clusters k ≥ 1 and any power p ≥ 1 (the usual k‑means uses p = 2).
- If the true best set of k centers is unique, the convergence is even stronger.
New, unified consistency results for learned distances:
- The theorems give a single framework to prove that k‑means is consistent when you plug in learned distances. This leads to new guarantees (even for k = 1) for:
- Isomap geodesic distances on manifolds (graph-based shortest paths),
- Fermat distances (which favor paths through denser regions),
- Diffusion distances (based on random walks on a data graph),
- Wasserstein distances between distributions when the ground metric is learned.
- They also apply beyond standard statistics, e.g.:
- First passage percolation (random travel times on networks),
- Discrete approximations of continuous “length spaces” (e.g., meshes approximating a surface).
Stability of clusters:
- Voronoi regions defined by k‑means centers vary continuously: if centers move a tiny bit, the clusters they induce don’t jump unpredictably.
A uniqueness insight for Fermat distances:
- They give a geometric condition (related to “non‑positive curvature” after reweighting by the data density) that ensures the k = 1 Fréchet mean (the best single center) is unique. This helps diagnose when the data really form a single cluster under the learned metric.

Why this matters:

In modern data analysis, we often replace straight‑line distances with smarter, learned distances that reflect curved shapes (manifolds) and varying densities. This paper shows that plugging those distances into k‑means is theoretically sound: with enough data, your learned‑metric k‑means finds the right answer.
It gives practitioners confidence that state‑of‑the‑art graph‑ and diffusion‑based methods, combined with k‑means, won’t produce unstable or misleading results as sample sizes grow.

Methods in everyday terms

Learned metrics: Instead of using Euclidean distance, we build distances from the data itself: by shortest paths on a graph (Isomap), by penalizing big jumps and favoring dense areas (Fermat), or by how easily a random walk moves between points (diffusion).
mGH convergence: Imagine two slightly different city maps with slightly different populations. If you can overlay one on the other with almost no distortion and the populations line up well, the worlds are “close.” The authors prove that the “best places to put k facilities” (k‑means centers) and the neighborhoods they serve (Voronoi cells) vary smoothly as you go from one world to the other.
Continuity of the k‑means objective: The total “distance-to-nearest-center” cost won’t suddenly jump if the world changes a little. This powers all the stability results.

Implications and impact

Reliability: Clustering with learned distances becomes provably reliable as data grow—reducing the risk of unstable, data‑artifact clusters.
Generality: A single mathematical framework now covers many popular distance-learning pipelines, simplifying proofs and guiding new methods.
Practical guidance: If the best centers are unique, expect especially stable results. The curvature-based criterion helps check uniqueness when using Fermat distances.
Beyond clustering: The same ideas inform other problems that depend on distances and averages in complex spaces, like optimal transport (Wasserstein barycenters), network travel times, and geometry processing on meshes.

In short, the paper shows that if you learn your distances carefully and your approximations improve, k‑means will keep up—and its answers will converge to the true structure of your data.

View Paper Prompt View All Prompts

Knowledge Gaps

Knowledge gaps, limitations, and open questions

Below is a single, concrete list of what remains missing, uncertain, or unexplored in the paper; each item is phrased to guide future research.

Scope beyond compactness:
- Extend continuity/stability results from compact metric measure spaces to non-compact or locally compact settings (e.g., with tightness, growth, or doubling conditions) and clarify exactly which additional assumptions yield mGH continuity of k-means in non-compact cases.
Quantitative (rate) guarantees:
- Derive explicit, finite-sample stability bounds linking the Hausdorff distance between empirical and population k-means (and clusters) to (i) the mGH distance between spaces, (ii) uniform distance errors in learned metrics, and (iii) weak/Wasserstein distances between measures.
Two-sided convergence without uniqueness:
- Identify minimal structural conditions (e.g., separation/margin, strict convexity of the risk functional) under which two-sided Hausdorff convergence of k-means sets holds even when the population solution set has finitely many elements but is not a singleton.
Stability of clusters: metrics and measures:
- Strengthen cluster convergence from one-sided Hausdorff inclusion to two-sided Hausdorff convergence; alternatively, establish convergence in measure (e.g., symmetric difference under μ) and quantify misclustered mass in terms of metric/measure perturbations.
Robustness to ties and Voronoi boundaries:
- Analyze how ties in distance (non-unique nearest centers) and high-measure Voronoi boundaries affect cluster stability results, and specify conditions that exclude or control these pathologies.
Dependence on p and other loss functions:
- Extend continuity/stability from p ≥ 1 to regimes like p → ∞ (k-center) or 0 < p < 1 (non-convex costs), and characterize what changes in assumptions or proof techniques are required.
Practical construction of ε-isometries:
- Provide constructive/algorithmic procedures for building the ε-isometries h_n needed for mGH convergence in common learned-metric settings (Isomap, Fermat, diffusion, Wasserstein-with-learned-ground-metric), including how to verify/estimate ε_n from data.
Learned metric noise and partial observations:
- Establish mGH-based consistency under realistic distance learning noise models (e.g., additive/multiplicative noise on pairwise distances, missing edges/weights in graphs), with explicit conditions that guarantee the ε-isometry and weak convergence requirements.
Parameter selection for graph-based metrics:
- Give data-driven selection strategies (and guarantees) for Isomap/graph parameters (ε or k in k-NN, connectivity conditions, bandwidths) that ensure the uniform metric convergence needed for mGH consistency, including boundaries and variable density.
Manifolds with boundary and non-smooth geometries:
- Extend the Fermat/Isomap consistency results from closed smooth manifolds to manifolds with boundary, corners, or stratified spaces; characterize additional terms at the boundary and the impact on k-means consistency.
Diffusion distances (Diffusion Maps, Laplacian eigenmaps):
- Specify the precise scaling regimes (time t, kernel bandwidth ε, normalization schemes) that guarantee uniform convergence of diffusion distances and hence mGH convergence; include cases with anisotropic kernels, variable bandwidths, and density-normalized operators.
Wasserstein barycenters with learned ground metrics:
- Formalize and prove the mGH assumptions when the ground metric itself is learned from data (e.g., graph or manifold-based distances), and derive stability/consistency of Wasserstein k-barycenters under joint errors in the ground metric and empirical measures.
First passage percolation and discrete length spaces:
- Detail the conditions (and possible rates) under which random/discrete approximations converge in mGH to a limiting length space suitable for k-means stability; clarify dimension/dependence effects and quantify variability.
Uniqueness criteria beyond k = 1 and beyond Fermat:
- Generalize the non-positive curvature (NPC) uniqueness criterion from 1-mean under Fermat metrics to k > 1, and to other learned metrics (e.g., diffusion, Wasserstein-ground-metric) where curvature or convexity structures differ.
Verifiable curvature conditions for Fermat metrics:
- Provide practical, data-driven or density-estimator-based checks for the curvature inequality used to ensure NPC (and uniqueness) in the Fermat metric, including robustness to density estimation error.
Statistical inference beyond consistency:
- Develop asymptotic distributions, confidence sets, and hypothesis tests (e.g., for uniqueness of k-means, selection of k) in the learned-metric setting, extending Pollard/Cuevas-type results to account for metric estimation uncertainty.
Non-i.i.d. and dependent sampling:
- Extend results to dependent designs (e.g., Markov chains on manifolds, network or spatial sampling), nonstationary densities, or mixture models; specify conditions ensuring mGH convergence and k-means stability under dependence.
Multiple components and disconnected supports:
- Analyze settings where μ has multiple disconnected components (or is a mixture on multiple manifolds), providing separation conditions that guarantee correct center placement and cluster recovery under learned metrics.
Algorithmic and computational aspects:
- Address computation of k-means and Voronoi partitions under general learned metrics (e.g., complexity, approximation algorithms, initialization/stability), and study sensitivity to local minima in nonconvex metric spaces.
Medoids vs. means and center location constraints:
- Clarify consistency differences between k-means (centers anywhere in the space) and k-medoids (centers restricted to data), especially under learned metrics and finite-sample graphs, and identify when each is preferable.
Uniformity over hyperparameters:
- Establish uniform-in-parameter (e.g., α in Fermat, graph ε/k, diffusion time t) versions of the consistency results, to justify model selection or cross-validation without invalidating asymptotics.
High-dimensional and sample complexity issues:
- Quantify how manifold dimension ℓ and parameters (e.g., α in Fermat) affect convergence rates and sample complexity; provide guidance on parameter choices to mitigate the curse of dimensionality.
Handling outliers and off-manifold noise:
- Introduce robust variants or assumptions ensuring mGH convergence and k-means stability in the presence of outliers or noise off the underlying manifold, and derive corresponding robustness guarantees.

View Paper Prompt View All Prompts

Practical Applications

Overview

This paper proves that k-means centers and their Voronoi clusters remain stable and consistent when the underlying distance is learned (rather than fixed), provided the sequence of metric–measure spaces converges in measured Gromov–Hausdorff (mGH) topology. It delivers general theorems and concrete consistency guarantees for widely used learned distances (Isomap geodesic, Fermat, diffusion, and Wasserstein with learned ground metrics), and for settings beyond statistics (first-passage percolation, discrete length-space approximations). Below are actionable, real-world applications derived from these findings.

Immediate Applications

The following applications can be deployed now with available algorithms (Isomap, Diffusion Maps, shortest-path on graphs, optimal transport solvers) and the paper’s guarantees provide theoretical validation of their consistency and stability.

Robust clustering with learned metrics in high-dimensional data
- Sectors: software/ML platforms, healthcare, finance, e-commerce, cybersecurity
- Use case: Replace Euclidean k-means with k-means on learned distances (e.g., Isomap or diffusion distances on similarity graphs; Fermat distances for density-aware clustering). Improves cluster quality for non-linear manifolds or heterogeneous densities.
- Tools/workflows: Build a kNN/ε-graph, compute shortest-path (Isomap) or diffusion distances (e.g., via graph Laplacians/Diffusion Maps), then run k-means in that metric. For Fermat distances (α>1), compute power-weighted shortest paths on the complete or pruned graph.
- Assumptions/dependencies: Compactness or effective boundedness of the data manifold; enough sample coverage so graph distances converge uniformly; correct choice of graph hyperparameters (ε, k); stable numerical solvers for shortest paths and spectral decompositions.
Patient stratification with graph/diffusion or density-aware distances
- Sectors: healthcare, public health
- Use case: Build patient similarity graphs from EHRs; use diffusion or Fermat distances to cluster patients into phenotypes more faithfully than Euclidean space. Stability guarantees support clinical interpretability and reproducibility.
- Tools/workflows: Feature construction → similarity graph → diffusion/Fermat distances → k-means → Voronoi-based cohort assignment.
- Assumptions/dependencies: Informative features and graph construction; sufficient density for uniform distance consistency; governance around data quality and privacy.
Distributional data clustering using Wasserstein barycenters with learned ground metrics
- Sectors: retail/marketing (customer purchase profiles), energy (load shapes), finance (return distributions), climate (spatial/temporal histograms), NLP (topic/embedding distributions)
- Use case: Cluster histograms or distributions via Wasserstein distances where the ground metric is itself learned (e.g., word embeddings, learned time-axis distances). The paper’s results ensure k-barycenter consistency under learned ground metrics.
- Tools/workflows: Learn ground metric (e.g., word embeddings, time-of-day embeddings), compute pairwise Wasserstein distances or barycenters (Sinkhorn/OT solvers), run k-means in Wasserstein space.
- Assumptions/dependencies: Convergence of the learned ground metric; adequate regularization and scaling for OT; compactness or effective bounds on supports.
Stable spatial/graph partitioning and segmentation
- Sectors: geospatial analytics, logistics, telecom, robotics
- Use case: Use Voronoi clusters induced by learned metrics (e.g., road-network geodesic or travel-time distances) to partition service regions or coverage zones; stability of Voronoi cells ensures reliable assignments as the metric is refined.
- Tools/workflows: Build network-aware distances (shortest-paths), run k-means, compute Voronoi cells for assignments; monitor cluster stability as data/metric updates occur.
- Assumptions/dependencies: Reasonable graph connectivity; consistent metric updates that satisfy mGH-type convergence; bounded domain.
Clustering point-clouds on surfaces with convergence guarantees
- Sectors: 3D scanning, manufacturing, AR/VR, digital twins
- Use case: Cluster point-clouds from surfaces using Isomap/Fermat distances; the paper’s results ensure that discrete clusters converge to their continuous counterparts as sampling densifies.
- Tools/workflows: Surface sampling → graph construction → geodesic/density-aware distances → k-means → surface segmentation.
- Assumptions/dependencies: Smooth/closed manifold (or well-approximated compact geometry), uniform sampling and graph connectivity for consistent distances.
Model selection via uniqueness/stability checks for k
- Sectors: software/ML platforms, healthcare, finance
- Use case: Use uniqueness of the set of k-means (when present) and stability of Voronoi cells as proxies for choosing k. The paper shows stronger convergence when the population k-means is unique; provides a curvature-based uniqueness criterion for the Fermat 1-mean.
- Tools/workflows: Run k-means across k; monitor variability across runs/bootstraps; for k=1 under Fermat distances, test log-concavity or curvature proxies when feasible.
- Assumptions/dependencies: Uniqueness often depends on geometry and density; practical uniqueness tests may be heuristic; Fermat uniqueness criterion requires density/curvature conditions that may be approximated in practice.
Curriculum modules and reproducible labs on stable clustering with learned metrics
- Sectors: education (data science, geometry processing)
- Use case: Teach robust clustering and manifold learning by pairing distance-learning methods with k-means under proved stability; contrast with Euclidean k-means failures on non-linear manifolds.
- Tools/workflows: Open-source notebooks using scikit-learn, POT (Python Optimal Transport), and graph libraries.
- Assumptions/dependencies: Access to standard ML libraries; curated datasets.

Long-Term Applications

These applications require further research, scaling, or development (e.g., rate bounds, non-compact extensions, large-scale engineering).

Streaming and time-evolving learned metrics with continuous cluster tracking
- Sectors: finance (market regimes), cybersecurity (evolving threat graphs), IoT/telemetry, operations
- Use case: As metrics evolve (new edges, updated weights, embeddings), maintain clusters with stability guarantees; track change points via deviations in Voronoi cells.
- Tools/products: Incremental graph/diffusion distance updates, online shortest-path maintenance, streaming k-means in learned metric spaces, dashboarding for stability metrics.
- Dependencies: Extensions of mGH results to time-varying/non-compact settings; incremental algorithms with strict latency budgets; empirical rate guarantees.
Deep metric learning + k-means with theoretical validation
- Sectors: computer vision, speech, recommendation systems, biometrics
- Use case: Apply k-means on distances induced by deep embeddings (Siamese/triplet networks), seeking conditions under which embedding families satisfy mGH-like convergence to justify stability.
- Tools/products: Training pipelines that export calibrated distances; cluster-stability monitors; certification reports for regulated industries.
- Dependencies: Theory linking neural embeddings to mGH convergence; robustness to distribution shift; scalable distance computations for very large datasets.
Infrastructure and network resilience via percolation-aware clustering
- Sectors: telecom, transportation, utilities, emergency response
- Use case: Use first-passage percolation (FPP) metrics on stochastic networks to locate stable “centers” of service or resilience hubs; track how clusters change under random failures.
- Tools/products: FPP simulators, stochastic shortest-path solvers, resilient k-center/k-medoids planners.
- Dependencies: Efficient large-scale FPP simulation; validation that discretized/empirical FPP metrics satisfy mGH convergence; integration with operational constraints.
City planning and fair resource allocation using learned geodesic distances
- Sectors: public policy, urban planning, social services
- Use case: Define service zones using road-network geodesic or public-transit-based distances; apply cluster stability to assess robustness under data revisions and seasonal changes.
- Tools/products: Geospatial pipelines (OSM-based networks), k-means with network distances, stability/robustness scorecards for policy justification.
- Dependencies: Handling non-compact domains and boundary effects; aligning stability metrics with fairness and legal mandates.
Large-scale Wasserstein clustering and barycenters with learned ground metrics
- Sectors: energy (millions of smart meters), retail (massive customer bases), climate (large grids)
- Use case: Cluster massive sets of distributions with OT, where ground metrics are learned from context (e.g., grid topology, temporal embeddings), and rely on consistency under metric learning.
- Tools/products: GPU-accelerated Sinkhorn, multi-scale OT, landmark-based approximations; monitoring of ground-metric convergence.
- Dependencies: Scalability of OT at industrial scale; convergence diagnostics for learned ground metrics; rate-of-convergence analyses.
Robotics and autonomous systems: stable partitioning for coverage, exploration, and routing
- Sectors: robotics, autonomous vehicles, drones
- Use case: Use learned geodesic/diffusion distances from SLAM maps or occupancy grids to partition environments stably (Voronoi cells), improving task allocation and coverage consistency.
- Tools/products: Graph SLAM integration, metric-aware k-means, Voronoi-based task dispatch, stability-aware replanning.
- Dependencies: Real-time distance estimation; guarantees in non-compact, dynamic scenes; computational budgets on embedded hardware.
Rates, uncertainty quantification, and finite-sample guarantees
- Sectors: regulated industries (healthcare, finance), safety-critical applications (autonomy)
- Use case: Provide finite-sample error bars on centers and clusters under learned metrics; develop audits for when data violate assumptions (e.g., insufficient coverage for uniform convergence).
- Tools/products: Statistical tests for uniform distance convergence, mGH convergence diagnostics, cluster stability confidence intervals.
- Dependencies: New theory for rates under different learned metrics; practical diagnostics and acceptance thresholds.
Tooling: “Certified clustering” libraries
- Sectors: ML tooling, MLOps
- Use case: Build libraries that implement learned metrics (Isomap, Fermat, diffusion, Wasserstein), run k-means, and emit stability/uniqueness diagnostics and certificates.
- Tools/products: scikit-learn-compatible API; OT integrators (e.g., POT), graph libraries; dashboards and reports for audits.
- Dependencies: Standardized benchmarks for stability; interfaces for custom learned ground metrics; community best practices.

Notes on Key Assumptions and Dependencies

Compactness and coverage: Most guarantees assume compact spaces or effectively bounded domains and that samples cover the support well enough (e.g., connectivity and uniform convergence of graph-based distances).
Distance consistency: For Isomap, need ε-graph or kNN-graph with ε→0 and nε^d/log n→∞; for Fermat (α>1), require uniform consistency of power-weighted shortest paths on manifolds.
Metric convergence: For Wasserstein with learned ground metrics, guarantees hinge on the convergence of the ground metric itself.
Uniqueness and stability: Strongest statements hold when the population set of k-means is unique; otherwise convergence is one-sided (no-false-positives). The paper provides a curvature-based criterion ensuring uniqueness for the Fermat 1-mean under certain density/geometry conditions.
Computational scaling: Graph shortest paths, spectral decompositions, and OT can be costly; practical deployments may need sparsification, landmarks, multiscale approximations, and GPU acceleration.
Non-compact/edge cases: Extensions to non-compact spaces or with heavy tails require care; stability near boundaries may degrade without appropriate handling.

View Paper Prompt View All Prompts

Glossary

Banach space: A complete normed vector space where every Cauchy sequence converges in the norm. "separable Banach spaces."
Barycenter (k-barycenters): A generalization of the mean to metric spaces; k-barycenters are k representative points minimizing average (power) distances. "Wasserstein k-barycenters"
Borel measure: A measure defined on the Borel σ-algebra (generated by open sets) of a topological space. "Borel measures defined on separable metric spaces."
Conformal change of metric: A transformation of a Riemannian metric by a positive scalar function that rescales lengths locally. "the conformal change of metric $\tilde{g} = f^\kappa\, g$ "
Diffusion distances: Data-driven distances derived from diffusion processes (random walks) on graphs that capture geometry and density. "estimate diffusion distances"
Diffusion Maps: A nonlinear dimensionality reduction method that builds diffusion processes on data to capture geometry/density. "Diffusion Maps"
Doubling constant: A constant bounding how many balls of half the radius are needed to cover a metric ball; controls metric space regularity. "uniform bounds for the doubling constants and the diameters"
Empirical measure: The uniform probability measure on a finite sample, assigning mass 1/n to each data point. "the empirical measure $\mu_n$ "
ε-graph: A graph on data points where edges connect pairs within Euclidean distance ε; used to approximate geodesics. "the $\varepsilon$ -graph"
ε-isometry: A map between metric spaces that approximately preserves distances and is almost surjective up to ε. "is called an $\varepsilon$ -isometry"
Fermat distances: Power-weighted path lengths on data graphs that adapt to density, generalizing geodesic distance; controlled by parameter α>1. "Fermat distances"
First passage percolation: A random metric model assigning random weights to edges and using shortest paths with those weights. "first passage percolation"
Geodesic distance: The intrinsic shortest-path distance on a manifold or metric space. "geodesic distances"
Gromov-Hausdorff distance: A metric comparing compact metric spaces up to isometry by how closely they embed into a common space. "in the Gromov-Hausdorff distance"
Gromov-weak convergence: A mode of convergence for metric measure spaces based on convergence of finite metric measure distributions. "Gromov-weak sense"
Hadamard manifold: A complete, simply connected Riemannian manifold with non-positive sectional curvature. "the Fermat manifold is Hadamard"
Hausdorff distance: The distance between two sets defined as the maximum distance of a point in one set to the other set. "Hausdorff distance"
Hausdorff topology: The topology induced by the Hausdorff distance on sets, often used for convergence of compact subsets. "Hausdorff topology"
Heine-Borel property: The property that closed and bounded subsets are compact; holds in Euclidean spaces and some metric spaces. "Heine-Borel property."
Hessian Eigenmaps: A manifold learning method using Hessian-based operators to recover parametrizations from high-dimensional data. "Hessian Eigenmaps"
Isomap: A manifold learning algorithm that estimates geodesic distances via graph shortest paths and applies MDS. "Isomap"
k-medoids: A clustering method like k-means but centers are restricted to data points; robust to outliers. "{\it k-}medoids"
k-nearest neighbors graph (k-NN graph): A graph connecting each point to its k nearest points, used for manifold approximation. "the $k$ -nearest neighbors graph ( $k$ -NN)"
Kuratowski outer limit: A set-convergence notion capturing all limit points of sequences of sets. "Kuratowski outer limit"
Laplacian eigenmaps: A spectral method using the graph Laplacian’s eigenvectors to embed data while preserving local geometry. "Laplacian eigenmaps"
Length space: A metric space where distances equal infima of curve lengths, i.e., geodesic-type spaces. "length spaces"
Measured Gromov-Hausdorff (mGH) convergence: Convergence of metric measure spaces via ε-isometries and weak convergence of pushforward measures. "measured Gromov-Hausdorff (mGH) sense"
Non-positive curvature: A curvature condition where geodesic triangles are thinner than Euclidean; implies unique barycenters. "non-positive curvature"
Plug-in estimator: An estimator that replaces population quantities with empirical counterparts in a functional. "plug-in estimator"
Pushforward measure: The measure obtained by mapping a measure through a function, denoted f#μ. " $(h_n)_{\#}(\mu_n) \to \mu$ "
Riemannian manifold: A smooth manifold with an inner product on tangent spaces, defining notions like length and curvature. "Riemannian manifold"
Sectional curvature: Curvature associated with two-dimensional planes in the tangent space of a Riemannian manifold. "sectional curvatures"
Separable metric space: A metric space containing a countable dense subset. "separable metric spaces"
Similarity graph: A weighted graph built on data where edges encode pairwise similarities, used for diffusion processes. "similarity graph"
Sturm’s D-distance: A distance on metric measure spaces inducing a topology (often equivalent to mGH under compactness/regularity). "Sturm's $\mathbf D-$ distance"
Sub-Gaussian: A tail behavior stronger than Gaussian concentration, implying exponential decay of deviations. "sub-Gaussian"
Voronoi cells: Regions partitioning a space where each point is assigned to its nearest center. "Voronoi cells"
Wasserstein barycenters: Fréchet means in the Wasserstein space of probability measures. "Wasserstein k-barycenters"
Wasserstein distance: An optimal transport distance between probability measures based on moving mass with minimal cost. "Wasserstein distances"
Wasserstein topology: The topology on probability measures induced by Wasserstein distances (e.g., Wp). "Wasserstein topology"
Weak convergence: Convergence of probability measures via convergence of integrals against bounded continuous functions. "converging weakly"
Within-cluster sum of squares: The k-means objective summing squared distances of points to their nearest center. "within cluster sum of squares."

View Paper Prompt View All Prompts

Open Problems

We're still in the process of identifying open problems mentioned in this paper. Please check back in a few minutes.

$K-$means with leraned metrics

Summary

$k$ -Means with Learned Metrics: A General Consistency Theory

Introduction and Motivation

Main Theoretical Results

General Consistency under mGH Perturbations

Technique and Topological Foundations

Implications and Applications

Clustering with Empirically Estimated Metrics

Uniqueness and Stability

Broader Theoretical Implications

Numerical and Contradictory Claims

Future Directions

Conclusion

Paper to Video (Beta)

Whiteboard

Paper Prompts

Top Community Prompts

Explain it Like I'm 14

Overview

What questions does the paper ask?

How did the authors approach the problem?

What did they find, and why is it important?

Methods in everyday terms

Implications and impact

Knowledge Gaps

Knowledge gaps, limitations, and open questions

Practical Applications

Overview

Immediate Applications

Long-Term Applications

Notes on Key Assumptions and Dependencies

Glossary

Open Problems

Continue Learning

Collections

Tweets

Don't miss out on important new AI/ML research

$K-$means with leraned metrics

Summary

kkk-Means with Learned Metrics: A General Consistency Theory

Introduction and Motivation

Main Theoretical Results

General Consistency under mGH Perturbations

Technique and Topological Foundations

Implications and Applications

Clustering with Empirically Estimated Metrics

Uniqueness and Stability

Broader Theoretical Implications

Numerical and Contradictory Claims

Future Directions

Conclusion

Paper to Video (Beta)

Whiteboard

Paper Prompts

Top Community Prompts

Explain it Like I'm 14

Overview

What questions does the paper ask?

How did the authors approach the problem?

What did they find, and why is it important?

Methods in everyday terms

Implications and impact

Knowledge Gaps

Knowledge gaps, limitations, and open questions

Practical Applications

Overview

Immediate Applications

Long-Term Applications

Notes on Key Assumptions and Dependencies

Glossary

Open Problems

Continue Learning

Collections

Tweets

Don't miss out on important new AI/ML research

Sign up for free to explore the frontiers of research

$k$ -Means with Learned Metrics: A General Consistency Theory