Filtered Approximate Nearest Neighbor Search Cost Estimation

Published 6 Feb 2026 in cs.DB | (2602.06721v1)

Abstract: Hybrid queries combining high-dimensional vector similarity with structured attribute filtering have garnered significant attention across both academia and industry. A critical instance of this paradigm is filtered Approximate k Nearest Neighbor (AKNN) search, where embeddings (e.g., image or text) are queried alongside constraints such as labels or numerical range. While essential for rich retrieval, optimizing these queries remains challenging due to the highly variable search cost induced by combined filters. In this paper, we propose a novel cost estimation framework, E2E, for filtered AKNN search and demonstrate its utility in downstream optimization tasks, specifically early termination. Unlike existing approaches, our model explicitly captures the correlation between the query vector distribution and attribute-value selectivity, yielding significantly higher estimation accuracy. By leveraging these estimates to refine search termination conditions, we achieve substantial performance gains. Experimental results on real-world datasets demonstrate that our approach improves retrieval efficiency by 2x-3x over state-of-the-art baselines while maintaining high search accuracy.

Abstract PDF Upgrade to Chat

Authors (4)

Summary

The paper introduces the EE framework that leverages early probing to estimate and manage search cost in filtered AKNN queries.
It employs a LightGBM regressor with filter-aware and queue features to predict query budgets, achieving 2–3x latency reductions.
The approach is index-agnostic, generalizing across multiple indices and demonstrating robust performance on real-world datasets.

Filtered Approximate Nearest Neighbor Search Cost Estimation: An Expert Review

Problem Statement and Motivation

Hybrid queries leveraging high-dimensional vector similarities combined with structured attribute filtering are increasingly integral to retrieval systems, especially in domains such as retrieval-augmented generation (RAG), multi-modal search, and recommendation engines. The filtered Approximate $k$ Nearest Neighbor (AKNN) search problem is foundational: given a query embedding plus a filter (e.g., labels or value ranges), efficiently retrieve the top- $k$ vectors meeting both the similarity criterion and filter constraints.

However, real-world deployments confront substantial variability in search cost, as filter selectivity interacts non-trivially with vector neighborhoods, often resulting in significant feature-filter misalignment. Traditional approaches employ static global budgets for search—wasting resources on easy queries and causing latency spikes for difficult ones.

Figure 1: An example of filtered AKNN search in an e-commerce scenario with a filter constraint, demonstrating how filter-aware early termination avoids unnecessary expansion.

Figure 2: A motivational example in e-commerce, showing that hard range filters can exclude proximal neighbors, necessitating a much larger search radius to find valid candidates.

The paper rigorously problematizes cost estimation under these conditions. Critical is the inability of existing filter-agnostic estimators to capture local distributional misalignment, leading to inaccurate predictions and inefficient query execution. The authors formally define metrics of global selectivity ( $\sigma_{global}$ ) and local correlation ( $\rho_{local}$ ), demonstrating the pronounced divergence in real datasets.

Figure 3: Empirical demonstration of the misalignment between local and global selectivity on practical datasets, where static estimators fail to model search hardness.

The $\EE$ Framework and Core Innovations

The authors introduce the Early probe-to-Early termination ($\EE$) framework specifically tailored for filtered AKNN cost estimation and adaptive search optimization. The architecture rests on three pillars:

Early Probe Phase: The search process begins with a seamless, zero-overhead probe, collecting runtime signals such as local valid ratio ( $\rho_{pilot}$ ) and filter-vector density. These features characterize misalignment without incurring additional computational burden.
Cost Estimation via GBDT: By aggregating filter-aware and established queue trajectory features, the framework employs a light-weight Gradient Boosting Decision Tree (LightGBM) regressor to predict the total search budget $W_q$ . Notably, the model targets logarithmic error to mitigate heavy-tailed distributions and optimize for relative error.
Dynamic Search Termination: Using model predictions, the AKNN traversal dynamically adapts its termination boundary, returning results early for easy queries and allocating higher budgets for difficult, low-selectivity cases.
Figure 4: The overall architecture of the proposed EE framework, delineating early probing, cost estimation, and adaptive termination.

The $\EE$ approach is index-agnostic, generalizing across post-filter ($\POST$) and pre-filter ($\PRE$) strategies and accommodating graph-based AKNN indices such as HNSW and ACORN.

Technical Contributions and Implementation Details

Feature Engineering and Model Training

The model leverages a comprehensive feature set:

Filter Features: $\rho_{pilot}$ (validity ratio during probe), $\rho_{queue}$ (validity ratio in candidate queue), directly capturing misalignment.
Queue/Result Features: Distance statistics (mean, variance, percentiles), queue size, and normalized ratios.
Global Features: Distance from entry node, search hops.

The regressor is trained with samples generated from pilot searches and labeled by ground truth NDCs (Number of Distance Computations); the target is log-transformed to address heavy-tailed cost distributions.

Adaptive Termination Algorithm

The early probe phase is followed by model invocation; the traversal continues until predicted budget is depleted. This method is orthogonal to the underlying index, only requiring traversal features, and thus is compatible with state-of-the-art indices (ACORN, HNSW, etc.).

Empirical Evaluation

The framework is evaluated on several real-world datasets (Tripclick, Youtube, Arxiv, MSMARCO), with both label and range filters. Key metrics include recall@ $k$ , latency, and NDC.

Strong empirical findings include:

$\EE$ delivers 2–3x reduction in query latency and distance computations compared to filter-agnostic baselines (Naive HNSW, LAET, DARTH) at comparable recall.
The approach robustly adapts to hard queries (low local selectivity), outperforming static and stepwise predictors.
Estimator ranking accuracy (Spearman's $\rho$ up to 0.79) confirms practical efficacy for adaptive budget allocation, even when regression $R^2$ is moderate (reflecting stochastic variance in filtered traversal).

Feature Importance and Generalization

Feature importance analysis (per-tree gain) demonstrates the dominance of proposed filter-aware features ( $\rho_{pilot}$ , $\rho_{queue}$ ), accounting for the majority of prediction gain, especially on datasets with strong attribute-proximity correlations.

Figure 5: Feature importance breakdown for $K=10$ , showing the contribution of filter-aware metrics over legacy features.

Adaptation experiments to ACORN ($\PRE$ index) validate index-agnostic applicability, with $\EE$ maintaining or improving search efficiency after integrating visited filter ratio ( $\rho_{visited}$ ).

Practical and Theoretical Implications

Practically, $\EE$ enables real-time, per-query optimized AKNN retrieval in large-scale, mixed-modal and hybrid filter settings, supporting sub-millisecond latency and robust accuracy maintenance. It eliminates the need for conservative global parameter tuning and supports efficient query optimization in vector database systems, especially in the context of modern LLMs, retrieval-augmented architectures, and enterprise search.

Theoretically, the framework foregrounds the necessity of explicit modeling of feature-filter alignment, moving beyond independence assumptions, and introduces a paradigm for low-cost runtime signal extraction feeding into machine-learned estimators. Future developments could entail extensions to joint multi-attribute filters, hardware-adaptive cost modeling, and integration with advanced vector search indices.

Conclusion

The research presents a formal, index-agnostic framework ($\EE$) for cost estimation and adaptive early termination in filtered AKNN search, explicitly addressing feature-filter misalignment and offering substantial operational efficiency gains in real-world workloads (2602.06721). Empirical evidence confirms superiority across diverse benchmarks. The methodology is extensible to emerging filter search paradigms and is poised to influence optimization strategies in next-generation retrieval systems.

Markdown Report Issue

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Paper Prompts

Top Community Prompts

Explain it Like I'm 14

off on

Knowledge Gaps

off on

Practical Applications

off on

Glossary

off on

Conceptual Simplification

off on

Explain it Like I'm 14

What is this paper about?

Imagine you’re searching for similar items using an image or a piece of text, like finding shoes that look like a photo you upload. Often, you also add filters, like “brand = A” or “price under $50.” Finding the closest matches that also pass your filters is much harder and can be slow.

This paper builds a smart way to predict how much work a filtered search will take and then stop the search early once it’s “good enough.” The trick is to notice how the filter and the query are related around the query, not just across the whole dataset, so the system can avoid wasting time.

What questions does it try to answer?

How can we estimate, quickly and accurately, how long a filtered search will take for a specific query?
How do the query’s nearby items (by similarity) line up with the filter (like brand or price)? Are nearby items likely to pass the filter or not?
Can we use this estimate to stop the search early and still get high-quality results?

How did the researchers approach the problem?

Think of all items (images, texts, etc.) as points in a big space, where “distance” means how different they are. Searching similar items often uses a graph: each item is a “node” connected to nearby items, like roads connecting nearby houses in a city.

The paper focuses on “filtered approximate k-nearest neighbor” (filtered AKNN) search:

“Nearest neighbor” means find the closest items to your query.
“Approximate” means it’s fast but may miss a few perfect matches.
“Filtered” means results must also pass conditions like labels (brand, category) or numeric ranges (price, date).

Key idea: Early probing

Before committing to a full search, the system takes a short “peek” nearby:

It walks a few steps in the graph from the starting point, measuring how many nearby items actually pass the filter.
This quick peek is reused in the full search, so it doesn’t waste work.

Analogy: If you’re looking for a specific brand in a mall, you first glance at the nearby stores. If none sell that brand, you know you’ll need to walk farther; if many do, you can stop sooner.

What features do they look at?

From this early peek, the system collects simple signals that tell how hard the search will be:

Local valid ratio: Of the nearby items you checked, what fraction matched the filter? If this ratio is high, you likely need less searching.
Queue valid ratio: Among the candidates you’re about to explore next, how many look promising and match the filter?

They also look at distance patterns (how close current candidates are), similar to earlier research, but the new twist is tying these signals to the filter.

The prediction model

They use a fast, lightweight machine learning model (LightGBM, a type of boosted decision trees) to predict the “search budget,” which is the amount of work needed. Here, “work” is measured as the number of distance calculations — that’s basically how many comparisons the search makes.

LightGBM is chosen because it’s quick to run (about 0.025 milliseconds per prediction), so it won’t slow the search down.

Using the prediction to stop early

With the predicted budget, the search continues and counts how many comparisons it makes. Once it hits the predicted limit, it stops and returns the current best filtered results. This “adaptive termination” cuts off wasted effort on easy queries and avoids spending too little on hard ones.

What did they find?

Their estimator captures the real difficulty of filtered searches better than existing methods because it looks at the local match between the query and the filter, not just global averages.
Using the estimates to stop early, they achieved about 2x–3x faster search while keeping high accuracy (good recall).
It’s especially strong on “hard” queries where the filter clashes with the query’s nearby items. Example: a photo of a luxury watch plus a filter “price < $20.” Nearby items might be similar watches but not in that price range; the system correctly predicts this will take longer and plans accordingly.
It reduces latency spikes (random slow queries) and makes search times more predictable.

Why does it matter?

Faster searches: Helpful for e-commerce (“find similar products with brand/price filters”), recommendations (music, videos), and AI systems that look up related documents (RAG for LLMs).
Better resource use: Fewer wasted computations means lower costs and energy use.
Smoother experiences: More stable and predictable response times for users.
Practical and general: Works with common graph-based indexes and typical filters (labels and ranges), and the quick estimation is fast enough for real-time systems.

Overall, the paper shows that a small, smart “peek” plus a fast prediction can make filtered similarity search both efficient and reliable, without sacrificing quality.

View Paper Prompt View All Prompts

Knowledge Gaps

Knowledge gaps, limitations, and open questions

Below is a single, concrete list of what remains missing, uncertain, or unexplored in the paper. Each point is framed to guide actionable future research.

Generalization beyond graph-based ANN: How to adapt early-probe features and the estimator to non-graph backends (e.g., IVF/IVF-PQ, ScaNN, PQ/HNSW hybrids, DiskANN) where traversal dynamics and cost drivers differ.
Disk-based and memory-hierarchy costs: The model predicts number of distance computations (NDC) but ignores I/O, cache, SIMD/vectorization, and memory access patterns that dominate latency in disk or compressed indexes; methods to incorporate these costs remain open.
PRE-filter and hybrid filtering support: The framework targets POST filtering; it is unclear how to handle PRE filtering (risking graph disconnections) or PRE/POST hybrids and how early-probe signals should be modified in these regimes.
Unsatisfiable filters and zero-result detection: The method does not describe how to detect contradictory filters that yield |S(q)| = 0 (or < k) early, nor how to terminate gracefully without wasting budget.
Extreme sparsity edge cases: When early probes see no valid nodes (ρ_pilot = 0 and ρ_queue = 0), what robust fallback strategies, conservative budgets, or escalation policies prevent severe underestimation?
Sensitivity to early-probe length f: There is no analysis of how to set or adapt f per query (e.g., based on online signals), the trade-off between probe cost and estimator accuracy, or how mis-specified f affects recall/latency.
Uncertainty-aware termination: The estimator returns a point prediction; there is no confidence/uncertainty modeling (e.g., quantile regression, prediction intervals) to protect recall by adjusting budgets under uncertainty.
Recall guarantees and failure modes: Early termination uses a budget threshold without theoretical recall guarantees; a formal characterization of when/why recall can drop and protective mechanisms (e.g., adaptive backoff) is missing.
Scaling to large k and varying recall targets: The evaluation focuses on Recall@10; it remains unknown how feature distributions and budgets scale for k ≫ 10 or for different target recall (e.g., 95% vs 99.9%).
Multi-attribute and complex predicates: Only single label/range filters are evaluated; behavior under conjunctions/disjunctions, correlated attributes, and heterogeneous predicates (e.g., geo + category + range) is untested, and feature design for joint selectivity remains open.
Model transferability and domain shift: The estimator is trained per dataset/index; open questions include how well it transfers across datasets, embeddings, index configurations, or filter distributions, and how to detect/mitigate drift.
Online/continual learning: No mechanism is provided for updating the model under data/index updates, changing query workloads, or new filter values; criteria and pipelines for safe incremental retraining are needed.
Ground-truth target generation: Training uses “100% recall” via extended search, but the paper does not clarify if exact neighbors (via brute force) are used; biases from approximate ground truth and their impact on training are unresolved.
Robustness to index topology: The effect of graph structure (e.g., degree distribution, connectivity, hierarchical layers in HNSW) on feature reliability and prediction accuracy is not analyzed; topology-aware features may be needed.
Interaction with index parameters: How per-query budgets interplay with HNSW’s efSearch, beam width, or other tunables is not studied; joint optimization of index parameters and learned budgets remains open.
Feature sufficiency and alternatives: The two filter-aware features (ρ_pilot, ρ_queue) may be insufficient for complex correlations; exploring richer signals (e.g., neighborhood degree features, attribute-conditioned neighbor stats, graph embeddings) is an open avenue.
Feature extraction overheads: Although deemed “negligible,” the actual costs of computing features (queue scans, valid-counting, predicate checks) under different workloads and filter complexities are not quantified.
API/integration constraints: Many production libraries do not expose internal queues or visited sets; engineering patterns for integrating the estimator without modifying low-level search loops are not addressed.
Dynamic distance metrics and embedding shifts: Generality to cosine vs. L2 vs. learned metrics and to evolving embedding models (e.g., refreshed encoders) is not evaluated; how to maintain accuracy across metric/model changes is unclear.
Handling highly selective vs. highly unselective filters: The estimator’s calibration at the extremes of selectivity (very high or very low) is not characterized; techniques for per-regime calibration or regime detection are needed.
Coupling with other accelerations: The interaction between this termination policy and other optimizations (e.g., re-ranking, product quantization asymmetry, pruning heuristics, label-aware shortcuts) has not been explored.
NDC-to-latency linearity assumption: The mapping from NDC to wall-clock latency is assumed linear; quantifying when this breaks (e.g., due to branch misprediction, cache misses, SIMD saturation) and how to correct it is an open issue.
Billion-scale and multi-threaded/GPU settings: Experiments are single-threaded and ≤ 1.7M points; behavior at billion-scale, with multi-threading or GPUs, and under shared-resource contention is untested.
Cold-start for rare or unseen filters: How the model performs for rare labels/ranges or entirely new filter values unseen in training is unknown; strategies for zero-shot or few-shot adaptation are needed.
Fairness and bias implications: Exploiting correlations between vectors and attributes may amplify unwanted biases; there is no discussion of fairness-aware cost estimation or safeguards.

View Paper Prompt View All Prompts

Practical Applications

Immediate Applications

Below are deployable use cases that can be implemented with existing vector databases and ANN graph indexes by adding the proposed cost estimator and early-termination logic.

Sector: Software/Vector Databases
- What: Correlation-aware early termination for filtered HNSW/graph ANN
- Tools/Workflows: Integrate the E2E predictor into HNSW (efSearch tuner), Milvus/OpenSearch/Vespa/Faiss wrappers; expose per-query predicted budget; add telemetry for ρ_pilot and ρ_queue
- Assumptions/Dependencies: Graph-based ANN available; ability to instrument queue/visited sets; small GBDT inference latency; offline training on representative query logs; stable target recall
Sector: Search & E-commerce
- What: Faster product image/text search with attribute filters (brand, price, seller, category)
- Tools/Workflows: E2E as a microservice in the retrieval tier; dynamic per-query budget to cap tail latency; A/B test against static efSearch
- Assumptions/Dependencies: Post-filter traversal; accurate predicate evaluation; filter distributions not drastically drifting
Sector: Recommender Systems (Media/Streaming)
- What: Efficient “similar item” retrieval with tag/genre/region/maturity filters
- Tools/Workflows: Replace static search budgets in candidate generation with E2E; per-profile SLA tiers (kids vs general audience)
- Assumptions/Dependencies: Embedding space exhibits label/range correlations; enough logs to train per-market models
Sector: Enterprise RAG and Search
- What: Low-latency retrieval under access-control, source, date-range, department filters
- Tools/Workflows: Plug-in to vector store in RAG stack (e.g., Milvus/Weaviate/OpenSearch) to predict per-query compute budget; guardrail for P95/P99
- Assumptions/Dependencies: Access-control predicates evaluated during traversal; instrumentation safe with multi-tenant workloads
Sector: Finance
- What: Similar document/event retrieval constrained by time window, instrument sector, risk level
- Tools/Workflows: Use E2E in retrieval microservices that power alerting/due diligence; dynamic budget allocation to respect market open-hour SLAs
- Assumptions/Dependencies: Strict latency budgets; retraining when market regimes shift
Sector: Healthcare/Clinical IR
- What: Query medical notes or imaging embeddings with ICD/department/date filters
- Tools/Workflows: Deploy E2E in on-prem vector search with audit logging; per-query budgets tuned for clinician-facing tools
- Assumptions/Dependencies: PHI-safe logging; privacy review of feature collection; stable embeddings across model updates
Sector: Cybersecurity/Threat Intel
- What: Fast retrieval of similar indicators with label/time/source trust filters
- Tools/Workflows: Correlation-aware budgeter to cap worst-case hunts; SLA-aware routing of “hard” queries to bigger budget pools
- Assumptions/Dependencies: Streaming updates require periodic model refresh; robust fallback for incident spikes
Sector: Developer Productivity (Code Search)
- What: Code snippet search with language/framework/version filters
- Tools/Workflows: Integrate E2E into internal code search services (Sourcegraph-like) to reduce tail latency on large monorepos
- Assumptions/Dependencies: Adequate training coverage across languages; consistent attribute schemas
Sector: Education/EdTech
- What: Content retrieval with grade level, topic, difficulty, date filters
- Tools/Workflows: E2E in curriculum-aligned search; per-school SLA tuning
- Assumptions/Dependencies: Label quality sufficient to learn local correlations
Sector: Operations/SRE
- What: SLA guardrails and autoscaling signals from predicted per-query cost
- Tools/Workflows: Use E2E predictions in queueing/scheduling; preemptively scale when predicted budgets surge
- Assumptions/Dependencies: Telemetry pipeline to export predicted cost; policy to override budget on business-critical queries
Sector: Energy/Green Computing (cross-cutting)
- What: Reduce compute and power for filtered ANN workloads by 2–3× at same recall
- Tools/Workflows: Track NDCs and power per query; report energy savings for sustainability KPIs
- Assumptions/Dependencies: Distance computations dominate energy; minimal estimator overhead

Long-Term Applications

These require further research, system support, or scaling beyond the paper’s scope.

Sector: Database Systems
- What: Cost-based optimizer for hybrid vector + relational queries (joins, group-by, multi-attribute filters) using local correlation
- Tools/Workflows: Extend E2E features into full query plans; integrate with CBOs in Postgres/ClickHouse extensions
- Assumptions/Dependencies: Cardinality estimation across vector proximity; calibrated multi-operator cost models
Sector: Index Design
- What: Correlation-aware index construction (filter-friendly graph rewiring, per-attribute subgraphs)
- Tools/Workflows: Build auxiliary edges for underrepresented filter regions; switch dynamically between POST and PRE in safe regions
- Assumptions/Dependencies: Offline analysis of attribute–vector correlations; safeguards to avoid graph disconnectivity
Sector: Multi-Index/Federated Retrieval
- What: Router that predicts which shard/index/filter combination yields minimal cost for each query
- Tools/Workflows: Shard-level ρ_local predictors; learned federation strategies
- Assumptions/Dependencies: Consistent telemetry across shards; cross-shard drift handling
Sector: Adaptive SLAs and Pricing
- What: Per-query pricing or priority based on predicted cost; fairness across tenants
- Tools/Workflows: Tokens/credits charged per predicted NDC; dynamic queues classed by cost
- Assumptions/Dependencies: Predictive accuracy acceptable for billing; governance around fairness and bias
Sector: Privacy-Preserving Retrieval
- What: Early-probe–based cost estimation under encryption or access-control (e.g., searchable encryption, ABAC)
- Tools/Workflows: Design privacy-safe proxies of ρ_pilot/ρ_queue; secure MPC or TEEs for feature extraction
- Assumptions/Dependencies: Acceptable overhead of crypto/TEE; no leakage via correlation signals
Sector: On-device/Edge Vector Search
- What: Battery-aware early termination for mobile camera search and personal photo retrieval with date/location filters
- Tools/Workflows: TinyGBDT or distilled models; hardware counters for energy per query; dynamic budgets per power state
- Assumptions/Dependencies: Memory-constrained indexes; intermittent connectivity for model refresh
Sector: Robotics/Autonomy
- What: Retrieval of similar scenes/keyframes with task/time filters for navigation and manipulation memory systems
- Tools/Workflows: Integrate E2E into perception-memory modules to bound retrieval latency under safety constraints
- Assumptions/Dependencies: Real-time guarantees; embeddings stable under domain shift
Sector: Continual/Streaming Learning
- What: Drift-aware cost predictor that retrains online as attribute distributions change
- Tools/Workflows: Lightweight online GBDT updates; drift detectors monitoring ρ_pilot residuals
- Assumptions/Dependencies: Safe and explainable hot-reloads; rollback mechanisms
Sector: AutoML for Cost Models
- What: Automated feature selection and model selection for new datasets/filters
- Tools/Workflows: Pipelines that evaluate additional features (queue entropy, gradient of distances) and select best model class (GBDT vs shallow NN)
- Assumptions/Dependencies: Sufficient labeled queries; reproducible training environment
Sector: Benchmarks and Standardization (Academia/Industry)
- What: Standard tasks and datasets that stress local correlation vs global selectivity for filtered ANN
- Tools/Workflows: Extend ANN-Benchmarks with filtered tracks; publish feature–cost ground truth; leaderboard for early termination
- Assumptions/Dependencies: Community consensus on metrics (NDCs, P95 latency at fixed recall)
Sector: Responsible/Policy
- What: Governance around attribute–embedding correlations (fairness, leakage) in retrieval systems
- Tools/Workflows: Policy requiring tests for sensitive correlations; disclosure of estimator behavior; energy-efficiency reporting
- Assumptions/Dependencies: Organizational maturity for ML governance; access to audit datasets

Cross-cutting assumptions and dependencies to consider

Training data: Needs representative query–filter distributions; periodic refresh for drift.
Index type: Benefits are clearest for graph-based ANN with POST filtering; other indexes (IVF, PQ) require adaptation of runtime features.
Target recall and SLAs: Predictions must be calibrated to the recall target; provide safe fallback (e.g., cap minimum budget).
Telemetry and overhead: Must collect queue/visited statistics with negligible overhead; preserve privacy in logs.
Portability: Models are dataset/index-specific; expect per-corpus calibration and A/B validation.
Robustness: Provide guardrails for worst-case anti-correlated filters; enable overrides for mission-critical queries.
Compliance: Ensure access-control predicates are enforced before adding to result sets; audit feature usage for privacy.

View Paper Prompt View All Prompts

Glossary

Adaptive Early Termination: A technique that ends the search early based on predicted convergence to reduce computation and latency. Example: "To address this, prior work has proposed adaptive early termination."
AKNN (Approximate k Nearest Neighbor): A class of algorithms that retrieve near neighbors efficiently by trading exactness for speed. Example: "A critical instance of this paradigm is filtered Approximate $k$ Nearest Neighbor (AKNN) search"
Attributed Vector Dataset: A dataset where each vector is paired with attribute(s) used for filtering during search. Example: "A data item in an attributed vector dataset $S$ is defined as $o_i=(x_i, A_i)$ "
Beam Search: A search strategy that maintains a fixed-size set of the best candidates while exploring neighbors. Example: "For AKNN search, beam search is used to acquire more than one result."
Beam Width: The fixed size of the candidate set maintained during beam search. Example: "It maintains a queue containing the top- $m$ closest candidates during traversal, where $m$ is the beam width."
Coefficient of Determination (R²⁾: A regression metric measuring the proportion of variance explained by a model. Example: "we use the Coefficient of Determination ( $R^2$ ) to measure regression quality"
Curse of Dimensionality: The phenomenon where high-dimensional spaces make exact search methods computationally infeasible. Example: "Exact $k$ Nearest Neighbor search often suffers from the curse of dimensionality"
DARTH: A learned, runtime-signal-based method for adaptive early termination in ANN graph search. Example: "State-of-the-art techniques such as LAET~\cite{LAET-SIGMOD-2020} and DARTH~\cite{DARTH-SIGMOD-2026} employ lightweight machine learning models"
Early Probe: An initial, short traversal designed to collect local, filter-aware features with negligible overhead. Example: "Our early probe is simply the first $f$ steps of the actual query execution."
efsearch: A HNSW parameter controlling the search queue size (effort) during traversal. Example: "setting a large global search queue size parameter such as efsearch~\cite{HNSW-PAMI-2020}"
Feature-Filter Misalignment: A condition where vector proximity is poorly aligned with filter validity, misleading convergence signals and increasing cost. Example: "we formally define the Feature-Filter Misalignment problem"
Filter Predicate: A boolean function that determines whether a data item satisfies the query’s filter constraint. Example: "To determine whether a data item satisfies a query, we formally define the concept of a Filter Predicate."
GBDT (Gradient Boosting Decision Tree): An ensemble learning method that builds additive decision trees to model complex relationships. Example: "For our cost estimator model, we choose a Gradient Boosting Decision Tree (GBDT)"
Global Selectivity: The probability that a random item in the entire dataset satisfies the filter condition. Example: "Global Selectivity ( $\sigma_{global}$ ): When $S' = S$ (the entire dataset), $\sigma(S, f_q)$ represents the overall probability that a random object satisfies the filter."
Greedy Routing: The initial ANN graph-search phase that iteratively moves to closer neighbors to find an approximate top-1. Example: "The complexity of this greedy routing phase is generally logarithmic with respect to the dataset size"
HNSW (Hierarchical Navigable Small World): A high-performance graph-based index for ANN with hierarchical layers. Example: "Advanced structures like HNSW~\cite{HNSW-PAMI-2020} and HVS~\cite{HVS-VLDB-2021-kejing-lu} further optimize this phase"
HVS: A sparse navigational graph structure that optimizes greedy descent in ANN search. Example: "Advanced structures like HNSW~\cite{HNSW-PAMI-2020} and HVS~\cite{HVS-VLDB-2021-kejing-lu} further optimize this phase"
IVF (Inverted File Index): A partitioned vector index where the nprobes parameter controls the number of partitions scanned. Example: "efsearch of HNSW, nprobes of IVF"
LAET: A lightweight learning-based estimator that predicts convergence for early termination using runtime features. Example: "State-of-the-art techniques such as LAET~\cite{LAET-SIGMOD-2020} and DARTH~\cite{DARTH-SIGMOD-2026} employ lightweight machine learning models"
LightGBM: A fast, optimized GBDT framework used for low-latency cost estimation. Example: "We specifically implement our estimator using LightGBM"
Local Correlation: The selectivity of valid items within the query’s local neighborhood, capturing filter-vector dependence. Example: "Local Correlation ( $\rho_{local}$ ): When $S'$ corresponds to a proximity neighborhood of $x_q$ (e.g., the top- $m$ nearest neighbors based on vector distance), $\rho_{local} =\sigma(S', f_q)$ represents the local query-filter correlation."
Mean Squared Error (MSE): A loss function minimized during regression to fit predictions to targets (here, in log-space). Example: "The model $\mathcal{M}$ is trained to predict $\log(W_q)$ by minimizing the Mean Squared Error (MSE) in the log-space"
Mean Squared Logarithmic Error (MSLE): A loss emphasizing relative error, equivalent to MSE in log-space. Example: "minimizing MSE in the log-space is mathematically equivalent to minimizing the Mean Squared Logarithmic Error (MSLE) in the original space."
Number of Distance Computations (NDC): A hardware-agnostic metric counting distance evaluations performed during search. Example: "We quantify the total computational cost, denoted as $W_q$ , by the Number of Distance Computation (NDC) performed during the filtered search for $q$ ."
POST Filtering Strategy: A filtered search approach that traverses all candidates but only keeps those matching the filter. Example: "The $\POST$ strategy, widely adopted in industry~\cite{hnswlib,Diskann-NIPS-2019,HNSW-PAMI-2020}, traverses the full graph but only adds candidates satisfying the filter to the result queue."
PRE Filtering Strategy: A filtered search approach that prunes traversal by skipping nodes failing the filter constraint. Example: "Conversely, $\PRE$ strategies restrict the search space by logically removing filtered nodes or edges during traversal."
RAG (Retrieval-Augmented Generation): Systems that augment LLM generation with retrieved nearest neighbors or documents. Example: "retrieval-augmented generation (RAG) systems for LLMs"
Recall@ $k$ : A quality metric measuring the fraction of ground-truth top-k neighbors retrieved. Example: "Quality is measured by Recall@ $k$ , defined as the proportion of the true top- $k$ ground-truth neighbors retrieved."
Search Budget: The configured or predicted amount of computation allocated to a query’s search. Example: "The naive method sets a relatively large search budget for the filtered AKNN query to guarantee both accuracy and sufficient results."
Selectivity: The proportion of items satisfying a filter within a given data scope. Example: "the selectivity $\sigma(S', f_q)$ measures the proportion of items in $S'$ that satisfy the filter condition"
Selectivity Estimation: Techniques for predicting filter selectivity to guide query optimization. Example: "While selectivity estimation~\cite{SE-probalistic-SIGMOD-2001,SE-range-predicae-VLDB-2019} has been studied for decades, directly adapting traditional techniques to filtered AKNN search presents fundamental challenges:"

View Paper Prompt View All Prompts

Open Problems

We found no open problems mentioned in this paper.

Continue Learning

Collections

YouTube

Show All Videos

Filtered Approximate Nearest Neighbor Search Cost Estimation

Summary

Filtered Approximate Nearest Neighbor Search Cost Estimation: An Expert Review

Problem Statement and Motivation

The $\EE$ Framework and Core Innovations

Technical Contributions and Implementation Details

Feature Engineering and Model Training

Adaptive Termination Algorithm

Empirical Evaluation

Feature Importance and Generalization

Practical and Theoretical Implications

Conclusion

Paper to Video (Beta)

Whiteboard

Paper Prompts

Top Community Prompts

Explain it Like I'm 14

What is this paper about?

What questions does it try to answer?

How did the researchers approach the problem?

Key idea: Early probing

What features do they look at?

The prediction model

Using the prediction to stop early

What did they find?

Why does it matter?

Knowledge Gaps

Knowledge gaps, limitations, and open questions

Practical Applications

Immediate Applications

Long-Term Applications

Cross-cutting assumptions and dependencies to consider

Glossary

Open Problems

Continue Learning

Collections

YouTube

Don't miss out on important new AI/ML research

Sign up for free to explore the frontiers of research