- The paper introduces a novel streaming ANN index that leverages myelial edge decay, traffic-driven reinforcement, and an adaptive living hierarchy to maintain robust connectivity under data churn.
- It employs a hybrid deletion and repair scheme that enhances query throughput by 4.7× while reducing memory usage by over 5× compared to traditional methods.
- Scalar quantization and tiered storage enable efficient handling of high-dimensional data, validated through competitive recall and sublinear latency scaling in the SIFT-1M benchmark.
Mycelium-Index: A Topologically-Adaptive Streaming ANN Graph
Overview
The "Mycelium-Index" (2604.11274) introduces a streaming approximate nearest neighbor (ANN) index for high-dimensional vector spaces that departs from conventional, geometry-centered maintenance, instead using traffic-driven topological adaptation inspired by the dynamics of biological mycelium networks. This novel index combines three mechanisms—myelial edge decay and reinforcement, a living query-driven hierarchy, and a hybrid repair scheme—enabling high efficiency, low memory usage, and robust adaptability under heavy data churn, while maintaining competitive recall relative to state-of-the-art systems.
Methodology
Myelial Edge Decay and Topological Reinforcement
Each edge in the graph carries an importance score (use_count), timestamp, and is decayed over time via an exponential decay parameter λ (with the effective importance given by use_count×λt−tlast​). Edge reinforcement occurs dynamically—query traversals increment the use_count, stabilizing highly-trafficked edges and removing rarely used connections below a threshold. This mechanism removes the necessity for explicit rebalancing passes or global reconstructions.
Living Hierarchy
Instead of the static hierarchy found in HNSW, Mycelium employs a traffic-driven living hierarchy: nodes accumulate a query_use_count, and the top percentiles are periodically promoted to higher levels. Level assignment is refreshed based on streaming insertions, aligning the navigational shortcut structure closely to current query patterns. This adaptive strategy addresses hierarchy staleness, a primary degradation vector in streaming scenarios, especially when node deletions are frequent.
Hybrid Deletion and Repair
Mycelium employs a bimodal deletion scheme. "Cold" nodes are soft-deleted via O(1) bypass, while the top-traffic "hub" nodes (approximately the top 10%) trigger a O(k) beam-search-based repair (effort parameter ef=64). This hybridization provides recall efficacy equivalent to fully repairing all deletions but with a 46% higher query-per-second (QPS) rate. This approach is validated by a large recall gap (from −0.21 for pure bypass to −0.023 for hybrid) closed as soon as targeted hub repair is introduced.
Scalar Quantization and Tiered Storage
To address the prohibitive RAM usage of full-precision storage, Mycelium retains "hot" vectors in RAM (HashMap), while "cold," less-accessed vectors reside in memory-mapped, quantized files using per-dimension linear scaling to uint8. This achieves a 4× compression ratio over float32, allowing operation in regimes (88–163 MB RAM for SIFT-1M) inaccessible to engines that do not compress base vectors.
Experimental Results
Recall, Memory, and Throughput
- Under the SIFT-1M streaming benchmark (FreshDiskANN protocol, 100% data turnover, ef=128):
- Mycelium achieves 0.927±0.028 recall@5.
- This is within the measurement confidence interval of FreshDiskANN (∼0.95), but with 5.7× less memory (88 MB vs. use_count×λt−tlast​0500 MB) and use_count×λt−tlast​1 higher QPS (2,795 vs. use_count×λt−tlast​2600).
- On the static index (ef=192):
- Mycelium reaches 0.962 recall@10, matching HNSW (M=16, ef=128, 0.965 recall) with use_count×λt−tlast​3 less RAM (163 MB vs. 854 MB).
- Performance optimization via NEON SIMD, compact node storage (Vec-backed), and a bitset for visited tracking led to a use_count×λt−tlast​4 QPS improvement and sublinear use_count×λt−tlast​5 mean latency scaling.
Ablation and Failure of Geometric Repair
Systematic ablation (10 mechanisms tested) demonstrated that all four geometric repair approaches (e.g., Physarum flow, geometric anastomosis, angular gap filling) failed in 128-dimensional space, with recall@5 as low as 0.02–0.49 after high turnover. In contrast, all topological, traffic-driven mechanisms succeeded, yielding stable recall and graph connectivity. Increasing the graph degree by geometric heuristics consistently degraded recall, confirming the "topological repair invariance": in high-dimensions, entry-point staleness, not geometric graph structure, is the primary recall bottleneck.
Ensemble and Recall/Loss Distribution
Constructing an ensemble of two independently-grown mycelium graphs provided a recall boost (from 0.867 to 0.935 at ef=64), but a single graph with higher ef (ef=128) could match or exceed ensemble recall without added memory cost. Analysis of missed neighbors showed that 78% were caused by beam-search cutoffs, not unreachable graph components; entry points remained near-optimal.
Implications and Theoretical Insights
Topological Principle for Streaming ANN Maintenance
The study establishes a strong negative result: geometric intuition, which underpins many low-dimensional graph repair strategies (midpoint search, angular gaps), is not predictive in high dimensions. The effective paradigm for ANN maintenance under churn is topological, centered on query traffic statistics and adaptive, traffic-driven connectivity—this supersedes both classic geometric and flow-inspired mechanisms.
Practical Impact
The hybrid deletion, living hierarchy, and quantized storage design enable Mycelium to operate in memory regimes (88–163 MB for SIFT-1M) where batch-oriented solutions (HNSW, DiskANN) and streaming solutions lacking quantization (FreshDiskANN, EnhanceGraph) are not feasible, particularly with 100% data turnover. This makes it suitable for high-throughput, memory-constrained, real-time applications such as semantic search and RAG.
Limitations and Future Work
- The periodic use_count×λt−tlast​6 living hierarchy refresh could be replaced with an incremental, event-driven update for further efficiency.
- The static threshold for triggering hybrid repair could be dynamically tuned based on real-time recall diagnostics.
- Validation is currently limited to SIFT-1M (128D, Euclidean); testing on higher dimensions (e.g., 768D text embeddings), cosine similarity, and more diverse datasets is necessary for generality claims.
Conclusion
Mycelium-Index establishes a new regime for streaming ANN graph indices, demonstrating that dynamic topological adaptation—driven by query traffic, myelial decay, and selective repair—is sufficient to match recall and surpass throughput and memory usage benchmarks set by prior art, even under high data churn. The systematic failure of geometric repair approaches in high dimensions and the efficacy of purely topological mechanisms represent a significant conceptual shift for ANN maintenance. Future work will focus on tuning adaptivity, extending evaluation to diverse embedding domains, and investigating incremental hierarchical updates.