Papers
Topics
Authors
Recent
Search
2000 character limit reached

Mycelium-Index: A Streaming Approximate Nearest Neighbor Index with Myelial Edge Decay, Traffic-Driven Reinforcement, and Adaptive Living Hierarchy

Published 13 Apr 2026 in cs.LG and cs.IR | (2604.11274v1)

Abstract: We present mycelium-index, a streaming approximate nearest neighbor (ANN) index for high-dimensional vector spaces, inspired by the adaptive growth patterns of biological mycelium. The system continuously adapts its topology through myelial edge decay and reinforcement, a traffic-driven living hierarchy, and hybrid deletion combining O(1) bypass for cold nodes with O(k) beam-search repair for hub nodes. Experimental evaluation on SIFT-1M demonstrates that mycelium achieves 0.927 +/- 0.028 recall@5 under FreshDiskANN's 100%-turnover benchmark protocol -- within the measurement confidence interval of FreshDiskANN's ~0.95 -- while using 5.7x less RAM (88 MB vs. >500 MB) and achieving 4.7x higher QPS (2,795 vs. ~600). On the static index, at ef=192, mycelium matches HNSW M=16 recall (0.962 vs. 0.965) at 5.2x less RAM (163 MB vs. 854 MB). Performance optimizations including NEON SIMD distance computation, Vec-backed node storage, and bitset visited tracking yield a cumulative 2.7x QPS improvement. A systematic study of ten streaming repair mechanisms finds that geometric heuristics universally fail in high dimensions, while topological mechanisms succeed -- a principle we term the topological repair invariance of high-dimensional ANN graphs.

Authors (1)

Summary

  • The paper introduces a novel streaming ANN index that leverages myelial edge decay, traffic-driven reinforcement, and an adaptive living hierarchy to maintain robust connectivity under data churn.
  • It employs a hybrid deletion and repair scheme that enhances query throughput by 4.7× while reducing memory usage by over 5× compared to traditional methods.
  • Scalar quantization and tiered storage enable efficient handling of high-dimensional data, validated through competitive recall and sublinear latency scaling in the SIFT-1M benchmark.

Mycelium-Index: A Topologically-Adaptive Streaming ANN Graph

Overview

The "Mycelium-Index" (2604.11274) introduces a streaming approximate nearest neighbor (ANN) index for high-dimensional vector spaces that departs from conventional, geometry-centered maintenance, instead using traffic-driven topological adaptation inspired by the dynamics of biological mycelium networks. This novel index combines three mechanisms—myelial edge decay and reinforcement, a living query-driven hierarchy, and a hybrid repair scheme—enabling high efficiency, low memory usage, and robust adaptability under heavy data churn, while maintaining competitive recall relative to state-of-the-art systems.

Methodology

Myelial Edge Decay and Topological Reinforcement

Each edge in the graph carries an importance score (use_count), timestamp, and is decayed over time via an exponential decay parameter λ\lambda (with the effective importance given by use_count×λt−tlastuse\_count \times \lambda^{t - t_{last}}). Edge reinforcement occurs dynamically—query traversals increment the use_count, stabilizing highly-trafficked edges and removing rarely used connections below a threshold. This mechanism removes the necessity for explicit rebalancing passes or global reconstructions.

Living Hierarchy

Instead of the static hierarchy found in HNSW, Mycelium employs a traffic-driven living hierarchy: nodes accumulate a query_use_count, and the top percentiles are periodically promoted to higher levels. Level assignment is refreshed based on streaming insertions, aligning the navigational shortcut structure closely to current query patterns. This adaptive strategy addresses hierarchy staleness, a primary degradation vector in streaming scenarios, especially when node deletions are frequent.

Hybrid Deletion and Repair

Mycelium employs a bimodal deletion scheme. "Cold" nodes are soft-deleted via O(1)O(1) bypass, while the top-traffic "hub" nodes (approximately the top 10%) trigger a O(k)O(k) beam-search-based repair (effort parameter ef=64). This hybridization provides recall efficacy equivalent to fully repairing all deletions but with a 46% higher query-per-second (QPS) rate. This approach is validated by a large recall gap (from −0.21-0.21 for pure bypass to −0.023-0.023 for hybrid) closed as soon as targeted hub repair is introduced.

Scalar Quantization and Tiered Storage

To address the prohibitive RAM usage of full-precision storage, Mycelium retains "hot" vectors in RAM (HashMap), while "cold," less-accessed vectors reside in memory-mapped, quantized files using per-dimension linear scaling to uint8. This achieves a 4×\times compression ratio over float32, allowing operation in regimes (88–163 MB RAM for SIFT-1M) inaccessible to engines that do not compress base vectors.

Experimental Results

Recall, Memory, and Throughput

  • Under the SIFT-1M streaming benchmark (FreshDiskANN protocol, 100% data turnover, ef=128):
    • Mycelium achieves 0.927±0.0280.927 \pm 0.028 recall@5.
    • This is within the measurement confidence interval of FreshDiskANN (∼\sim0.95), but with 5.7×5.7\times less memory (88 MB vs. use_count×λt−tlastuse\_count \times \lambda^{t - t_{last}}0500 MB) and use_count×λt−tlastuse\_count \times \lambda^{t - t_{last}}1 higher QPS (2,795 vs. use_count×λt−tlastuse\_count \times \lambda^{t - t_{last}}2600).
  • On the static index (ef=192):
    • Mycelium reaches 0.962 recall@10, matching HNSW (M=16, ef=128, 0.965 recall) with use_count×λt−tlastuse\_count \times \lambda^{t - t_{last}}3 less RAM (163 MB vs. 854 MB).
  • Performance optimization via NEON SIMD, compact node storage (Vec-backed), and a bitset for visited tracking led to a use_count×λt−tlastuse\_count \times \lambda^{t - t_{last}}4 QPS improvement and sublinear use_count×λt−tlastuse\_count \times \lambda^{t - t_{last}}5 mean latency scaling.

Ablation and Failure of Geometric Repair

Systematic ablation (10 mechanisms tested) demonstrated that all four geometric repair approaches (e.g., Physarum flow, geometric anastomosis, angular gap filling) failed in 128-dimensional space, with recall@5 as low as 0.02–0.49 after high turnover. In contrast, all topological, traffic-driven mechanisms succeeded, yielding stable recall and graph connectivity. Increasing the graph degree by geometric heuristics consistently degraded recall, confirming the "topological repair invariance": in high-dimensions, entry-point staleness, not geometric graph structure, is the primary recall bottleneck.

Ensemble and Recall/Loss Distribution

Constructing an ensemble of two independently-grown mycelium graphs provided a recall boost (from 0.867 to 0.935 at ef=64), but a single graph with higher ef (ef=128) could match or exceed ensemble recall without added memory cost. Analysis of missed neighbors showed that 78% were caused by beam-search cutoffs, not unreachable graph components; entry points remained near-optimal.

Implications and Theoretical Insights

Topological Principle for Streaming ANN Maintenance

The study establishes a strong negative result: geometric intuition, which underpins many low-dimensional graph repair strategies (midpoint search, angular gaps), is not predictive in high dimensions. The effective paradigm for ANN maintenance under churn is topological, centered on query traffic statistics and adaptive, traffic-driven connectivity—this supersedes both classic geometric and flow-inspired mechanisms.

Practical Impact

The hybrid deletion, living hierarchy, and quantized storage design enable Mycelium to operate in memory regimes (88–163 MB for SIFT-1M) where batch-oriented solutions (HNSW, DiskANN) and streaming solutions lacking quantization (FreshDiskANN, EnhanceGraph) are not feasible, particularly with 100% data turnover. This makes it suitable for high-throughput, memory-constrained, real-time applications such as semantic search and RAG.

Limitations and Future Work

  • The periodic use_count×λt−tlastuse\_count \times \lambda^{t - t_{last}}6 living hierarchy refresh could be replaced with an incremental, event-driven update for further efficiency.
  • The static threshold for triggering hybrid repair could be dynamically tuned based on real-time recall diagnostics.
  • Validation is currently limited to SIFT-1M (128D, Euclidean); testing on higher dimensions (e.g., 768D text embeddings), cosine similarity, and more diverse datasets is necessary for generality claims.

Conclusion

Mycelium-Index establishes a new regime for streaming ANN graph indices, demonstrating that dynamic topological adaptation—driven by query traffic, myelial decay, and selective repair—is sufficient to match recall and surpass throughput and memory usage benchmarks set by prior art, even under high data churn. The systematic failure of geometric repair approaches in high dimensions and the efficacy of purely topological mechanisms represent a significant conceptual shift for ANN maintenance. Future work will focus on tuning adaptivity, extending evaluation to diverse embedding domains, and investigating incremental hierarchical updates.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Collections

Sign up for free to add this paper to one or more collections.