SCM: Sleep-Consolidated Memory with Algorithmic Forgetting for Large Language Models

Published 22 Apr 2026 in cs.LG | (2604.20943v1)

Abstract: We present SCM (Sleep-Consolidated Memory), a research preview of a memory architecture for LLMs that draws on neuroscientific principles to address a fundamental limitation in current systems: the absence of persistent, structured, and biologically plausible memory. Existing approaches rely on truncating context windows, growing vector databases without bound, or tiered storage systems that lack consolidation and forgetting mechanisms. SCM implements five core components inspired by human memory: a limited-capacity working memory, multi-dimensional importance tagging, offline sleep-stage consolidation with distinct NREM and REM phases, intentional value-based forgetting, and a computational self-model enabling introspection. Across a standardized benchmark suite of eight tests, the prototype achieves perfect recall accuracy over ten-turn conversations while reducing memory noise by 90.9% through adaptive forgetting. Memory search latency remains below one millisecond even with hundreds of stored concepts. This work establishes the architectural foundations for memory systems that consolidate, prioritize, and forget, offering a testable platform for advancing LLM memory research.


Authors (1)

Saish Sachin Shinde

Summary

The paper introduces SCM, a novel architecture that employs sleep-inspired consolidation and algorithmic forgetting to optimize LLM memory systems.
It details a multi-stage sleep cycle for consolidation, dreaming, and intentional forgetting, achieving perfect recall and robust noise pruning.
Empirical results and ablation studies demonstrate sub-millisecond retrieval and stable memory growth through adaptive, importance-based pruning.

SCM: Sleep-Consolidated Memory with Algorithmic Forgetting for LLMs

Introduction and Motivation

Sleep-Consolidated Memory (SCM) addresses foundational limitations in persistent memory systems for LLM-based agents by integrating biologically inspired mechanisms: bounded working memory, multidimensional importance tagging, offline sleep-driven consolidation, algorithmic forgetting, and a computational self-model. Existing LLM memory architectures (context expansion, vector database retrieval, and tiered storage) fail to combine the selective encoding, prioritization, consolidation, and forgetting characteristic of human episodic and semantic memory. SCM departs from purely append-only and cache-based paradigms by encoding structured concepts, tagging them with importance across semantic and affective axes, and managing long-term memory as an adaptive, prunable graph. Each of these design choices is motivated by a neurocognitive framework.

The architectural abstraction is defined as an input pipeline (MeaningEncoder and ValueTagger), a bounded WorkingMemory buffer for incoming episodes, a persistent LongTermMemory associative graph, and a SleepCycle controller implementing NREM consolidation, REM-driven creative association, and adaptive, value-based forgetting (Figure 1).

Figure 1: SCM system architecture with distinct wake and sleep processing modules and explicit memory consolidation, dreaming, and forgetting routines.

Core Methodologies

Semantic Encoding, Value Tagging, and Working Memory

SCM employs a semantic parser leveraging local LLM inference (Llama-3.2 Q4_K_M) to extract concept nodes of types drawn from a controlled taxonomy (e.g., person, preference, fact, event), assigning a 384D embedding and structured relation edges. The ValueTagger computes an importance vector per concept along four axes: novelty (embedding-based uniqueness), affective valence (LLM-driven sentiment), task relevance (goal-aware embedding cosine similarity), and normalized repetition count, which are linearly combined with ablation-optimized weights.
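The linear combination of the four importance axes can be illustrated with a minimal Python sketch. The class name `ImportanceAxes`, the function `importance`, and the weight values are hypothetical: the source states only that the weights were tuned by ablation and does not report them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImportanceAxes:
    """Four per-concept scores, each assumed to be normalized to [0, 1]."""
    novelty: float         # embedding-based uniqueness
    valence: float         # strength of affective valence
    task_relevance: float  # cosine similarity to the current goal
    repetition: float      # normalized repetition count


# Hypothetical weights chosen for illustration only (they sum to 1).
WEIGHTS = ImportanceAxes(novelty=0.3, valence=0.2, task_relevance=0.4, repetition=0.1)


def importance(axes: ImportanceAxes, weights: ImportanceAxes = WEIGHTS) -> float:
    """Weighted sum of the four axes, clamped to [0, 1]."""
    score = (
        weights.novelty * axes.novelty
        + weights.valence * axes.valence
        + weights.task_relevance * axes.task_relevance
        + weights.repetition * axes.repetition
    )
    return min(1.0, max(0.0, score))
```

With these assumed weights, a concept that is highly task-relevant but otherwise unremarkable still outranks one that is merely novel, which is the kind of prioritization a multi-axis score is meant to allow.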

WorkingMemory is capped at seven episodes, with prioritization and recency effects controlled by access boosting, in line with human cognitive limits. Concepts transition from WM to long-term graph storage through SleepCycle triggers governed by entropy and conflict or time-based thresholds.
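A bounded buffer with access boosting might look like the following sketch. The seven-episode cap comes from the text; the eviction rule (drop the lowest-priority episode), the boost size, and all names are assumptions, since the source does not specify how the cap is enforced.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Episode:
    text: str
    importance: float
    access_count: int = 0


class WorkingMemory:
    """Capacity-limited buffer; each access raises an episode's priority."""

    def __init__(self, capacity: int = 7, access_boost: float = 0.1) -> None:
        self.capacity = capacity
        self.access_boost = access_boost  # hypothetical boost per access
        self.episodes: List[Episode] = []

    def priority(self, ep: Episode) -> float:
        return ep.importance + self.access_boost * ep.access_count

    def access(self, text: str) -> Optional[Episode]:
        """Return the matching episode and boost it, or None if absent."""
        for ep in self.episodes:
            if ep.text == text:
                ep.access_count += 1
                return ep
        return None

    def add(self, episode: Episode) -> Optional[Episode]:
        """Insert an episode; if over capacity, evict and return the
        lowest-priority one (the oldest wins ties for eviction)."""
        self.episodes.append(episode)
        if len(self.episodes) <= self.capacity:
            return None
        idx = min(range(len(self.episodes)),
                  key=lambda i: self.priority(self.episodes[i]))
        return self.episodes.pop(idx)
```

In a fuller system the evicted episode would presumably be handed to the SleepCycle rather than discarded; that hand-off is left out here.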

SleepCycle: Consolidation, Dreaming, and Forgetting

The SleepCycle is formally realized as a multi-stage finite state machine (Figure 2):

Figure 2: SleepCycle state machine with transitions through WAKE, NREM, REM, and controlled forgetting states.

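NREM Consolidation: Replay of WM episodes strengthens concept co-activation edge weights ($\Delta s_{ij} = \eta\, I(c_i)\, I(c_j)$, where $s_{ij}$ is the weight of the edge between concepts $c_i$ and $c_j$, $I(c)$ is a concept's importance, and $\eta$ is a learning rate), followed by global proportional synaptic downscaling ($s_{ij} \leftarrow \alpha s_{ij}$ with $0 < \alpha < 1$).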
REM Dreaming: High-importance nodes initiate stochastic random walks, generating novel associations subject to contradiction constraints and semantic validation.
Intentional Forgetting: Each concept's retention blends importance and age-weighted recency, with adaptive thresholding ensuring the long-term memory graph stabilizes to a target size by pruning low-value, infrequent, or outdated concepts.
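The NREM step in the list above can be sketched with plain dictionaries. The two update rules are the ones given in the text; the function name, the edge representation, and the values of `eta` and `alpha` are illustrative assumptions, not the paper's settings.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List

Edge = FrozenSet[str]


def nrem_consolidate(
    strengths: Dict[Edge, float],
    episodes: Iterable[List[str]],
    importance: Dict[str, float],
    eta: float = 0.1,    # assumed learning rate
    alpha: float = 0.9,  # assumed downscaling factor, 0 < alpha < 1
) -> Dict[Edge, float]:
    """Replay episodes, strengthen co-activated edges, then downscale all edges."""
    # Replay: delta s_ij = eta * I(c_i) * I(c_j) for each co-occurring pair.
    for concepts in episodes:
        for ci, cj in combinations(sorted(set(concepts)), 2):
            edge = frozenset((ci, cj))
            delta = eta * importance[ci] * importance[cj]
            strengths[edge] = strengths.get(edge, 0.0) + delta
    # Global proportional downscaling: s_ij <- alpha * s_ij.
    for edge in list(strengths):
        strengths[edge] *= alpha
    return strengths
```

Because the downscaling applies to every edge, connections that are not replayed decay geometrically across cycles, while replayed connections between important concepts are reinforced faster than they decay.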

Through careful control of threshold and weighting parameters (notably $\beta_2$ in the decay formula), the system achieves both stability and high information retention.
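As a rough illustration of value-based forgetting with an adaptive cutoff, consider the sketch below. The source does not give the retention formula or the exact role of $\beta_2$, so the blend used here (a weighted sum of importance and an exponentially decaying recency term), the weights, the half-life, and the floor are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Concept:
    importance: float  # in [0, 1]
    age_cycles: int    # sleep cycles since the concept was last accessed


def retention(c: Concept, w_importance: float = 0.8,
              w_recency: float = 0.2, half_life: float = 5.0) -> float:
    """Assumed retention score: importance blended with decaying recency."""
    recency = 0.5 ** (c.age_cycles / half_life)
    return w_importance * c.importance + w_recency * recency


def forget(memory: Dict[str, Concept], target_size: int,
           floor: float = 0.2) -> List[str]:
    """Prune concepts in place so that at most `target_size` remain and none
    scores below `floor`. Returns the names of the pruned concepts."""
    ranked = sorted(memory, key=lambda k: retention(memory[k]), reverse=True)
    keep = {k for k in ranked[:target_size] if retention(memory[k]) >= floor}
    pruned = [k for k in memory if k not in keep]
    for k in pruned:
        del memory[k]
    return pruned
```

The cutoff is adaptive in the sense that it rises above `floor` whenever more than `target_size` concepts would otherwise survive, which is one simple way to make graph size plateau.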

Self-Model

A self-referential subsystem maintains explicit memory of system identity, capability enumeration, meta-counters, and episodic recordings of sleep cycles. This supports introspective querying and facilitates reporting of internal state for agent transparency.

Experimental Findings

Benchmarking and Component Ablation

SCM's capabilities were benchmarked on eight axes, encompassing recall capacity, latency, consolidation gain, forgetting effectiveness, graph traversal, and cross-session persistence. Perfect recall (1.00) was achieved across all tests, with enforced WM capacity, rapid retrieval ($<1$ ms at $>300$ concepts), 90.9% noise reduction through forgetting, and deterministic multi-session recovery.

Figure 3: Memory size over sleep cycles—without forgetting, memory grows linearly; with adaptive forgetting, size plateaus as noise is pruned.

An ablation study measured the effect of disabling each major module. Removing the ValueTagger (uniform importance) reduced recall to 81.8% and increased retained noise (ablation results table not shown). Disabling NREM decreased recall to 90.9%. Disabling forgetting led to unbounded memory growth, matching or exceeding the practical drawbacks of vector DB baselines. SleepCycle parameter tuning is critical: an earlier, incorrect importance-decay weighting ($\beta_2 = 0.4$) prevented forgetting altogether, whereas the corrected setting ($\beta_2 = 0.2$) achieved the intended pruning behavior (Figure 4).

Figure 4: Effect of forgetting formula correction—appropriate decay weighting enables robust noise pruning without loss of high-value concepts.

Latency and Practical Performance

Empirical results indicate that the combination of importance-based retrieval, graph traversal, and forgetting adds no measurable retrieval overhead at the tested scales: sub-millisecond memory search was maintained throughout (up to 360 concepts on an M1 MacBook Air, Figure 5).

Figure 5: Benchmark test scores—SCM achieves perfect recall, noise reduction, low latency, and robust persistence across all evaluated tasks.

Implications

Theoretical and Practical Impact

SCM bridges a critical gap between neuro-inspired memory architectures and practical LLM memory augmentation. By operationalizing selective retention, consolidation, and intentional forgetting, SCM avoids the cardinal failure modes of unbounded context windows and indefinite vector DB growth. Importance tagging across multiple axes enables prioritization beyond recency or frequency alone, supporting the efficient removal of noise and irrelevancies while maintaining high recall rates for deliberately structured concepts.

The sleep-driven offline phase further enables associative reasoning via REM-based synthetic linkage creation. The explicit self-model, while not a claim toward phenomenological selfhood, provides a platform for introspective, capability-aware agent reporting and diagnosis. Algorithmic forgetting introduces privacy and compliance primitives—enabling right-to-be-forgotten policies—that vector retrieval or archive-based systems cannot deliver.

Limitations and Future Work

Current bottlenecks include the scalability limits of NetworkX for graphs larger than 10K nodes and dependence on the extraction fidelity of the local LLM used for semantic parsing. Multi-modal extension and autonomous background operation (continuous existence) are listed as future research priorities, as are embodied memory via non-linguistic sensory streams and predictive, future-relevant memory prefetching.

Conclusion

SCM marks a transition from LLM memory expansion via ever-larger vector stores and recency buffers to structured, self-limiting, and prioritized semantic memory suitable for autonomous agents. Strong empirical results affirm the efficacy of integrating humanlike sleep consolidation, multidimensional value assignment, and algorithmic forgetting in LLM memory augmentation, achieving both high recall and memory efficiency.

SCM's architecture supports new directions in interpretable and auditable AI, offering mechanisms for intentional, transparent memory curation. It establishes a robust experimental substrate for LLM memory research, providing extensible tools to probe the relationship between structured memory, agentic continuity, and cognitive plausibility.
