Mind Modeling: A ToM-Based Framework for Personalization

Published 11 May 2026 in cs.HC | (2605.10306v1)

Abstract: User modeling has traditionally relied on inferring preferences, traits, or intents from observable behaviour. While effective in many adaptive systems, this paradigm treats behaviour as the primary object of modeling and leaves mental-state attribution implicit. This assumption becomes limiting in socially situated and longitudinal interaction, where behaviour must be interpreted in context and over time. We introduce mind modeling, a perspective in which user modeling is grounded in the explicit and revisable attribution of mental states, including beliefs, intentions, emotions, and knowledge. Drawing on Theory of Mind (ToM), this approach treats behaviour as evidence for hypotheses about internal states, supporting personalization that is more interpretable and coherent across interaction episodes. We present M3, a conceptual framework that integrates perception, mentalisation, and action within a unified structure, enabling the continuous update of mental-state hypotheses in embodied interaction. We further illustrate this perspective through an embodied interaction trace, providing an initial operationalization of mind modeling in practice.


Authors (1)

Cristina Gena

Summary

The paper introduces the M3 framework that applies Theory-of-Mind for dynamic, explainable user modeling in interactive systems.
It integrates multimodal perception with a hybrid cognitive architecture to continuously update mental-state hypotheses during interaction.
The approach enhances personalization transparency and adaptability, though empirical validation and scalability remain open challenges.

Mind Modeling: A Theory-of-Mind-Based Paradigm for Adaptive Personalization

Reframing User Modeling with Theory of Mind

The paper "Mind Modeling: A ToM-Based Framework for Personalization" (2605.10306) systematically critiques the prevailing paradigm of user modeling in adaptive systems. Traditional methods predominantly interpret observable behaviors to infer user preferences, traits, or goals, leveraging techniques ranging from rule-based approaches to deep learning-driven inference. However, this behaviorist paradigm downplays the latent cognitive and affective states that drive behavior, an omission that becomes salient in embodied, longitudinal, and socially situated interactions, such as those with conversational agents or social robots.

The paper highlights the ambiguity and context sensitivity of user behavior in such interactive settings, arguing that behavioral regularities alone are insufficient for robust personalization, especially where interpretability, social appropriateness, and longitudinal coherence are essential. Instead, the author proposes grounding user modeling in an explicit Theory of Mind (ToM): the system attributes, maintains, and continuously updates user mental states (beliefs, intentions, emotions, knowledge). This perspective augments behavioral profiling and makes explainability and temporal coherence structural properties of personalized systems.

The M3 Framework: Integrating Perception, Mentalisation, and Action

The core contribution is the M3 framework, which operationalizes ToM principles for user modeling in adaptive systems. M3 structures the modeling process as a continuous perception-action loop that tightly couples three elements:

Mental State Space: The user model explicitly represents beliefs, intentions, goals, emotions, and knowledge, with continuous estimates of uncertainty.
Multimodal Perception: Raw user observations—language, facial expression, prosody, behavioral cues—are mapped to evidence for mental-state inference.
Dynamic Update and Actions: Mental-state hypotheses are continuously revised using a mentalisation operator, and system actions are generated as context-sensitive functions of these hypotheses in conjunction with internal system state and contextual constraints.

This framework embodies a metacognitive approach: the system maintains parallel internal and user-directed state spaces, supporting reasoning not only about the user's observed behavior but also about the underlying mental attributions. The dynamic update loop enables both reactive and anticipatory personalization, grounding every system action in explainable, traceable hypotheses with explicit uncertainty estimates.
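The perception, mentalisation, and action loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `Hypothesis`, `MindModel`, `perceive`, `mentalise`, and `act`, the hand-written cue rules, the additive confidence update, and the 0.4 threshold are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    """One attributed mental state with an explicit confidence in [0, 1]."""
    kind: str          # e.g. "belief", "intention", "emotion", "knowledge"
    content: str
    confidence: float


@dataclass
class MindModel:
    """User-directed state space plus a trace that makes updates explainable."""
    hypotheses: dict[str, Hypothesis] = field(default_factory=dict)
    trace: list[str] = field(default_factory=list)


def perceive(observation: dict[str, str]) -> list[tuple[str, str, str, float]]:
    """Map raw multimodal cues to evidence tuples (kind, content, source, weight).

    The rules and weights are illustrative assumptions, not taken from the paper.
    """
    evidence = []
    if observation.get("speech") == "hesitant":
        evidence.append(("emotion", "uncertain", "hesitant speech", 0.3))
    if observation.get("gaze") == "averted":
        evidence.append(("emotion", "uncertain", "averted gaze", 0.2))
    if observation.get("utterance") == "asks_for_help":
        evidence.append(("intention", "wants_guidance", "help request", 0.5))
    return evidence


def mentalise(model: MindModel, evidence: list[tuple[str, str, str, float]]) -> None:
    """Revise hypotheses: each piece of evidence closes part of the gap to 1.0."""
    for kind, content, source, weight in evidence:
        key = f"{kind}:{content}"
        hyp = model.hypotheses.setdefault(key, Hypothesis(kind, content, 0.0))
        hyp.confidence += weight * (1.0 - hyp.confidence)
        model.trace.append(f"{source} -> {key} ({hyp.confidence:.2f})")


def act(model: MindModel, threshold: float = 0.4) -> str:
    """Choose a system action as a function of the current hypotheses."""
    hyp = model.hypotheses.get("emotion:uncertain")
    if hyp is not None and hyp.confidence >= threshold:
        return "slow_down_and_explain"
    return "continue"


if __name__ == "__main__":
    model = MindModel()
    mentalise(model, perceive({"speech": "hesitant", "gaze": "averted"}))
    print(act(model))
    print(model.trace)
```

Keeping the trace next to the hypotheses is one simple way to make each action traceable to the evidence that motivated it; a real system would replace the hand-written rules with learned perceptual models.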

A key distinction emphasized in the paper is that, within the M3 framework, behavioral regularities are treated as corroborating or contesting evidence for mental-state hypotheses, not as primary model variables. Thus, the system can maintain and revise alternative mental-state explanations for the same behavioral data, reflecting ambiguity and context sensitivity across interactions.
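To make the idea of competing explanations for the same behavior concrete, the following sketch keeps a probability distribution over two alternative mental-state hypotheses and revises it with Bayes' rule as cues arrive. The paper does not prescribe this update rule; the hypothesis labels, cue names, and likelihood values are assumptions invented for illustration.

```python
def update_posterior(
    prior: dict[str, float],
    likelihood: dict[str, dict[str, float]],
    cue: str,
) -> dict[str, float]:
    """Bayes' rule over competing mental-state hypotheses for one observed cue.

    likelihood[h][cue] is P(cue | hypothesis h); unseen cues get a small floor
    so that no hypothesis is ruled out permanently.
    """
    unnormalised = {h: prior[h] * likelihood[h].get(cue, 1e-6) for h in prior}
    total = sum(unnormalised.values())
    return {h: p / total for h, p in unnormalised.items()}


# Hypothetical likelihoods: looking away is equally likely under both
# explanations, while asking for a repetition points towards confusion.
LIKELIHOOD = {
    "bored": {"looks_away": 0.6, "asks_to_repeat": 0.1},
    "confused": {"looks_away": 0.6, "asks_to_repeat": 0.7},
}

if __name__ == "__main__":
    belief = {"bored": 0.5, "confused": 0.5}
    for cue in ["looks_away", "asks_to_repeat"]:
        belief = update_posterior(belief, LIKELIHOOD, cue)
        print(cue, belief)
```

The ambiguous cue leaves both explanations equally plausible, and only the later, more diagnostic cue shifts the distribution. This mirrors the point that behavior serves as evidence for or against hypotheses and is not itself the modeled variable.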

Implementation Directions: Hybrid Cognitive Architecture

M3 is operationalized through a hybrid cognitive architecture inspired by the Common Model of Cognition and SOAR. Noteworthy architectural features include:

Layered Representation: Cognitive layers segregate symbolic ToM reasoning, subsymbolic (e.g., deep learning) perceptual processing, and embodied simulation (synchrony, affective resonance).
Dual Knowledge Sources: LLMs with retrieval-augmented generation (RAG) enable flexible context-sensitive dialogue management, while declarative ontologies provide explicit, traceable representational structures.
Memory Subsystems: Working memory maintains live ToM predicates (e.g., 'user-believes'), procedural memory encodes adaptive strategies, semantic memory establishes explicit ontologies of mental states, and Bayesian episodic memory supports belief revision.
User Modeling Reasoner: This central module maintains and revises probabilistic hypotheses over mental-state variables, integrating multimodal sensory cues with interaction history. Inspired by the AMS framework, it represents epistemic, emotive, intentional, imaginative, and perceptual user states as latent variables with uncertainty scores.

This architecture enables both short-term adaptation (e.g., immediate adjustment to inferred user uncertainty) and longitudinal personalization (e.g., evolving user goals and affect tracked across sessions). Behavioral policies are selected by referencing the current state of the user mind model, facilitating both adaptive actions and on-demand explanations.
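As a rough illustration of how a policy might be selected from the current mind model and explained on demand, the sketch below stores ToM predicates in a working-memory list and matches them against a small rule table. The predicate names other than 'user-believes', the rule table, the confidence thresholds, and the fallback action are hypothetical and only stand in for the procedural-memory strategies described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    """A live ToM predicate held in working memory."""
    name: str          # e.g. "user-believes", "user-feels", "user-intends"
    argument: str
    confidence: float


# Hypothetical procedural rules:
# (predicate name, argument, minimum confidence, action)
POLICIES = [
    ("user-feels", "uncertain", 0.6, "offer_step_by_step_help"),
    ("user-intends", "finish_quickly", 0.6, "give_short_answer"),
]


def select_action(working_memory: list[Predicate]) -> tuple[str, str]:
    """Return (action, explanation) for the first rule that a predicate satisfies."""
    for name, argument, min_conf, action in POLICIES:
        for pred in working_memory:
            if (
                pred.name == name
                and pred.argument == argument
                and pred.confidence >= min_conf
            ):
                explanation = (
                    f"{action} because {pred.name}({pred.argument}) "
                    f"held with confidence {pred.confidence:.2f}"
                )
                return action, explanation
    # Nothing is confident enough: ask instead of acting on a weak attribution.
    return (
        "ask_clarifying_question",
        "no mental-state hypothesis was confident enough to act on",
    )


if __name__ == "__main__":
    memory = [
        Predicate("user-believes", "task_is_hard", 0.7),
        Predicate("user-feels", "uncertain", 0.8),
    ]
    print(select_action(memory))
```

Returning the explanation together with the action keeps the justification tied to the exact predicate that triggered the rule, which is one way to support on-demand explanations.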

Theoretical and Practical Implications

Mind modeling, as articulated in this framework, reconceptualizes personalization as a problem of inverse planning under cognitive and affective uncertainty, rather than straightforward behavioral prediction. By treating beliefs, goals, and emotions as explicit ontological types in the user model, the approach enhances the transparency, interpretability, and social credibility of adaptive agents.
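One common way to write down inverse planning under uncertainty is as Bayesian inference over latent mental states. The notation here is an assumption introduced for illustration and does not come from the paper: $m$ is a candidate mental-state configuration (beliefs, goals, emotions), $a_{1:t}$ are the user's observed actions up to time $t$, and $c$ is the interaction context. The factorization additionally assumes that actions are conditionally independent given $m$ and $c$.

```latex
P(m \mid a_{1:t}, c) \;\propto\; P(m \mid c) \prod_{k=1}^{t} P(a_k \mid m, c)
```

Behavioral prediction would model $P(a_{t+1} \mid a_{1:t})$ directly, whereas the formulation above routes every prediction through the inferred $m$, which is what makes the attributed mental state available for inspection and explanation.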

Practical implications include:

Explainability: System decisions are structurally grounded in the inferred mental-state hypotheses, supporting not only local action explanations but also global, longitudinal justification of adaptation paths.
Evaluation Criteria: Beyond predictive accuracy, mind modeling introduces new dimensions for system evaluation—coherence and plausibility of mental-state revision, agreement with human assessments, and quality of explanation.
Ethical Challenges: The explicit modeling of user minds entails new risks: over-interpretation, privacy concerns, and harm from system actions that rest on uncertain or incorrect attributions. The framework responds with explicit uncertainty representation and calls for mechanisms that give users transparency and the ability to contest attributions.
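The criterion of agreement with human assessments could be quantified in several ways. The sketch below uses Cohen's kappa, a standard chance-corrected agreement statistic. This is a technique chosen for illustration and is not named in the paper, and the labels in the usage example are invented.

```python
from collections import Counter


def cohens_kappa(system_labels: list[str], human_labels: list[str]) -> float:
    """Chance-corrected agreement between system and human mental-state labels."""
    if len(system_labels) != len(human_labels) or not system_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(system_labels)
    # Observed agreement: fraction of items labelled identically.
    observed = sum(s == h for s, h in zip(system_labels, human_labels)) / n
    # Expected agreement if both raters labelled independently at their base rates.
    sys_counts = Counter(system_labels)
    hum_counts = Counter(human_labels)
    expected = sum(sys_counts[label] * hum_counts[label] for label in sys_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    system = ["confused", "confused", "bored", "bored"]
    human = ["confused", "bored", "bored", "bored"]
    print(cohens_kappa(system, human))
```

A value near 1 indicates that the system's attributions match the human annotator far more often than chance would predict, while a value near 0 indicates agreement no better than chance.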

Limitations and Prospects for Future Research

While the M3 framework is comprehensive and theoretically well-motivated, the paper explicitly notes several open issues:

The operationalization of ToM constructs and design of mentalisation algorithms are domain-specific and methodologically non-unique.
Empirical validation, especially regarding the improvement in user trust, adaptation coherence, and explainability, remains pending and is identified as a focus of ongoing work.
The integration of multimodal perception, symbolic and subsymbolic inference, and episodic memory raises architectural and computational challenges that warrant further exploration.

Open research questions relate to the scalability and generalizability of mind modeling across domains, the optimization of uncertainty quantification and revision strategies, and the user-facing management of privacy and explainability in real-world deployment.

Conclusion

This paper advances a principled Theory-of-Mind-based paradigm for user modeling in adaptive, interactive AI systems. By framing mind modeling as a process of explicit, revisable mental-state attribution, and encapsulating that process in the M3 framework, the work shifts the locus of personalization from behavioral prediction to cognitive interpretation. The proposed approach has the potential to enhance the transparency, coherence, and social appropriateness of AI agents in longitudinal and embodied contexts. Empirical research is still required to assess the practical benefits and challenges of this paradigm in deployed systems.
