Disentangling Dynamical Systems: Causal Representation Learning Meets Local Sparse Attention

Published 15 Mar 2026 in cs.LG | (2603.14483v1)

Abstract: Parametric system identification methods estimate the parameters of explicitly defined physical systems from data. Yet, they remain constrained by the need to provide an explicit function space, typically through a predefined library of candidate functions chosen via available domain knowledge. In contrast, deep learning can demonstrably model systems of broad complexity with high fidelity, but black-box function approximation typically fails to yield explicit descriptive or disentangled representations revealing the structure of a system. We develop a novel identifiability theorem, leveraging causal representation learning, to uncover disentangled representations of system parameters without structural assumptions. We derive a graphical criterion specifying when system parameters can be uniquely disentangled from raw trajectory data, up to permutation and diffeomorphism. Crucially, our analysis demonstrates that global causal structures provide a lower bound on the disentanglement guarantees achievable when considering local state-dependent causal structures. We instantiate system parameter identification as a variational inference problem, leveraging a sparsity-regularised transformer to uncover state-dependent causal structures. We empirically validate our approach across four synthetic domains, demonstrating its ability to recover highly disentangled representations that baselines fail to recover. Corroborating our theoretical analysis, our results confirm that enforcing local causal structure is often necessary for full identifiability.


Authors (4)

Summary

The paper introduces a sparsity-regularized transformer-based VAE that disentangles system parameters without predefined candidate functions.
It leverages causal representation learning to enforce local state-dependent causal relations, validated across multiple synthetic environments.
The methodology achieves robust identifiability, outperforming baselines and offering insights for improved dynamical system modeling.


Abstract and Introduction

The paper "Disentangling Dynamical Systems: Causal Representation Learning Meets Local Sparse Attention" (2603.14483) addresses a limitation of classical parametric system identification: such methods require a predefined library of candidate functions, chosen using domain knowledge. The authors instead use causal representation learning to uncover disentangled representations of system parameters without imposing these structural assumptions. They cast parameter identification as a variational inference problem and use a sparsity-regularized transformer architecture to reveal state-dependent causal structures. Empirical validation across four synthetic domains suggests that the proposed method recovers highly disentangled representations that baseline methods fail to recover.

Theory and Identifiability

The authors present a novel identifiability theorem that extends mechanism sparsity principles from non-linear ICA to dynamical systems. The primary contribution is the derivation of a graphical criterion specifying when system parameters can be uniquely disentangled from raw trajectory data, up to permutation and diffeomorphism. They argue that enforcing sparse causal relations between parameters and system components in the decoder provably disentangles the system parameter representation.
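
To make the phrase "up to permutation and diffeomorphism" concrete, the following states the usual form of such a guarantee. The notation is an assumption introduced here for illustration (scalar parameters \theta_1, ..., \theta_d, a learned representation \hat{\theta}, a permutation \pi, and component-wise maps h_i); the paper's own formal statement may differ.

```latex
% Assumed notation, not taken from the paper:
% \theta = (\theta_1, \dots, \theta_d) are the true system parameters,
% \hat{\theta} = (\hat{\theta}_1, \dots, \hat{\theta}_d) is the learned representation.
\[
\exists\, \pi \in S_d \ \text{and diffeomorphisms } h_1, \dots, h_d : \mathbb{R} \to \mathbb{R}
\quad \text{such that} \quad
\hat{\theta}_i = h_i\bigl(\theta_{\pi(i)}\bigr), \qquad i = 1, \dots, d.
\]
```

In words, each learned coordinate depends on exactly one true parameter, through an invertible smooth map, and the order of the coordinates is arbitrary.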

Figure 1: High-level overview of the developed theory. An observed trajectory (left) is encoded into a vector of latent system parameters (marked in dark blue). The developed theory shows that enforcing sparse causal relations between parameters and system components in the decoder (which performs one-step prediction) provably disentangles the system parameter representation.

Methodology

The paper operationalizes the identifiability theory through a practical algorithm based on a sparsity-regularized, VAE-style representation learning model. This model encodes observed trajectories into latent parameters and decodes them to reconstruct future trajectories. The key element is the use of a transformer architecture designed to learn local, state-dependent causal graphs that reflect the fine-grained causal influences between system components.
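
The general shape of a sparsity-regularized, VAE-style objective can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' loss: the function names, the weights `beta` and `lam`, and the use of independent sigmoid gates as a stand-in for a sparse attention pattern at a single state are all hypothetical.

```python
import math

def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))

def edge_penalty(gate_logits):
    """Expected number of active edges if each edge is an independent sigmoid gate.

    gate_logits[i][j] is a hypothetical logit for "input j influences output i"
    at the current state; a state-dependent graph recomputes it at every step.
    """
    return sum(1.0 / (1.0 + math.exp(-g)) for row in gate_logits for g in row)

def objective(pred, target, mu, log_var, gate_logits, beta=1.0, lam=0.1):
    """One-step prediction error + weighted KL term + weighted sparsity term."""
    recon = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
    return recon + beta * gaussian_kl(mu, log_var) + lam * edge_penalty(gate_logits)

# Toy usage: a perfect prediction with a standard-normal posterior leaves
# only the sparsity term, here two gates that are each open with probability 0.5.
loss = objective(
    pred=[0.3, -1.2], target=[0.3, -1.2],
    mu=[0.0, 0.0], log_var=[0.0, 0.0],
    gate_logits=[[0.0, 0.0]],
)
```

The sparsity term is what pushes the decoder toward using as few parameter-to-component edges as possible; how the paper derives these edges from transformer attention is not specified in this summary.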

Empirical Validation

The authors validate their approach across four synthetic domains: Dual Particle, Local Particle, Springs, and Bounce. In these environments, different combinations of causal structures are incorporated, testing the limits of both global and local graphical criteria for disentanglement.

Figure 2: Comparison of disentanglement across the test environments, where an MCC of 1.0 represents perfect disentanglement. All trials are repeated over eight random seeds. Box plots display the minimum, lower quartile, upper quartile, and maximum values. The validation reconstruction loss is shown at the bottom, indicating that all models are approximately equiperformant in these environments. The VCD baseline, which learns static graphs, strongly disentangles in the first two environments, which satisfy the global graph criterion. In contrast, only SPARTAN, which learns state-dependent graphs, consistently disentangles in all environments.

The results confirm that enforcing local causal structure is often necessary for full identifiability: SPARTAN outperforms the baselines in achieving disentangled representations, even though all models reach similar reconstruction loss.
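
The MCC score reported in Figure 2 is, in the disentanglement literature, usually a mean correlation coefficient: correlations between true and learned factors are computed, the best one-to-one matching is found, and the matched correlations are averaged. The sketch below assumes absolute Pearson correlation and a brute-force search over permutations; the paper's exact variant (for example, a rank correlation or a learned nonlinear map before correlating) is not stated in this summary.

```python
import itertools
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mcc(true_factors, learned_factors):
    """Mean absolute correlation under the best one-to-one matching.

    Each argument is a list of d sequences, one per factor. Brute force over
    d! permutations is only sensible for small d; the Hungarian algorithm is
    the usual choice for larger problems.
    """
    d = len(true_factors)
    corr = [[abs(pearson(t, l)) for l in learned_factors] for t in true_factors]
    best = max(
        sum(corr[i][perm[i]] for i in range(d))
        for perm in itertools.permutations(range(d))
    )
    return best / d

# Toy usage: the learned factors are the true ones, swapped and affinely rescaled.
true_factors = [[1.0, 2.0, 3.0, 4.0], [1.0, -1.0, 1.0, -1.0]]
learned_factors = [[-2.0, 2.0, -2.0, 2.0], [3.0, 5.0, 7.0, 9.0]]
score = mcc(true_factors, learned_factors)
```

Because the matching is optimized over permutations and the correlation is taken in absolute value, the score is insensitive to the ordering, sign, and affine rescaling of the learned coordinates. Pearson correlation does not capture general nonlinear reparameterizations, which is why rank-based variants are sometimes preferred.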

Discussion and Implications

This paper contributes to the broader understanding of causal representation learning, linking it explicitly to system identification in dynamical systems. The practical implications are significant: robust identification of system parameters from trajectory data can lead to improved modeling and control in applications ranging from robotics to climate modeling. Theoretically, this work opens up new avenues in studying local causal structures and sparse attention mechanisms in neural networks.

Conclusion

The authors underscore that their work addresses fundamental limitations in classical system identification, providing a novel approach that relaxes structural assumptions and leverages causal dependencies in data. The insights gained from this research are poised to impact various fields requiring dynamical modeling, fostering future exploration into sparse causal models and expanding the scope of identifiability in unsupervised learning contexts.

In summary, integrating causal representation learning with dynamical systems modeling aims to improve interpretability without sacrificing fidelity, positioning this methodology as an alternative to traditional approaches that rely heavily on domain knowledge.
