An ALE-Consistent Graph Neural Operator-Transformer Framework for Fluid-Structure Interaction

Published 1 May 2026 in physics.flu-dyn and cs.LG | (2605.00937v1)

Abstract: We propose an arbitrary Lagrangian-Eulerian (ALE)-consistent machine learning framework for long-term fluid-structure interaction (FSI) prediction on deforming unstructured meshes. Specifically, the fluid dynamics are modeled by a surrogate that combines a graph neural operator (GNO) with a vision Transformer (ViT) for spatiotemporal prediction, while a lightweight long short-term memory (LSTM) network predicts structural kinematics at the interface. The two surrogates are coupled through a standard partitioned procedure. Most importantly, kinematic compatibility at the moving interface is enforced via an ALE-consistent boundary-correction step that updates the fluid-side interface velocity with the predicted structural velocity at each coupling update, thereby improving near-interface accuracy and long-term rollout stability. To mitigate autoregressive error accumulation, a two-stage training strategy is adopted, consisting of single-step supervised pretraining followed by long-term autoregressive fine-tuning. The proposed framework is validated on the benchmark problem of a flexible beam vibration in the wake of a cylinder. Results demonstrate accurate phase-consistent predictions over long rollouts and robust generalization under inlet-profile variations in both interpolation and extrapolation settings. Systematic ablation studies further assess the respective contributions of the ViT module, ALE-consistent boundary correction, and long-term training to predictive accuracy and rollout robustness.


Authors (5)

Summary

The paper presents an ALE-consistent graph neural operator-transformer framework that ensures accurate long-horizon predictions for fluid-structure interaction.
It integrates a vision Transformer (ViT) for global spatiotemporal modeling and an LSTM for interface kinematics, with an ALE-consistent boundary correction that enforces kinematic compatibility on the deforming mesh.
The two-stage training protocol effectively mitigates phase drift and error accumulation, resulting in robust performance across periodic and transient regimes.

ALE-Consistent GNO-Transformer Framework for Fluid-Structure Interaction: An Expert Synthesis

Introduction: Context and Motivation

Fluid-structure interaction (FSI) is fundamental to multi-physics modeling across ocean, aerospace, and biomedical engineering. High-fidelity FSI simulations remain prohibitively expensive in time-critical settings because of complex two-way coupling, interface deformation, and added-mass effects. Classical high-order arbitrary Lagrangian-Eulerian (ALE) methods and partitioned solver strategies provide interface accuracy and coupling stability but incur significant computational overhead, especially for highly deformable configurations on unstructured meshes. Traditional reduced-order models (ROMs) deliver computational savings but often sacrifice global fidelity and robust generalization across strongly nonlinear transient regimes. Recent advances in deep learning have enabled direct, data-driven surrogate modeling for FSI, but these surrogates suffer from error accumulation, phase drift, and a lack of geometric consistency at deforming interfaces. These problems are particularly acute for long-term, multistep prediction on unstructured meshes.

Proposed Framework: Model Architecture and ALE-Consistent Coupling

The proposed framework, GNO-ViT, integrates a graph neural operator (GNO) with a vision Transformer (ViT) for fluid modeling on unstructured, deforming meshes and a lightweight LSTM for structural interfacial kinematics. Central to the approach is an ALE-consistent partitioned scheme, where mesh deformation is explicitly treated via remeshing driven by structural displacements, preserving node-to-node interface correspondence and geometric consistency.

The surrogate system operates in a staggered coupling loop: the fluid surrogate predicts the fluid state on the remeshed grid from the latest structural interface configuration, and an ALE-consistent boundary correction then enforces equality between the fluid-side and predicted structural interface velocities. This corrects slip errors, maintains kinematic compatibility, and stabilizes pressure and shear predictions along the moving boundary. Interfacial pressures are subsequently transferred to drive the next structural update, closing the coupled loop.
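The staggered loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `FluidState` container, the `coupled_step` and `apply_boundary_correction` names, and the dictionary-based mesh representation are all assumptions made for the example. The correction step imposes the kinematic condition $u_f = \dot{d}_s$ at the interface nodes by overwriting the predicted fluid velocity with the structural velocity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Vec = Tuple[float, float]


@dataclass
class FluidState:
    velocity: Dict[int, Vec]    # nodal velocity, keyed by mesh node id
    pressure: Dict[int, float]  # nodal pressure, keyed by mesh node id


def apply_boundary_correction(
    fluid: FluidState, structural_velocity: Dict[int, Vec]
) -> FluidState:
    """Overwrite fluid velocity at interface nodes with the structural velocity."""
    corrected = dict(fluid.velocity)
    corrected.update(structural_velocity)  # keys are the interface node ids
    return FluidState(velocity=corrected, pressure=dict(fluid.pressure))


def coupled_step(
    fluid: FluidState,
    structural_velocity: Dict[int, Vec],
    fluid_surrogate: Callable[[FluidState, Dict[int, Vec]], FluidState],
    structure_surrogate: Callable[[Dict[int, float]], Dict[int, Vec]],
) -> Tuple[FluidState, Dict[int, Vec]]:
    """One staggered coupling update (hypothetical interfaces)."""
    # 1. Fluid surrogate predicts the next state given the latest interface motion.
    predicted = fluid_surrogate(fluid, structural_velocity)
    # 2. Boundary correction removes any slip between fluid and structure.
    corrected = apply_boundary_correction(predicted, structural_velocity)
    # 3. Interface pressures drive the next structural update.
    interface_pressure = {n: corrected.pressure[n] for n in structural_velocity}
    next_structural_velocity = structure_surrogate(interface_pressure)
    return corrected, next_structural_velocity
```

Calling `coupled_step` repeatedly, feeding each output back in as the next input, gives the autoregressive rollout. Which structural velocity (previous or newly predicted) is used for the correction is a design choice that this summary does not specify; the sketch uses the latest one available before the fluid update.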

Figure 1: Schematic of the flexible beam clamped behind a rigid cylinder and immersed in channel flow, depicting the FSI benchmark geometry.

Figure 2: Overview and detailed schematic of the ALE-consistent GNO-Transformer framework, showing staggered rollout and step-wise, supervised error control for stable long-term prediction.

The GNO-ViT fluid surrogate sequence consists of: (1) a GNO lifting unstructured nodal features (state and mesh deformations) to a structured tensor representation; (2) patch-based ViT temporal modeling for global spatio-temporal dependencies and non-local memory; (3) a projection GNO mapping the output tensor back to the unstructured mesh compatible with dynamic ALE boundaries.
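The data flow of this three-stage sequence can be illustrated with a shape-level sketch. Everything here is an assumption for illustration: a fixed, row-normalized Gaussian kernel stands in for the learned GNO layers, `token_mixer` is a placeholder for the ViT, and nodes are assumed to lie in the unit square. The point is only to show how unstructured nodal data can be lifted to a grid, cut into patch tokens, and mapped back to the mesh.

```python
import numpy as np


def kernel_weights(targets, sources, length_scale):
    """Row-normalized Gaussian weights from source points to target points."""
    diff = targets[:, None, :] - sources[None, :, :]   # (T, S, 2)
    sq_dist = np.sum(diff ** 2, axis=-1)               # (T, S)
    w = np.exp(-sq_dist / (2.0 * length_scale ** 2))
    return w / w.sum(axis=1, keepdims=True)


def patchify(grid_field, patch):
    """Split an (H, W, C) tensor into (num_patches, patch * patch * C) tokens."""
    h, w, c = grid_field.shape
    assert h % patch == 0 and w % patch == 0
    x = grid_field.reshape(h // patch, patch, w // patch, patch, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * c)


def unpatchify(tokens, h, w, c, patch):
    """Inverse of patchify: tokens back to an (H, W, C) tensor."""
    x = tokens.reshape(h // patch, w // patch, patch, patch, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(h, w, c)


def surrogate_forward(nodes, node_features, grid_shape, patch, token_mixer,
                      length_scale=0.2):
    """Lift -> token mixing -> project, for nodes in the unit square."""
    h, w = grid_shape
    ys, xs = np.meshgrid(np.linspace(0, 1, h), np.linspace(0, 1, w), indexing="ij")
    grid_points = np.stack([xs.ravel(), ys.ravel()], axis=-1)   # (h*w, 2)
    c = node_features.shape[1]
    # (1) Lift unstructured nodal features onto the structured grid.
    lifted = kernel_weights(grid_points, nodes, length_scale) @ node_features
    tokens = patchify(lifted.reshape(h, w, c), patch)
    # (2) Global token mixing (placeholder for the ViT).
    tokens = token_mixer(tokens)
    grid_out = unpatchify(tokens, h, w, c, patch).reshape(h * w, c)
    # (3) Project the grid tensor back onto the mesh nodes.
    return kernel_weights(nodes, grid_points, length_scale) @ grid_out
```

Because the node coordinates are an explicit input, the same functions can be re-evaluated on the deformed mesh at every step, which is what makes this kind of lifting compatible with moving ALE boundaries.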

Two-Stage Training: Mitigating Rollout Instability

A core innovation is the two-stage training protocol for mitigating error accumulation. Stage I uses single-step supervised pretraining (“teacher forcing”) to establish local prediction accuracy. Stage II applies joint, multi-step autoregressive fine-tuning under the closed ALE-coupled loop, so the model is explicitly exposed to its own prediction errors as they propagate through the partitioned FSI rollout. This strategy suppresses the phase drift and instability that otherwise build up over long rollout horizons in sequential prediction with error recursion.
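The difference between the two stages comes down to what the model is fed during training. The sketch below contrasts the two loss definitions on a toy scalar state; the function names, the scalar state, and the mean-squared-error choice are illustrative assumptions, not details taken from the paper.

```python
from typing import Callable, Sequence


def single_step_loss(model: Callable[[float], float],
                     trajectory: Sequence[float]) -> float:
    """Stage I: every prediction starts from the true previous state."""
    errors = [(model(trajectory[t]) - trajectory[t + 1]) ** 2
              for t in range(len(trajectory) - 1)]
    return sum(errors) / len(errors)


def rollout_loss(model: Callable[[float], float],
                 trajectory: Sequence[float], horizon: int) -> float:
    """Stage II: the model is fed its own predictions for `horizon` steps."""
    state = trajectory[0]
    total = 0.0
    for t in range(1, horizon + 1):
        state = model(state)                 # prediction becomes the next input
        total += (state - trajectory[t]) ** 2
    return total / horizon


# Toy example: true dynamics x_{t+1} = 0.9 x_t, slightly biased model.
reference = [0.9 ** t for t in range(6)]
biased_model = lambda x: 0.95 * x
print(single_step_loss(biased_model, reference))
print(rollout_loss(biased_model, reference, horizon=5))
```

For the biased toy model, the rollout loss is larger than the single-step loss because the per-step error compounds. Minimizing the rollout loss therefore penalizes exactly the error recursion that single-step training never sees.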

Numerical Results: Predictive Accuracy, Generalization, and Ablations

The model was validated on the canonical Turek–Hron FSI cylinder–beam benchmark, for both periodic and strongly transient (non-periodic) regimes, with data generated via OpenFOAM and finite-element solvers and projected onto coarse unstructured meshes for surrogate efficiency.

Long-Horizon FSI Prediction: Non-Periodic to Periodic Regimes

The GNO-ViT framework closely matches the ground truth in both non-periodic transient and periodic regimes. It captures the evolution of structural displacements and vortical flow structures over long rollouts and generalizes to out-of-sample inlet conditions.

Figure 3: Predicted versus reference time histories for normalized transverse displacement at the beam tip under non-periodic (interpolation and extrapolation) conditions.

Figure 4: Comparison of predicted and ground truth $u_f$ velocity fields at key transient stages in the non-periodic regime.

Figure 5: Predicted and true $x/d$ and $y/d$ displacements in the periodic regime for in-sample and extrapolative inlet profiles.

Figure 6: Fluid field snapshot for periodic regime ( $u_f$ , $v_f$ , $p_f$ ) highlighting robust phase-locked predictions.

Ablation Studies: GNO vs. GNO-ViT and Boundary Correction

The ablations show:

Removing ViT (i.e., using pure GNO) produces good single-step regression but rapid coherence loss and phase drift in multi-step rollouts.
Omitting the ALE-consistent boundary correction (GNO-ViT-noBM) leads to boundary slip, significant near-wall pressure error, and propagation of systematic bias, even when global field metrics remain superficially high.

Figure 7: Rollout comparison for $u_f$ over time among GNO-ViT, GNO, and GNO-ViT-noBM, showing coherent evolution only in the ALE-consistent, ViT-enabled architecture.

Figure 8: Correlation coefficient $R^2$ for $u_f$ and $p_f$ over rollout: GNO-ViT maintains a high $R^2$ throughout, while the GNO and no-boundary-correction models degrade progressively.

Figure 9: Boundary pressure predictions along the beam by GNO-ViT, GNO, and GNO-ViT-noBM demonstrating marked error amplification in the absence of boundary constraint.

Figure 10: Spatiotemporal error maps for boundary pressure, revealing phase drift and persistent bias without global attention or interface enforcement.

Figure 11: Predicted boundary velocity components and their distributions, illustrating that ALE-consistent correction suppresses noise and enforces kinematic compatibility.

Long-Term vs. Short-Term Training

Short-term (single-step) training enables accurate prediction at short horizons but rapidly accumulates phase error, leading to temporal drift in both structural and flow variables; only long-term, multi-step supervised fine-tuning preserves physical coherence throughout extensive rollouts.
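One simple way to quantify the temporal drift described above is to estimate the lag between a predicted and a reference signal from the peak of their cross-correlation. This is a generic diagnostic offered as a sketch, not a metric reported in the paper; the `estimate_lag` name and the synthetic sine signals are assumptions for the example.

```python
import numpy as np


def estimate_lag(pred, ref):
    """Lag (in samples) at which `pred` best aligns with `ref`.

    A positive value means the prediction is delayed relative to the reference.
    """
    pred = np.asarray(pred, dtype=float) - np.mean(pred)
    ref = np.asarray(ref, dtype=float) - np.mean(ref)
    xcorr = np.correlate(pred, ref, mode="full")
    # Index 0 of the full cross-correlation corresponds to lag -(len(ref) - 1).
    return int(np.argmax(xcorr)) - (len(ref) - 1)


# Synthetic check: a sine wave with period 20 samples, delayed by 3 samples.
t = np.arange(400)
reference_signal = np.sin(2 * np.pi * t / 20)
delayed_signal = np.sin(2 * np.pi * (t - 3) / 20)
print(estimate_lag(delayed_signal, reference_signal))
```

Applied over successive windows of a rollout (for example, to the beam-tip displacement), a lag that grows with time would indicate accumulating phase error, while a lag that stays near zero would indicate phase-consistent prediction.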

Figure 12: Drift over time of the short-term trained model in beam tip displacements under the periodic regime, compared with the sustained accuracy obtained with long-term training.

Figure 13: Time evolution of the flow-field correlation coefficient with and without long-term training, substantiating improved rollout robustness from autoregressive loss exposure.
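Per-step correlation curves such as those in Figures 8 and 13 can be computed by scoring each rollout step's predicted field against the reference field. The sketch below assumes $R^2$ means the coefficient of determination, $R^2 = 1 - SS_{res}/SS_{tot}$; the exact definition the authors use is not stated in this summary, and the function names are hypothetical.

```python
from typing import List, Sequence


def r2_score(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Coefficient of determination between one predicted and one reference field."""
    mean_ref = sum(ref) / len(ref)
    ss_res = sum((p - r) ** 2 for p, r in zip(pred, ref))
    ss_tot = sum((r - mean_ref) ** 2 for r in ref)
    if ss_tot == 0.0:
        raise ValueError("reference field is constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


def r2_over_rollout(pred_steps: Sequence[Sequence[float]],
                    ref_steps: Sequence[Sequence[float]]) -> List[float]:
    """One R^2 value per rollout step, each computed over all mesh nodes."""
    return [r2_score(p, r) for p, r in zip(pred_steps, ref_steps)]
```

A value of 1 means the predicted field matches the reference exactly at that step, and a value of 0 means the prediction does no better than the spatial mean of the reference, so a curve that decays along the rollout indicates accumulating error.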

Implications and Outlook

The results establish several key conclusions for data-driven FSI surrogate modeling:

Integration of ViT enables global temporal coherence not recoverable through local, message-passing GNNs alone.
Explicit interface correction in ALE ensures structural–fluid consistency, critical for near-wall shear prediction and for the stability of coupled dynamics.
Two-stage training, especially joint autoregressive fine-tuning, proves essential for suppressing compounding errors, a bottleneck in time-dependent FSI surrogate workflows.
The framework generalizes robustly under interpolative and extrapolative boundary conditions, which supports its use in real-time digital twin, control, and uncertainty quantification applications beyond the periodic regime.

Methodologically, these results highlight the importance of mesh-aware, physics-consistent learning over reliance on geometric homogenization or standard Eulerian grid projections. The pipeline is a promising candidate for extension to 3D FSI, turbulent regimes, and hybrid physics-ML coupling for in-the-loop simulation and optimization.

Conclusion

The ALE-consistent GNO-Transformer framework is a mesh-agnostic approach to accurate and stable long-horizon FSI prediction on deforming, unstructured domains. It combines graph-based geometric awareness, global temporal modeling, strict interface consistency, and a resilient training protocol to overcome limitations of both ROMs and conventional black-box deep learning approaches. Limitations remain regarding scalability to 3D, the handling of additional multi-physics constraints, and further acceleration through integration with high-fidelity CFD solvers. Addressing these is a promising avenue for future research and for application in real-time control, digital twin platforms, and data-driven FSI design.
