- The paper presents a closed-loop neuro-symbolic LLM agent that explicitly models market regimes to enhance risk and portfolio optimization.
- It uses an LLM-guided Monte Carlo Tree Search with enforced symbolic constraints to mine formulaic alphas under strict financial conditions.
- Empirical results demonstrate superior performance over traditional methods, achieving higher predictive metrics and reduced drawdowns.
PandaAI: Neuro-Symbolic LLM Agent for Market-Aware Quantitative Finance
Introduction
This work introduces PandaAI, a closed-loop neuro-symbolic LLM agent built for quantitative finance, where market data have a low signal-to-noise ratio (SNR) and are strongly non-stationary. The framework connects the general reasoning capacity of domain-specialized LLMs with financial rigor, emphasizing market regime modeling and hard-constrained generation of formulaic alphas for sequential decision-making. PandaAI's design enables explicit market awareness, robust constrained alpha discovery, and continuous lifecycle adaptation via feedback across six coordinated modules.
Figure 1: Overview of the PandaAI Market-Aware Quantitative Framework as a closed-loop dynamical system spanning market regime state estimation, LLM-guided alpha research, portfolio optimization, execution control, and feedback-driven updates.
Framework and Core Methodology
Market Dynamics Module: Continuous Regime Modeling
PandaAI eschews static stationarity assumptions by explicitly modeling the financial environment as a trajectory on a latent regime manifold $z_t$. High-dimensional style and industry risk factors (Barra) are compressed using an autoencoder, yielding $z_t$ as a latent summary of current market dynamics.
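The compression step can be illustrated with a deliberately small stand-in. The sketch below is a minimal example, assuming a linear autoencoder with tied weights trained by full-batch gradient descent on synthetic exposures; the paper does not specify the architecture, and `encode`, `decode`, `train`, and the toy data are hypothetical names and values, not PandaAI's implementation.

```python
import random

def encode(W, x):
    """z = W x: project a d-dimensional exposure vector onto k latent coordinates."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def decode(W, z):
    """x_hat = W^T z: reconstruct exposures from the latent state (tied weights)."""
    return [sum(W[i][j] * z[i] for i in range(len(W))) for j in range(len(W[0]))]

def recon_loss(W, X):
    """Mean squared reconstruction error over all samples."""
    total = 0.0
    for x in X:
        x_hat = decode(W, encode(W, x))
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total / len(X)

def train(X, k=2, lr=0.01, steps=500, seed=0):
    """Full-batch gradient descent on ||W^T W x - x||^2 (assumed toy objective)."""
    rng = random.Random(seed)
    d = len(X[0])
    W = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(k)]
    for _ in range(steps):
        grad = [[0.0] * d for _ in range(k)]
        for x in X:
            z = encode(W, x)
            r = [a - b for a, b in zip(decode(W, z), x)]  # reconstruction residual
            Wr = encode(W, r)
            for i in range(k):
                for j in range(d):
                    # d/dW of the squared residual with tied encoder/decoder weights
                    grad[i][j] += 2.0 * (z[i] * r[j] + Wr[i] * x[j]) / len(X)
        for i in range(k):
            for j in range(d):
                W[i][j] -= lr * grad[i][j]
    return W

# Synthetic "factor exposures": 4 observed factors driven by 2 hidden regime drivers.
rng = random.Random(1)
A = [1.0, 0.5, 0.0, 0.2]
B = [0.0, 0.3, 1.0, -0.4]
X = []
for _ in range(50):
    u, v = rng.uniform(-1, 1), rng.uniform(-1, 1)
    X.append([u * a + v * b for a, b in zip(A, B)])

W = train(X)
z_t = encode(W, X[-1])  # latent regime summary for the most recent observation
```

In this toy setting the two-dimensional `z_t` plays the role of the latent regime state; a production system would presumably use a nonlinear autoencoder over the full Barra factor set.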
A dual-channel adaptation mechanism allows this $z_t$ representation to propagate:
- A symbolic adapter projects $z_t$ to soft tokens for LLM input, achieving context-sensitive reasoning.
- A numerical adapter extracts regime-dependent control parameters for downstream modules:
- Risk aversion $\lambda(z_t)$ for portfolio optimization
- Liquidity-sensitive weights for execution
This unification ensures regime awareness pervades both symbolic and numerical stages, supporting the Contextualization Hypothesis (H1).
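The numerical channel described above can be sketched as two small maps from the latent state to control parameters. This is a minimal illustration, assuming a softplus-of-linear form for the risk aversion and a softmax over hypothetical execution cost terms for the liquidity-sensitive weights; the parameters `w`, `b`, and `V` are invented for the example and the paper does not state the adapter's functional form.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x)); always positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def risk_aversion(z, w, b):
    """Assumed form: lambda(z_t) = softplus(w . z_t + b), so lambda stays > 0."""
    return softplus(sum(wi * zi for wi, zi in zip(w, z)) + b)

def liquidity_weights(z, V):
    """Assumed form: softmax over linear scores, one score per execution term."""
    scores = [sum(vi * zi for vi, zi in zip(row, z)) for row in V]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical parameters: the first latent coordinate is read as "market stress".
w, b = [1.5, 0.0], 0.0
V = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]]  # e.g. spread, impact, urgency terms

z_calm, z_stress = [-1.0, 0.2], [2.0, 0.2]
lam_calm = risk_aversion(z_calm, w, b)
lam_stress = risk_aversion(z_stress, w, b)
exec_weights = liquidity_weights(z_stress, V)
```

Under these assumptions a stressed regime yields a larger risk aversion than a calm one, which a downstream optimizer could use, for instance, as the multiplier on the variance term of a mean-variance objective.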
Constrained Alpha Mining via LLM-Guided MCTS
Unlike generic code generation, formulaic alpha discovery is formulated as a constrained search over a symbolic operator DAG, subject to first-class financial constraints C.
The Alpha Research Module leverages an LLM-guided Monte Carlo Tree Search (MCTS) pipeline:
- Static syntax rules $G_{\text{forbidden}}$ implement a pre-simulation filter.
- Risk and domain constraints are dynamically injected into prompts and enforced via simulation penalties.
- Candidate factors are subjected to explicit backtesting and reward model scoring, including RLHF-aligned metrics.
The constraint set C operates both as a prompt-level intrinsic regularizer and as a penalty schedule during simulation and evaluation. This framework directly operationalizes the Constrained-Creativity Hypothesis (H2).
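The two roles of the constraint set can be made concrete with a small sketch. It assumes alphas are represented as nested tuples, that the static filter rejects look-ahead and overly deep expressions, that the simulation penalty is linear in excess turnover, and that tree selection uses a PUCT-style score with the LLM supplying the prior. All of these choices, and every name below, are illustrative assumptions rather than the paper's specification.

```python
import math

FORBIDDEN_OPS = {"future_return"}  # hypothetical operator that leaks future data
MAX_DEPTH = 4                      # hypothetical complexity limit

def depth(expr):
    """Nesting depth of an expression tree; leaves (names, numbers) have depth 0."""
    if not isinstance(expr, tuple):
        return 0
    return 1 + max((depth(a) for a in expr[1:]), default=0)

def has_forbidden(expr):
    """True if any node uses a forbidden operator or a negative delay (look-ahead)."""
    if not isinstance(expr, tuple):
        return False
    op, *args = expr
    if op in FORBIDDEN_OPS:
        return True
    if op == "delay" and isinstance(args[-1], (int, float)) and args[-1] < 0:
        return True
    return any(has_forbidden(a) for a in args)

def violates_static_rules(expr):
    """Pre-simulation filter: reject before any backtest is run."""
    return depth(expr) > MAX_DEPTH or has_forbidden(expr)

def penalized_reward(ic, turnover, turnover_cap=0.3, penalty=2.0):
    """Simulation-time penalty: IC minus a linear charge on turnover above the cap."""
    return ic - penalty * max(0.0, turnover - turnover_cap)

def puct(q, prior, parent_visits, child_visits, c=1.5):
    """PUCT-style selection score; `prior` stands in for the LLM's proposal weight."""
    return q + c * prior * math.sqrt(parent_visits) / (1 + child_visits)

ok = ("rank", ("div", "close", ("delay", "close", 5)))
leak = ("rank", ("delay", "close", -1))
candidates = [e for e in (ok, leak) if not violates_static_rules(e)]
```

Here `leak` is discarded before simulation, while a candidate with high raw IC but excessive turnover would survive the filter and then be scored down by `penalized_reward`.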
Fine-Tuning and Execution-Grounded RLHF
The CQ2 chain-of-thought LLM is built by fine-tuning a DeepSeek-Coder-33B backbone on regime-labeled financial instruction sets, contextually conditioning reasoning on $z_t$. SFT and RLHF are combined to align generative output with user (trader) preferences and execution success. KL-regularized PPO stabilizes RLHF policy optimization.
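The summary does not state the training objective, but KL-regularized RLHF is commonly written in the following form. Here $\pi_\theta$ is the policy being tuned, $\pi_{\mathrm{ref}}$ is the frozen SFT model, $\beta$ is the KL coefficient, and $r(x, y)$ is a reward that would, under the assumption made here, combine trader preference and execution success; conditioning on $z_t$ follows the text above.

```latex
\max_{\theta} \;
\mathbb{E}_{x \sim \mathcal{D}}
\Big[
  \mathbb{E}_{y \sim \pi_\theta(\cdot \mid x, z_t)} \big[ r(x, y) \big]
  \;-\;
  \beta \, \mathrm{KL}\big( \pi_\theta(\cdot \mid x, z_t) \,\big\|\, \pi_{\mathrm{ref}}(\cdot \mid x, z_t) \big)
\Big]
```

The KL term penalizes drift away from the SFT model, which is the usual reason this regularizer is credited with stabilizing PPO updates.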
Closed-Loop Update Mechanism
The Update Operator (U) implements a dual-timescale meta-adaptation protocol (H3):
- Fast loop: LLM-driven induction of symbolic constraints upon detection of pathological failure clusters in backtest evidence; logic rules are immediately fused into $z_t$.
- Slow loop: Parametric fine-tuning (LoRA) on long-term successful trajectories ensures ongoing adaptation to non-stationary market evolution while mitigating catastrophic forgetting.
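The two timescales can be sketched as a single update step that runs a cheap check every iteration and an expensive one periodically. This is only a schematic, assuming failures arrive pre-labeled with a tag, that a "cluster" means a tag recurring at least `cluster_threshold` times in one batch, and that a counter stands in for an actual LoRA fine-tuning run. The class, its fields, and the rule-set storage are hypothetical, and the LLM-driven rule induction is replaced here by simple counting.

```python
from collections import Counter

class UpdateOperator:
    """Toy dual-timescale updater: fast rule induction, slow periodic fine-tuning."""

    def __init__(self, cluster_threshold=3, slow_period=5):
        self.cluster_threshold = cluster_threshold
        self.slow_period = slow_period
        self.rules = set()          # symbolic rules added by the fast loop
        self.finetune_buffer = []   # successful trajectories awaiting the slow loop
        self.finetune_rounds = 0    # stand-in for completed LoRA updates

    def step(self, t, backtests):
        # Fast loop: a recurring failure tag becomes a rule immediately.
        failures = Counter(b["failure_tag"] for b in backtests if b["failure_tag"])
        for tag, count in failures.items():
            if count >= self.cluster_threshold:
                self.rules.add(f"forbid:{tag}")
        # Slow loop: collect successes, then "fine-tune" every slow_period steps.
        self.finetune_buffer.extend(b for b in backtests if b["failure_tag"] is None)
        if t % self.slow_period == 0 and self.finetune_buffer:
            self.finetune_rounds += 1
            self.finetune_buffer.clear()

op = UpdateOperator(cluster_threshold=2, slow_period=2)
op.step(1, [{"failure_tag": "high_turnover"},
            {"failure_tag": "high_turnover"},
            {"failure_tag": None}])
op.step(2, [{"failure_tag": None}])
```

After the first step the rule is already active, while the parametric update only fires on the second step, which mirrors the separation between immediate symbolic correction and slower weight adaptation.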
Experimental Evaluation
The framework is empirically validated on the CSI 300 equities universe with strict anti-leakage partitioning. Transaction costs and daily turnover capping are imposed to bridge the simulation-to-execution gap and suppress financial toxicity.
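A turnover cap and a proportional cost charge can be sketched as follows. The example assumes turnover is the sum of absolute weight changes, that the cap is enforced by scaling the trade toward the target, and that costs are linear in traded notional; the cap level and cost rate are invented values, since the summary does not report the ones used.

```python
def turnover(prev_w, new_w):
    """Sum of absolute weight changes between two portfolios (assumed definition)."""
    return sum(abs(n - p) for p, n in zip(prev_w, new_w))

def cap_turnover(prev_w, target_w, cap):
    """Move from prev_w toward target_w, scaling the trade so turnover <= cap."""
    t = turnover(prev_w, target_w)
    if t <= cap:
        return list(target_w)
    scale = cap / t
    return [p + scale * (q - p) for p, q in zip(prev_w, target_w)]

def net_return(weights, asset_returns, traded, cost_rate):
    """Gross portfolio return minus a linear transaction cost on traded notional."""
    gross = sum(w * r for w, r in zip(weights, asset_returns))
    return gross - cost_rate * traded

prev_w = [0.5, 0.5, 0.0]
target_w = [0.0, 0.5, 0.5]          # full rotation would cost turnover of 1.0
new_w = cap_turnover(prev_w, target_w, cap=0.2)
traded = turnover(prev_w, new_w)
day_return = net_return(new_w, [0.01, 0.00, 0.02], traded, cost_rate=0.001)
```

Because the scaled trade is a convex combination of the old and target portfolios, the weights still sum to one after capping.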
Key Out-of-Sample Results:
- PandaAI achieves a Rank IC 18.2% above state-of-the-art NN baselines and 25.7% lower maximum drawdown.
- The full system delivered an annualized return (AR) of 19.0% with MDD of -44.8%, outperforming LSTM, Transformer, and StockMixer in both predictive and risk-adjusted metrics.
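For readers unfamiliar with the reported metrics, the sketch below shows one common way to compute Rank IC and maximum drawdown. It assumes Rank IC is the Spearman correlation between predicted scores and realized returns on a single cross-section with no ties, and that MDD is reported as a negative fraction of the running peak, matching the sign convention in the results above. The exact definitions used in the paper may differ.

```python
import math

def ranks(values):
    """Ordinal ranks (0-based); assumes no ties for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    for rank, i in enumerate(order):
        out[i] = float(rank)
    return out

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def rank_ic(predicted, realized):
    """Spearman rank correlation between predictions and realized returns."""
    return pearson(ranks(predicted), ranks(realized))

def max_drawdown(returns):
    """Worst peak-to-trough loss of the compounded wealth curve, as a negative number."""
    wealth, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = min(worst, wealth / peak - 1.0)
    return worst

ic_perfect = rank_ic([0.1, 0.4, 0.2, 0.3], [1.0, 4.0, 2.0, 3.0])
ic_inverse = rank_ic([0.1, 0.4, 0.2, 0.3], [4.0, 1.0, 3.0, 2.0])
mdd = max_drawdown([0.10, -0.50, 0.20])
```

ICIR, which appears in the ablation figure, is typically the mean of per-period IC values divided by their standard deviation.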
Figure 3: Ablation study on the Contextualization Hypothesis (H1), demonstrating the performance impact across IC, Rank IC, AR, MDD, and ICIR when fine-tuning and explicit regime conditioning are omitted or ablated.
Ablation studies support all three hypotheses:
- Removing fine-tuning or $z_t$ injection degrades factor robustness and predictive performance.
- Disabling constraints for LLM-guided search produces factors with superficially high ICIR but untradeably high turnover, which reveals severe financial toxicity once costs are imposed.
- Without the closed-loop adaptation, mined factors experience rapid performance decay.
Implications and Future Directions
Theoretical Advances
By formulating financial decision-making as a closed-loop, market-aware neuro-symbolic process, PandaAI extends the paradigm of LLM deployment from passive prediction to dynamic reasoning agents capable of explicit adaptation under non-stationarity. The dual-channel latent adapter and integration of RLHF-aligned constraints constitute a systematic approach to high-stakes sequential reasoning under domain constraints, providing a generalizable template for neuro-symbolic agents in other non-stationary, safety-critical domains.
Practical Impacts
The experimental protocol demonstrates that unconstrained LLM-generated financial signals exhibit prohibitive execution risk, reinforcing the necessity of integrating hard domain logic within the generative process. The lifecycle feedback mechanism enables real-time market adaptation, a critical property for practical systematic trading and fully autonomous market agents.
Pathways for Future Research
Open extensions suggested by this framework include:
- Transfer of closed-loop, market-aware LLM agents to other non-stationary, high-risk verticals (energy, logistics).
- Investigation of continual learning schemas (beyond LoRA) to further ameliorate catastrophic forgetting.
- Enhanced hierarchical control via modular, multi-agent structures for portfolio construction, risk, and execution.
- Expansion to multimodal architectures for fusing text, temporal, and visual information.
Conclusion
PandaAI establishes a holistic market-aware neuro-symbolic framework that tightly integrates domain-specific LLM reasoning, structured constraint enforcement, and regime-driven feedback adaptation for quantitative finance. The architecture demonstrates superior empirical performance over deep learning and recent LLM-based approaches, while addressing the central challenges of non-stationarity and simulation-to-execution realism. This research provides a robust methodological and architectural foundation for future neuro-symbolic autonomous agents in finance and related complex, non-stationary systems (2606.06823).