PandaAI: A Practical Agent CQ2 for Neuro-symbolic Data Analysis And Integrated Decision-Making in Quantitative Finance

Published 5 Jun 2026 in cs.LG, cs.AI, and q-fin.ST | (2606.06823v1)

Abstract: While deep learning has excelled in various domains, its application to sequential decision-making in finance remains challenging due to the low Signal-to-Noise Ratio (SNR) and non-stationarity of financial data. Leveraging the reasoning capabilities of LLMs, we propose PandaAI, a closed-loop neuro-symbolic LLM agent with market regime modeling and constrained alpha generation, which bridges general LLM reasoning with financial rigor and suppresses the financial toxicity of LLM-generated outputs. To bridge the gap between general linguistic capability and financial rigor, we fine-tune a domain-specific LLM. Furthermore, we integrate this LLM into a modular architecture and form a closed-loop system. Unlike traditional models that optimize isolated prediction metrics, PandaAI is designed as a neuro-symbolic agent that navigates the complex, real-world financial environment with explicit risk awareness. Extensive experiments on CSI 300 stock data show that PandaAI achieves an $18.2\%$ higher Rank IC and $25.7\%$ lower maximum drawdown than state-of-the-art time-series models. Our constrained LLM generation and dual-channel adaptation method provide a general paradigm for LLM deployment in high-stakes sequential decision-making scenarios.


Authors (3)

Summary

The paper presents a closed-loop neuro-symbolic LLM agent that explicitly models market regimes to enhance risk and portfolio optimization.
It uses an LLM-guided Monte Carlo Tree Search with enforced symbolic constraints to mine formulaic alphas under strict financial conditions.
Empirical results demonstrate superior performance over traditional methods, achieving higher predictive metrics and reduced drawdowns.

PandaAI: Neuro-Symbolic LLM Agent for Market-Aware Quantitative Finance

Introduction

This work introduces PandaAI, a closed-loop neuro-symbolic LLM agent targeting the unique challenges of quantitative finance, characterized by low signal-to-noise ratio (SNR) and pronounced non-stationarity in market data. The framework connects the general reasoning capacity of domain-specialized LLMs with financial rigor, emphasizing market regime modeling and hard-constrained generation of formulaic alphas for sequential decision-making. PandaAI’s design enables explicit market awareness, robust constrained alpha discovery, and continuous lifecycle adaptation via feedback across six coordinated modules.

Figure 1: Overview of the PandaAI Market-Aware Quantitative Framework as a closed-loop dynamical system spanning market regime state estimation, LLM-guided alpha research, portfolio optimization, execution control, and feedback-driven updates.

Framework and Core Methodology

Market Dynamics Module: Continuous Regime Modeling

PandaAI eschews static stationarity assumptions by explicitly modeling the financial environment as a trajectory on a latent regime manifold $z_t$. High-dimensional style and industry risk factors (Barra) are compressed using an autoencoder, yielding $z_t$ as a latent summary of current market dynamics.
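
In generic autoencoder notation, this compression step can be written as below. The symbol $x_t$ for the vector of Barra factor exposures, the names $f_{\text{enc}}$ and $f_{\text{dec}}$, and the squared reconstruction loss are assumptions made for illustration; the summary does not state the exact architecture or loss.

$$z_t = f_{\text{enc}}(x_t), \qquad \hat{x}_t = f_{\text{dec}}(z_t), \qquad \min_{f_{\text{enc}},\, f_{\text{dec}}} \sum_t \lVert x_t - \hat{x}_t \rVert_2^2$$

Because $z_t$ has far fewer dimensions than $x_t$, the encoder is forced to keep only the directions that best explain the factor exposures, which is what makes $z_t$ usable as a compact regime summary.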

A dual-channel adaptation mechanism allows this $z_t$ representation to propagate:

A symbolic adapter projects $z_t$ to soft tokens for LLM input, achieving context-sensitive reasoning.
A numerical adapter extracts regime-dependent control parameters for downstream modules:
- Risk aversion $\lambda(z_t)$ for portfolio optimization
- Liquidity-sensitive weights for execution

This unification ensures regime awareness pervades both symbolic and numerical stages, supporting the Contextualization Hypothesis (H1).
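
A minimal sketch of the dual-channel idea follows, assuming linear projections for both channels and a softplus to keep $\lambda(z_t)$ positive. The class name `DualChannelAdapter`, the dimensions, and the random weights are hypothetical; in a real system the weights would be learned, and the paper's actual adapter design is not described in this summary.

```python
import math
import random

def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def softplus(x):
    """Numerically stable softplus, used to keep risk aversion positive."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)

class DualChannelAdapter:
    """Hypothetical adapter: one latent z_t in, two kinds of output."""

    def __init__(self, latent_dim, n_tokens, token_dim, seed=0):
        rng = random.Random(seed)
        # Symbolic channel: one projection matrix per soft token.
        self.token_proj = [
            [[rng.gauss(0, 0.1) for _ in range(latent_dim)]
             for _ in range(token_dim)]
            for _ in range(n_tokens)
        ]
        # Numerical channel: a linear head that produces risk aversion.
        self.risk_w = [rng.gauss(0, 0.1) for _ in range(latent_dim)]
        self.risk_b = 1.0

    def soft_tokens(self, z):
        """Return n_tokens embedding vectors to prepend to the LLM input."""
        return [matvec(proj, z) for proj in self.token_proj]

    def risk_aversion(self, z):
        """Return lambda(z_t) for the portfolio optimizer."""
        score = sum(w * v for w, v in zip(self.risk_w, z)) + self.risk_b
        return softplus(score)

adapter = DualChannelAdapter(latent_dim=4, n_tokens=3, token_dim=8)
z_t = [0.5, -1.2, 0.3, 0.9]          # illustrative latent regime vector
tokens = adapter.soft_tokens(z_t)    # symbolic channel
lam = adapter.risk_aversion(z_t)     # numerical channel
```

Both outputs are computed from the same $z_t$, so a regime shift changes the LLM's context and the optimizer's risk setting at the same time.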

Constrained Alpha Mining via LLM-Guided MCTS

Unlike generic code generation, formulaic alpha discovery is formulated as a constrained search over a symbolic operator DAG subject to first-class financial constraints $\mathcal{C}$.

The Alpha Research Module leverages an LLM-guided Monte Carlo Tree Search (MCTS) pipeline:

Static syntax rules $G_{\text{forbidden}}$ implement a pre-simulation filter.
Risk and domain constraints are dynamically injected into prompts and enforced via simulation penalties.
Candidate factors are subjected to explicit backtesting and reward model scoring, including RLHF-aligned metrics.
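
One common way to let a language model guide MCTS is to use its proposal probabilities as priors in a PUCT-style selection rule. The sketch below assumes that variant; the summary does not say which selection rule PandaAI uses, and the function names, the constant `c_puct`, and the example numbers are all hypothetical.

```python
import math

def puct_score(child_value, child_visits, parent_visits, prior, c_puct=1.5):
    """Mean backtest value plus an exploration bonus scaled by the prior."""
    exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return child_value + exploration

def select_child(children, parent_visits):
    """Pick the child with the highest PUCT score.

    children: list of dicts with keys 'name', 'value', 'visits', 'prior',
    where 'prior' stands in for the probability the LLM assigns to
    expanding the expression with that operator.
    """
    return max(
        children,
        key=lambda ch: puct_score(
            ch["value"], ch["visits"], parent_visits, ch["prior"]
        ),
    )

children = [
    {"name": "rank", "value": 0.02, "visits": 10, "prior": 0.2},
    {"name": "ts_mean", "value": 0.0, "visits": 0, "prior": 0.6},
]
chosen = select_child(children, parent_visits=10)
```

Here the unvisited `ts_mean` branch is chosen because its high prior and zero visit count give it a large exploration bonus.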

The constraint set $\mathcal{C}$ operates both as a prompt-level intrinsic regularizer and as a penalty schedule during simulation and evaluation. This framework directly operationalizes the Constrained-Creativity Hypothesis (H2).
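
The two enforcement points can be sketched as a static filter applied before any backtest and a penalty applied to the simulated reward. The expression encoding (nested tuples), the two example rules, and the turnover-based penalty are illustrative assumptions; the actual contents of $G_{\text{forbidden}}$ and the penalty schedule are not given in this summary.

```python
def depth(expr):
    """Depth of an expression tree written as nested tuples."""
    if not isinstance(expr, tuple):
        return 0
    return 1 + max((depth(arg) for arg in expr[1:]), default=0)

def violates_static_rules(expr, max_depth=4):
    """Pre-simulation filter: reject an expression before backtesting it."""
    if depth(expr) > max_depth:
        return True
    if not isinstance(expr, tuple):
        return False
    op, *args = expr
    # Hypothetical rule 1: a non-positive window would read future data.
    if op.startswith("ts_") and any(
        isinstance(a, int) and a <= 0 for a in args
    ):
        return True
    # Hypothetical rule 2: nesting an operator directly in itself.
    if any(isinstance(a, tuple) and a[0] == op for a in args):
        return True
    return any(violates_static_rules(a, max_depth) for a in args)

def penalized_reward(rank_ic, turnover, turnover_cap=0.3, penalty_weight=2.0):
    """Simulation-time penalty: subtract a cost for turnover above the cap."""
    excess = max(0.0, turnover - turnover_cap)
    return rank_ic - penalty_weight * excess

good = ("rank", ("ts_mean", "close", 5))
lookahead = ("ts_mean", "close", -5)
redundant = ("rank", ("rank", "close"))
```

Filtering first is cheap and avoids spending backtest time on expressions that could never be deployed; the penalty then handles violations that only show up in simulation, such as excessive turnover.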

Fine-Tuning and Execution-Grounded RLHF

The CQ2 chain-of-thought LLM is built by fine-tuning a DeepSeek-Coder-33B backbone on regime-labeled financial instruction sets, contextually conditioning reasoning on $z_t$ . SFT and RLHF are combined to align generative output with user (trader) preferences and execution success. KL-regularized PPO stabilizes RLHF policy optimization.
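
For reference, the standard KL-regularized RLHF objective has the form below. The notation is an assumption: $\pi_\theta$ is the policy being trained, $\pi_{\text{SFT}}$ the frozen supervised fine-tuned model, $r$ the reward model, $\beta$ the KL coefficient, and $\mathcal{D}$ the prompt distribution. The summary does not give the paper's exact objective or hyperparameters.

$$\max_{\theta}\; \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_\theta(\cdot \mid x, z_t)}\big[\, r(x, y) \,\big] \;-\; \beta\, \mathrm{KL}\big(\pi_\theta(\cdot \mid x, z_t)\,\big\|\,\pi_{\text{SFT}}(\cdot \mid x, z_t)\big)$$

The KL term penalizes drift away from the SFT model, which is the stabilizing effect referred to above; PPO then optimizes this quantity through its clipped surrogate loss.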

Closed-Loop Update Mechanism

The Update Operator ($\mathcal{U}$) implements a dual-timescale meta-adaptation protocol (H3):

Fast loop: LLM-driven induction of symbolic constraints upon detection of pathological failure clusters in backtest evidence; the induced logic rules are immediately fused into the constraint set.
Slow loop: Parametric fine-tuning (LoRA) on long-term successful trajectories ensures ongoing adaptation to non-stationary market evolution while mitigating catastrophic forgetting.
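
The two timescales can be sketched as a small state machine. Everything here is a hypothetical stand-in: failure clusters are detected by a simple count threshold, the induced rule is a plain string rather than an LLM-written constraint, and the slow update is a placeholder where a LoRA fine-tuning job would run.

```python
from collections import Counter

class UpdateOperator:
    """Hypothetical dual-timescale updater, not the paper's implementation."""

    def __init__(self, cluster_threshold=3, slow_period=20):
        self.cluster_threshold = cluster_threshold
        self.slow_period = slow_period
        self.constraints = set()         # symbolic rules added by the fast loop
        self.failure_counts = Counter()  # failures grouped by diagnostic tag
        self.replay_buffer = []          # successful trajectories
        self.finetune_rounds = 0
        self.steps = 0

    def observe(self, factor_id, failure_tag, succeeded):
        """Record one backtest outcome and run whichever loops are due."""
        self.steps += 1
        if succeeded:
            self.replay_buffer.append(factor_id)
        elif failure_tag is not None:
            self.failure_counts[failure_tag] += 1
            # Fast loop: a cluster of similar failures adds a rule at once.
            if self.failure_counts[failure_tag] >= self.cluster_threshold:
                self.constraints.add(f"forbid:{failure_tag}")
        # Slow loop: periodic parametric update on accumulated successes.
        if self.steps % self.slow_period == 0 and self.replay_buffer:
            self._slow_update()

    def _slow_update(self):
        # Placeholder for a LoRA fine-tuning job on the buffered trajectories.
        self.finetune_rounds += 1
        self.replay_buffer.clear()

updater = UpdateOperator(cluster_threshold=2, slow_period=4)
updater.observe("f1", "high_turnover", succeeded=False)
updater.observe("f2", "high_turnover", succeeded=False)
updater.observe("f3", None, succeeded=True)
updater.observe("f4", None, succeeded=True)
```

After these four observations the fast loop has added one rule and the slow loop has run once, which illustrates why the two are separated: rules can be changed immediately and cheaply, while weight updates are batched.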

Experimental Evaluation

The framework is empirically validated on the CSI 300 equities universe with strict anti-leakage partitioning. Transaction costs and daily turnover capping are imposed to bridge the simulation-to-execution gap and suppress financial toxicity.
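
A simple way to impose both frictions in a backtest is sketched below. The definitions are assumptions made for illustration: turnover is taken as half the sum of absolute weight changes, the cap is enforced by scaling all trades proportionally, and costs are linear in traded notional. The paper's actual cost model and cap level are not stated in this summary.

```python
def cap_turnover(current, target, max_turnover):
    """Move from current to target weights without exceeding max_turnover.

    Returns the new weights and the turnover actually used.
    """
    trades = [t - c for c, t in zip(current, target)]
    turnover = 0.5 * sum(abs(x) for x in trades)
    if turnover <= max_turnover:
        return list(target), turnover
    # Scale every trade by the same factor so the cap is met exactly.
    scale = max_turnover / turnover
    capped = [c + scale * x for c, x in zip(current, trades)]
    return capped, max_turnover

def net_return(weights, asset_returns, turnover, cost_rate):
    """Gross portfolio return minus a linear cost on traded notional."""
    gross = sum(w * r for w, r in zip(weights, asset_returns))
    traded_notional = 2.0 * turnover  # buys plus sells
    return gross - cost_rate * traded_notional

current = [0.5, 0.5, 0.0]
target = [0.0, 0.5, 0.5]
capped, used = cap_turnover(current, target, max_turnover=0.2)
daily = net_return(capped, [0.01, 0.00, -0.01], used, cost_rate=0.001)
```

In this example the target would require 50% turnover, so only two fifths of each trade is executed and the portfolio stays fully invested.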

Key Out-of-Sample Results:

PandaAI achieves a Rank IC 18.2% above state-of-the-art neural-network baselines and a 25.7% lower maximum drawdown (MDD).
The full system delivered an annualized return (AR) of 19.0% with an MDD of -44.8%, outperforming LSTM, Transformer, and StockMixer in both predictive and risk-adjusted metrics.

Figure 3: Ablation study on the Contextualization Hypothesis (H1), demonstrating the performance impact across IC, Rank IC, AR, MDD, and ICIR when fine-tuning and explicit regime conditioning are omitted or ablated.
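
For readers unfamiliar with the reported metrics, the sketch below computes two of them using their usual definitions: Rank IC as the Spearman rank correlation between factor scores and subsequent returns, and maximum drawdown as the largest peak-to-trough loss of the compounded equity curve, reported as a negative number. The function names are illustrative, and the paper's exact evaluation code is not available here.

```python
def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def rank_ic(scores, forward_returns):
    """Spearman correlation between factor scores and next-period returns."""
    return pearson(ranks(scores), ranks(forward_returns))

def max_drawdown(period_returns):
    """Worst peak-to-trough loss of the compounded equity curve (<= 0)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in period_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst

ic = rank_ic([0.3, 0.1, 0.4, 0.2], [0.02, -0.01, 0.03, 0.00])
mdd = max_drawdown([0.10, -0.50, 0.20])
```

ICIR is then typically the mean of the per-period IC divided by its standard deviation, so a factor with a modest but stable IC can score higher than one with a large but erratic IC.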

Ablation studies support all three hypotheses:

Removing fine-tuning or $z_t$ injection degrades factor robustness and predictive performance.
Disabling constraints for LLM-guided search produces factors with superficially high ICIR but untradeably high turnover, which reveals severe financial toxicity once costs are imposed.
Without the closed-loop adaptation, mined factors experience rapid performance decay.

Implications and Future Directions

Theoretical Advances

By formulating financial decision-making as a closed-loop, market-aware neuro-symbolic process, PandaAI extends the paradigm of LLM deployment from passive prediction to dynamic reasoning agents capable of explicit adaptation under non-stationarity. The dual-channel latent adapter and integration of RLHF-aligned constraints constitute a systematic approach to high-stakes sequential reasoning under domain constraints, providing a generalizable template for neuro-symbolic agents in other non-stationary, safety-critical domains.

Practical Impacts

The experimental protocol demonstrates that unconstrained LLM-generated financial signals exhibit prohibitive execution risk, reinforcing the necessity of integrating hard domain logic within the generative process. The lifecycle feedback mechanism enables real-time market adaptation, a critical property for practical systematic trading and fully autonomous market agents.

Pathways for Future Research

Open extensions suggested by this framework include:

Transfer of closed-loop, market-aware LLM agents to other non-stationary, high-risk verticals (energy, logistics).
Investigation of continual learning schemas (beyond LoRA) to further ameliorate catastrophic forgetting.
Enhanced hierarchical control via modular, multi-agent structures for portfolio construction, risk, and execution.
Expansion to multimodal architectures for fusing text, temporal, and visual information.

Conclusion

PandaAI establishes a holistic market-aware neuro-symbolic framework that tightly integrates domain-specific LLM reasoning, structured constraint enforcement, and regime-driven feedback adaptation for quantitative finance. The architecture demonstrates superior empirical performance over deep learning and recent LLM-based approaches, while addressing the central challenges of non-stationarity and simulation-to-execution realism. This research provides a robust methodological and architectural foundation for future neuro-symbolic autonomous agents in finance and related complex, non-stationary systems (2606.06823).
