PandaAI: A Practical Agent CQ2 for Neuro-symbolic Data Analysis And Integrated Decision-Making in Quantitative Finance

Published 5 Jun 2026 in cs.LG, cs.AI, and q-fin.ST | (2606.06823v1)

Abstract: While deep learning has excelled in various domains, its application to sequential decision-making in finance remains challenging due to the low signal-to-noise ratio (SNR) and non-stationarity of financial data. Leveraging the reasoning capabilities of LLMs, we propose PandaAI, a closed-loop neuro-symbolic LLM agent with market regime modeling and constrained alpha generation, which bridges general LLM reasoning with financial rigor and suppresses the financial toxicity of LLM-generated outputs. To close the gap between general linguistic capability and financial rigor, we fine-tune a domain-specific LLM. We then integrate this LLM into a modular architecture to form a closed-loop system. Unlike traditional models that optimize isolated prediction metrics, PandaAI is designed as a neuro-symbolic agent that navigates the complex, real-world financial environment with explicit risk awareness. Extensive experiments on CSI 300 stock data show that PandaAI achieves an 18.2% higher Rank IC and a 25.7% lower maximum drawdown than state-of-the-art time-series models. Our constrained LLM generation and dual-channel adaptation method provide a general paradigm for LLM deployment in high-stakes sequential decision-making scenarios.

Authors (3)

Summary

  • The paper presents a closed-loop neuro-symbolic LLM agent that explicitly models market regimes to inform risk management and portfolio optimization.
  • It uses an LLM-guided Monte Carlo Tree Search with enforced symbolic constraints to mine formulaic alphas under strict financial conditions.
  • Out-of-sample experiments on CSI 300 stocks show higher Rank IC and lower maximum drawdown than time-series baselines such as LSTM, Transformer, and StockMixer.

PandaAI: Neuro-Symbolic LLM Agent for Market-Aware Quantitative Finance

Introduction

This work introduces PandaAI, a closed-loop neuro-symbolic LLM agent targeting the unique challenges of quantitative finance, characterized by low signal-to-noise ratio (SNR) and pronounced non-stationarity in market data. The framework connects the general reasoning capacity of domain-specialized LLMs with financial rigor, emphasizing market regime modeling and hard-constrained generation of formulaic alphas for sequential decision-making. PandaAI’s design enables explicit market awareness, robust constrained alpha discovery, and continuous lifecycle adaptation via feedback across six coordinated modules (Figure 1).

Figure 1: Overview of the PandaAI Market-Aware Quantitative Framework as a closed-loop dynamical system spanning market regime state estimation, LLM-guided alpha research, portfolio optimization, execution control, and feedback-driven updates.

Framework and Core Methodology

Market Dynamics Module: Continuous Regime Modeling

PandaAI avoids static stationarity assumptions by explicitly modeling the financial environment as a trajectory $z_t$ on a latent regime manifold. High-dimensional style and industry risk factors (Barra) are compressed with an autoencoder, yielding $z_t$ as a latent summary of current market dynamics.
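The compression step can be illustrated with a minimal, untrained autoencoder. This is a sketch under stated assumptions, not the paper's network: the class `TinyAutoencoder`, the latent size `k = 2`, the tanh encoder with a linear decoder, the random initialization, and the six sample factor values are all hypothetical, and the training loop that would minimize reconstruction error over historical data is omitted.

```python
import math
import random


def matvec(W, x):
    # Multiply a weight matrix (list of rows) by a vector
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class TinyAutoencoder:
    """Hypothetical autoencoder mapping d risk-factor values to a k-dim regime code z_t."""

    def __init__(self, d, k, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(d)
        self.W_enc = [[rng.gauss(0.0, scale) for _ in range(d)] for _ in range(k)]
        self.W_dec = [[rng.gauss(0.0, scale) for _ in range(k)] for _ in range(d)]

    def encode(self, x):
        # z_t = tanh(W_enc x): a bounded latent summary of the market state
        return [math.tanh(v) for v in matvec(self.W_enc, x)]

    def decode(self, z):
        # x_hat = W_dec z: linear reconstruction of the factor vector
        return matvec(self.W_dec, z)

    def reconstruction_loss(self, x):
        # Mean squared error; training would minimize this over history
        x_hat = self.decode(self.encode(x))
        return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


# Hypothetical daily style and industry factor values (d = 6)
factor_returns = [0.012, -0.004, 0.007, 0.001, -0.010, 0.003]
ae = TinyAutoencoder(d=len(factor_returns), k=2)
z_t = ae.encode(factor_returns)
print(z_t, ae.reconstruction_loss(factor_returns))
```

The bottleneck (here 6 inputs to 2 latent dimensions) is what forces $z_t$ to keep only the dominant co-movements of the risk factors.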

A dual-channel adaptation mechanism propagates the $z_t$ representation along two paths:

  • A symbolic adapter projects $z_t$ to soft tokens for LLM input, enabling context-sensitive reasoning.
  • A numerical adapter extracts regime-dependent control parameters for downstream modules:
    • Risk aversion $\lambda(z_t)$ for portfolio optimization
    • Liquidity-sensitive weights for execution
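Both channels can be sketched as small learned maps from $z_t$. The names `symbolic_adapter` and `numerical_adapter`, the linear projections, the single scalar liquidity weight, and the softplus and sigmoid link functions are assumptions chosen for illustration; the source does not specify the adapter architecture. Softplus keeps $\lambda(z_t)$ strictly positive, and the sigmoid bounds the hypothetical liquidity weight to (0, 1).

```python
import math


def softplus(x):
    # Numerically stable log(1 + e^x); always > 0
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(w, z):
    return sum(wi * zi for wi, zi in zip(w, z))


def symbolic_adapter(z, P, n_tokens, d_model):
    """Hypothetical: project z_t into n_tokens soft-token embeddings of size d_model.

    P holds n_tokens * d_model rows, each a weight vector of length len(z).
    """
    flat = [dot(row, z) for row in P]
    return [flat[i * d_model:(i + 1) * d_model] for i in range(n_tokens)]


def numerical_adapter(z, w_lambda, b_lambda, w_liq, b_liq):
    """Hypothetical: map z_t to regime-dependent control parameters."""
    return {
        "risk_aversion": softplus(dot(w_lambda, z) + b_lambda),  # lambda(z_t) > 0
        "liquidity_weight": sigmoid(dot(w_liq, z) + b_liq),      # in (0, 1), scales execution
    }


z_t = [0.3, -0.8]
P = [[0.1 * (i + 1), -0.05 * i] for i in range(2 * 4)]  # 2 soft tokens x 4 dims
tokens = symbolic_adapter(z_t, P, n_tokens=2, d_model=4)
params = numerical_adapter(z_t, [-1.0, -2.0], 0.5, [0.5, 1.5], 0.0)
print(len(tokens), len(tokens[0]), params)
```

In a real system `P` and the adapter weights would be learned jointly with the downstream objectives; here they are fixed numbers so the shapes and value ranges are easy to inspect.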

This unification ensures regime awareness pervades both symbolic and numerical stages, supporting the Contextualization Hypothesis (H1).

Constrained Alpha Mining via LLM-Guided MCTS

Unlike generic code generation, formulaic alpha discovery is formulated as a constrained search over a symbolic operator DAG subject to first-class financial constraints $\mathcal{C}$.
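To make the DAG framing concrete, here is a minimal sketch of an operator graph with hash-consing, so a repeated subexpression such as `close` is stored once and evaluated once. The class `AlphaDAG`, the operator set (`field`, `ts_mean`, `div`), the single-stock time series, and the partial averaging windows at the start of the series are simplifying assumptions; practical formulaic-alpha operator sets typically also include cross-sectional operators such as ranking.

```python
import statistics


class AlphaDAG:
    """Hypothetical operator DAG; identical (op, args) pairs map to one node."""

    def __init__(self):
        self.nodes = []   # node id -> (op, args)
        self.index = {}   # (op, args) -> node id, enables subexpression sharing

    def add(self, op, *args):
        key = (op, args)
        if key not in self.index:
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]

    def evaluate(self, root, data):
        memo = {}  # each node is computed at most once per evaluation

        def ev(nid):
            if nid in memo:
                return memo[nid]
            op, args = self.nodes[nid]
            if op == "field":
                out = list(data[args[0]])
            elif op == "ts_mean":
                series, window = ev(args[0]), args[1]
                # Trailing mean; early points average whatever history exists
                out = [statistics.fmean(series[max(0, i - window + 1):i + 1])
                       for i in range(len(series))]
            elif op == "div":
                num, den = ev(args[0]), ev(args[1])
                out = [a / b for a, b in zip(num, den)]
            else:
                raise ValueError(f"unknown operator: {op}")
            memo[nid] = out
            return out

        return ev(root)


dag = AlphaDAG()
close = dag.add("field", "close")
alpha = dag.add("div", dag.add("ts_mean", close, 3), close)  # ts_mean(close, 3) / close
values = dag.evaluate(alpha, {"close": [10.0, 11.0, 12.0, 13.0]})
print(len(dag.nodes), [round(v, 4) for v in values])
```

Because `close` feeds both the moving average and the denominator, the graph has three nodes rather than the four a plain expression tree would need.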

The Alpha Research Module leverages an LLM-guided Monte Carlo Tree Search (MCTS) pipeline:

  • Static syntax rules $G_{\text{forbidden}}$ implement a pre-simulation filter.
  • Risk and domain constraints are dynamically injected into prompts and enforced via simulation penalties.
  • Candidate factors are subjected to explicit backtesting and reward model scoring, including RLHF-aligned metrics.
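One common way to let an LLM guide tree search is a PUCT-style selection rule (as popularized by AlphaZero), where the LLM supplies prior probabilities over candidate expansions. The sketch below combines that rule with a pre-simulation filter. Everything here is an assumption for illustration: the regex patterns standing in for $G_{\text{forbidden}}$, the `fake_llm` proposer, the exploration constant `c_puct`, the `+ 1` inside the square root, and the choice not to renormalize priors after filtering. The source does not give the exact selection formula.

```python
import math
import re

# Hypothetical stand-ins for G_forbidden: patterns rejected before any backtest
G_FORBIDDEN = [
    re.compile(r"log\(\s*-"),                  # log of an explicitly negated input
    re.compile(r"ts_delay\([^,]+,\s*-\d+\)"),  # negative lag, i.e. look-ahead bias
]


def passes_static_filter(expr):
    return not any(p.search(expr) for p in G_FORBIDDEN)


class Node:
    def __init__(self, expr, prior=1.0, parent=None):
        self.expr, self.prior, self.parent = expr, prior, parent
        self.children, self.visits, self.value_sum = [], 0, 0.0

    def q(self):
        # Mean backed-up reward; 0 for unvisited nodes
        return self.value_sum / self.visits if self.visits else 0.0


def expand(node, llm_propose):
    # llm_propose(expr) -> [(candidate_expr, prior), ...]; a stand-in for the LLM
    for expr, prior in llm_propose(node.expr):
        if passes_static_filter(expr):
            node.children.append(Node(expr, prior, node))


def select_child(node, c_puct=1.5):
    # PUCT-style score: value estimate plus a prior-weighted exploration bonus
    total = sum(ch.visits for ch in node.children)

    def score(ch):
        return ch.q() + c_puct * ch.prior * math.sqrt(total + 1) / (1 + ch.visits)

    return max(node.children, key=score)


def backpropagate(node, reward):
    # Propagate a simulation reward (e.g. a penalized backtest score) to the root
    while node is not None:
        node.visits += 1
        node.value_sum += reward
        node = node.parent


def fake_llm(expr):
    return [(f"rank({expr})", 0.5), (f"log(-{expr})", 0.3), (f"ts_delay({expr}, -1)", 0.2)]


root = Node("close")
expand(root, fake_llm)
child = select_child(root)
backpropagate(child, 0.04)  # hypothetical backtest reward
print([c.expr for c in root.children], root.visits, child.q())
```

Only `rank(close)` survives expansion: the other two proposals are discarded by the static filter without spending a backtest on them.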

The constraint set $\mathcal{C}$ operates both as a prompt-level intrinsic regularizer and as a penalty schedule during simulation and evaluation. This framework directly operationalizes the Constrained-Creativity Hypothesis (H2).
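The dual role of $\mathcal{C}$ can be sketched as one constraint list rendered two ways: as text injected into the prompt, and as a penalty whose weight ramps up over the search. The constraint names, limits, penalty weights, the linear ramp in `penalty_scale`, and the use of Rank IC as the base reward are all hypothetical choices, not values from the paper.

```python
from dataclasses import dataclass


@dataclass
class Constraint:
    name: str       # metric measured in backtest, e.g. "turnover"
    limit: float    # maximum allowed value
    weight: float   # base penalty per unit of excess


def constraints_to_prompt(constraints):
    # Prompt-level channel: state the same limits as explicit rules for the LLM
    rules = "\n".join(f"- keep {c.name} <= {c.limit}" for c in constraints)
    return "Hard constraints for any proposed alpha:\n" + rules


def penalty_scale(step, total_steps, start=0.1, end=1.0):
    # Linear ramp: lenient early in the search, strict by the final step
    frac = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return start + (end - start) * frac


def constrained_reward(metrics, constraints, step, total_steps):
    # Reward = Rank IC minus scheduled penalties on every violated limit
    scale = penalty_scale(step, total_steps)
    penalty = sum(scale * c.weight * max(0.0, metrics[c.name] - c.limit)
                  for c in constraints)
    return metrics["rank_ic"] - penalty


constraints = [Constraint("turnover", 0.3, 0.2), Constraint("max_style_exposure", 0.5, 0.1)]
metrics = {"rank_ic": 0.06, "turnover": 0.8, "max_style_exposure": 0.4}
print(constraints_to_prompt(constraints))
print(constrained_reward(metrics, constraints, 0, 100))    # early: small penalty
print(constrained_reward(metrics, constraints, 100, 100))  # late: full penalty
```

The same high-turnover candidate scores positively early in the search but negatively at the end, which lets exploration start broad while still converging on tradeable factors.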

Fine-Tuning and Execution-Grounded RLHF

The CQ2 chain-of-thought LLM is built by fine-tuning a DeepSeek-Coder-33B backbone on regime-labeled financial instruction sets, conditioning its reasoning on $z_t$. SFT and RLHF are combined to align generated outputs with user (trader) preferences and execution success, and KL-regularized PPO stabilizes the RLHF policy optimization.
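The KL-regularized PPO step can be sketched with its two standard ingredients: a reward shaped by a per-sample KL penalty toward the frozen SFT reference model, and the clipped surrogate loss. This is a generic sketch, not the paper's training code; the coefficient `beta`, the clip range `eps`, and the toy log-probabilities are assumptions, and the reward model and advantage estimation (e.g. GAE) are omitted.

```python
import math


def kl_shaped_rewards(rewards, logp_policy, logp_ref, beta=0.05):
    # r - beta * (log pi(y|x) - log pi_ref(y|x)): a single-sample estimate of the
    # KL penalty that keeps the policy close to the SFT reference model
    return [r - beta * (lp - lr) for r, lp, lr in zip(rewards, logp_policy, logp_ref)]


def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    # Standard PPO clipped surrogate, negated so it can be minimized
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


shaped = kl_shaped_rewards([1.0, 0.0], [-1.0, -2.0], [-1.2, -2.0])
loss = ppo_clip_loss([-0.5, -1.0], [-1.0, -1.0], [1.0, -1.0])
print([round(r, 4) for r in shaped], round(loss, 4))
```

The clip caps the first sample's probability ratio (about 1.65) at 1.2, so a single large update cannot dominate the step; the KL term plays the complementary role of limiting drift away from the fine-tuned reference.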

Closed-Loop Update Mechanism

The Update Operator ($\mathcal{U}$) implements a dual-timescale meta-adaptation protocol (H3):

  • Fast loop: LLM-driven induction of symbolic constraints when backtest evidence reveals clusters of pathological failures; the induced logic rules are immediately fused into the constraint set $\mathcal{C}$.
  • Slow loop: Parametric fine-tuning (LoRA) on long-term successful trajectories supports ongoing adaptation to non-stationary market evolution while mitigating catastrophic forgetting.
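The two timescales can be sketched as a small scheduler. In this hypothetical version, a simple failure-count threshold stands in for the LLM-driven rule induction of the fast loop, and `fine_tune_lora` is a placeholder that only counts invocations rather than training adapters; the class name, thresholds, buffer size, and failure signatures are illustrative assumptions.

```python
from collections import Counter, deque


class UpdateOperator:
    """Hypothetical dual-timescale scheduler for the Update Operator U."""

    def __init__(self, cluster_threshold=3, slow_period=5, buffer_size=1000):
        self.rules = set()                               # fast-loop symbolic constraints
        self.failures = Counter()                        # failure signature -> count
        self.success_buffer = deque(maxlen=buffer_size)  # slow-loop training data
        self.cluster_threshold = cluster_threshold
        self.slow_period = slow_period
        self.step = 0
        self.lora_updates = 0

    def observe(self, trajectory, success, failure_signature=None):
        self.step += 1
        if success:
            self.success_buffer.append(trajectory)
        elif failure_signature is not None:
            self.failures[failure_signature] += 1
            # Fast loop: a recurring failure cluster becomes a rule immediately
            if self.failures[failure_signature] >= self.cluster_threshold:
                self.rules.add(f"forbid:{failure_signature}")
        # Slow loop: periodic parameter update on accumulated successes
        if self.step % self.slow_period == 0 and self.success_buffer:
            self.fine_tune_lora(list(self.success_buffer))

    def fine_tune_lora(self, trajectories):
        # Placeholder for a LoRA fine-tuning job; only counts invocations here
        self.lora_updates += 1


U = UpdateOperator(cluster_threshold=2, slow_period=3)
U.observe("traj-1", success=True)
U.observe("traj-2", success=False, failure_signature="high_turnover")
U.observe("traj-3", success=False, failure_signature="high_turnover")
print(sorted(U.rules), U.lora_updates)
```

Keeping older successes in the bounded buffer means each slow-loop update revisits past regimes as well as recent ones, a simple replay-style hedge against forgetting.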

Experimental Evaluation

The framework is empirically validated on the CSI 300 equity universe with strict anti-leakage data partitioning. Transaction costs and a daily turnover cap are imposed to narrow the simulation-to-execution gap and suppress financial toxicity.
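A minimal sketch of the turnover cap and cost model, assuming turnover is measured as the sum of absolute weight changes and costs are linear in traded notional; the function names, cap, cost rate, tickers, and returns are hypothetical, and the paper's exact cost specification is not given here. When the desired trade exceeds the cap, the whole trade is scaled down proportionally.

```python
def cap_turnover(w_prev, w_target, cap):
    # Turnover = sum of |weight change|; scale the whole trade down if it exceeds cap
    names = set(w_prev) | set(w_target)
    trade = {k: w_target.get(k, 0.0) - w_prev.get(k, 0.0) for k in names}
    turnover = sum(abs(v) for v in trade.values())
    scale = 1.0 if turnover <= cap else cap / turnover
    w_new = {k: w_prev.get(k, 0.0) + scale * v for k, v in trade.items()}
    return w_new, turnover * scale


def net_return(weights, asset_returns, traded, cost_rate):
    # Gross portfolio return minus a linear cost on traded notional
    gross = sum(w * asset_returns.get(k, 0.0) for k, w in weights.items())
    return gross - cost_rate * traded


w_new, traded = cap_turnover({"A": 0.5, "B": 0.5}, {"A": 0.0, "B": 0.5, "C": 0.5}, cap=0.4)
r = net_return(w_new, {"A": 0.01, "B": 0.02, "C": -0.01}, traded, cost_rate=0.001)
print({k: round(v, 4) for k, v in sorted(w_new.items())}, round(traded, 4), round(r, 6))
```

Here the target rebalance would trade 100% of the book; the cap limits it to 40%, so the portfolio moves only part of the way from A toward C on this day.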

Key Out-of-Sample Results:

  • PandaAI achieves an 18.2% higher Rank IC and a 25.7% lower maximum drawdown than state-of-the-art neural time-series baselines.
  • The full system delivered an annualized return (AR) of 19.0% with an MDD of -44.8%, outperforming LSTM, Transformer, and StockMixer on both predictive and risk-adjusted metrics (Figure 2).

Figure 3: Ablation study on the Contextualization Hypothesis (H1), demonstrating the performance impact across IC, Rank IC, AR, MDD, and ICIR when fine-tuning and explicit regime conditioning are omitted or ablated.

Ablation studies support all three hypotheses:

  • Removing fine-tuning or explicit regime ($z_t$) injection degrades factor robustness and predictive performance.
  • Disabling constraints in the LLM-guided search produces factors with superficially high ICIR but turnover too high to trade, revealing severe financial toxicity once costs are imposed.
  • Without the closed-loop adaptation, mined factors experience rapid performance decay.
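The metrics referenced in these results (Rank IC, ICIR, MDD) have widely used standard definitions; the sketch below implements those common definitions with the standard library. The paper's exact conventions, such as forward-return horizon or whether ICIR is computed on IC or Rank IC, are not stated here, and the sample inputs are hypothetical.

```python
import math
import statistics


def average_ranks(xs):
    # 1-based ranks; tied values share the mean of their positions
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def rank_ic(factor, fwd_returns):
    # Spearman correlation between one day's factor values and forward returns
    return pearson(average_ranks(factor), average_ranks(fwd_returns))


def icir(daily_ics):
    # Mean of the daily IC series divided by its sample standard deviation
    return statistics.fmean(daily_ics) / statistics.stdev(daily_ics)


def max_drawdown(equity):
    # Worst peak-to-trough decline of an equity curve, as a negative fraction
    peak, mdd = equity[0], 0.0
    for v in equity:
        peak = max(peak, v)
        mdd = min(mdd, v / peak - 1.0)
    return mdd


print(rank_ic([0.1, 0.4, 0.2, 0.3], [0.01, 0.03, 0.02, 0.05]))
print(icir([0.05, 0.03, 0.04]))
print(max_drawdown([1.0, 1.2, 0.9, 1.1, 0.6, 0.8]))
```

Plain IC is `pearson` applied to the raw factor values and returns; Rank IC replaces both with ranks, which makes it less sensitive to outliers. In practice either is computed per day across the stock universe and then averaged, and ICIR measures how stable that daily series is.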

Implications and Future Directions

Theoretical Advances

By formulating financial decision-making as a closed-loop, market-aware neuro-symbolic process, PandaAI extends the paradigm of LLM deployment from passive prediction to dynamic reasoning agents capable of explicit adaptation under non-stationarity. The dual-channel latent adapter and integration of RLHF-aligned constraints constitute a systematic approach to high-stakes sequential reasoning under domain constraints, providing a generalizable template for neuro-symbolic agents in other non-stationary, safety-critical domains.

Practical Impacts

The experimental protocol demonstrates that unconstrained LLM-generated financial signals exhibit prohibitive execution risk, reinforcing the necessity of integrating hard domain logic within the generative process. The lifecycle feedback mechanism enables real-time market adaptation, a critical property for practical systematic trading and fully autonomous market agents.

Pathways for Future Research

Open extensions suggested by this framework include:

  • Transfer of closed-loop, market-aware LLM agents to other non-stationary, high-risk verticals (energy, logistics).
  • Investigation of continual learning schemas (beyond LoRA) to further ameliorate catastrophic forgetting.
  • Enhanced hierarchical control via modular, multi-agent structures for portfolio construction, risk, and execution.
  • Expansion to multimodal architectures for fusing text, temporal, and visual information.

Conclusion

PandaAI establishes a holistic market-aware neuro-symbolic framework that tightly integrates domain-specific LLM reasoning, structured constraint enforcement, and regime-driven feedback adaptation for quantitative finance. The architecture demonstrates superior empirical performance over deep learning and recent LLM-based approaches, while addressing the central challenges of non-stationarity and simulation-to-execution realism. This research provides a robust methodological and architectural foundation for future neuro-symbolic autonomous agents in finance and related complex, non-stationary systems (2606.06823).
