- The paper's main contribution is introducing relational probing, a method that transforms LM hidden states into adjacency matrices for graph-based financial prediction.
- It employs a relation head jointly trained with a graph attention network to capture dynamic, news-based stock relations, outperforming traditional co-occurrence methods.
- The study demonstrates scalability across small language model (SLM) sizes and highlights efficiency gains from avoiding autoregressive decoding, while acknowledging that class imbalance in financial trend prediction remains an open challenge.
Relational Probing for Financial Market Prediction: Direct LM-to-Graph Adaptation
Motivation and Context
Recent financial trend prediction (FTP) research emphasizes the importance of modeling inter-stock relationships. Conventional approaches employ predefined, often static, relational graphs constructed from sector data, ownership structures, or price correlations. These static structures quickly become obsolete in volatile markets and introduce considerable noise, undermining predictive performance. With the proliferation of financial news, dynamic graph construction using real-time news sources has become desirable, yet scaling reliable relationship induction remains a challenge due to noisy and unstructured text.
LLMs have transformed relation extraction (RE), offering flexible, task-agnostic methods for identifying entity interactions from raw text. However, mainstream LLM-based pipelines rely on autoregressive decoding for structured output, such as generating JSON-encoded edges, which incurs high latency, repeated parsing, and detachment from downstream optimization. Furthermore, prompt-based approaches are brittle under complex, multi-level constraints, failing to enforce strict relational structures needed by graph-based models.
Proposed Method: Relational Probing
The paper introduces Relational Probing, an LM-to-graph adaptation paradigm that directly couples a language model (LM) with a graph attention network (GAT) for end-to-end FTP. The key technical innovation is a relation head: a lightweight, differentiable module that transforms LM hidden states into adjacency matrices, inducing relational graphs without constrained decoding or post hoc parsing. The relation head is trained jointly with the GAT on the downstream prediction loss, so the induced relations are optimized for their predictive utility for stock movement.
Architecture Overview
- News Encoding: Each news article is tokenized and encoded using Qwen3 SLM backbones, producing contextual token representations.
- Relation Head (RH): Ticker embeddings attend to hidden states via scaled dot-product attention, producing news-conditioned ticker representations. Pairwise ticker interactions are scored using a bilinear map, resulting in a real-valued interaction matrix.
- Daily Graph Aggregation: News-level interaction matrices are aggregated and thresholded to form daily, weighted, sparse adjacency matrices over the ticker set.
- Downstream Prediction: Node features (time-series price/volume signals encoded via LSTM) are processed by a GAT using the induced graphs to predict next-step stock trends.
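The Relation Head step above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the toy dimensions, the values, the use of ticker embeddings directly as attention queries (with hidden states as both keys and values), and the names `attend`, `bilinear`, and `relation_head` are all assumptions made for illustration. In the paper the corresponding parameters are learned jointly with the GAT.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend(query, hidden_states):
    # Scaled dot-product attention: one ticker query over all token states.
    # Assumption: hidden states serve as both keys and values.
    d = len(query)
    weights = softmax([dot(query, h) / math.sqrt(d) for h in hidden_states])
    return [sum(w * h[j] for w, h in zip(weights, hidden_states)) for j in range(d)]

def bilinear(u, W, v):
    # Bilinear score u^T W v for one ordered ticker pair.
    return sum(u[i] * W[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))

def relation_head(ticker_embeddings, hidden_states, W):
    # News-conditioned ticker representations, then pairwise bilinear scores.
    reps = [attend(q, hidden_states) for q in ticker_embeddings]
    n = len(reps)
    return [[bilinear(reps[i], W, reps[j]) for j in range(n)] for i in range(n)]

# Toy inputs (hypothetical): 3 token hidden states, 2 tickers, hidden size 2.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ticker_embeddings = [[2.0, 0.0], [0.0, 2.0]]
W = [[1.0, 0.0], [0.0, 1.0]]  # identity here; a learned matrix in practice

scores = relation_head(ticker_embeddings, hidden_states, W)
for row in scores:
    print([round(x, 4) for x in row])
```

Every operation in the sketch is differentiable, which is what allows a prediction loss to update the bilinear map and the ticker embeddings when the same computation is expressed in an autodiff framework.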
The joint training setup promotes alignment between relation induction and predictive performance, propagating supervisory gradients from the GAT through the RH and indirectly into the SLM.
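The Daily Graph Aggregation step in the overview can be sketched in the same spirit. The source states only that news-level matrices are aggregated and thresholded; the mean as the aggregation function, the threshold value, the removal of self-loops, and the name `daily_adjacency` are assumptions for illustration.

```python
def daily_adjacency(news_matrices, threshold):
    # Assumption: aggregate by element-wise mean over the day's articles.
    k = len(news_matrices)
    n = len(news_matrices[0])
    mean = [[sum(m[i][j] for m in news_matrices) / k for j in range(n)]
            for i in range(n)]
    # Keep an edge weight only if it reaches the threshold; drop self-loops
    # (also an assumption). The result is weighted and sparse.
    return [[mean[i][j] if i != j and mean[i][j] >= threshold else 0.0
             for j in range(n)] for i in range(n)]

# Two hypothetical news-level interaction matrices over three tickers.
m1 = [[0.0, 0.75, 0.125], [0.75, 0.0, 0.25], [0.125, 0.25, 0.0]]
m2 = [[0.0, 0.5, 0.125], [0.5, 0.0, 0.75], [0.125, 0.75, 0.0]]

adj = daily_adjacency([m1, m2], threshold=0.5)
for row in adj:
    print(row)
```

With these toy values the weak edge between the first and third ticker (mean 0.125) is pruned, while the other two edges keep their averaged weights.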
Experimental Methodology
The authors define small language models (SLMs) operationally as models that can be fine-tuned end to end (backbone, RH, and GAT) on a single 24 GB GPU, which ties the term to a concrete compute budget and aids reproducibility. The Qwen3 family (0.6B, 1.7B, 4B) is used so that the architecture stays the same across model scales.
Experiments are conducted on a financial news dataset with labeled tickers, adopting a three-class prediction regime (down, unchanged, up), in alignment with standard investment risk management. To ensure fair comparison, all runs fix the downstream GAT, optimization settings, and data splits.
Numerical Results and Analysis
Relational probing consistently outperforms a news co-occurrence baseline used for graph induction. Key results for the co-occurrence baseline and the three Qwen3 backbones:
| Method | Macro F1 | MCC | AUC |
| --- | --- | --- | --- |
| Co-occurrence | 0.2831 | 0.0143 | 0.5232 |
| Qwen3-0.6B | 0.3171 | 0.0312 | 0.5488 |
| Qwen3-1.7B | 0.3221 | 0.0435 | 0.5513 |
| Qwen3-4B | 0.3272 | 0.0562 | 0.5571 |
Noteworthy findings:
- Consistent, task-relevant improvements: Relational probing increases macro F1 and MCC relative to co-occurrence, indicating better minority class discrimination and overall relational signal quality, despite small absolute values due to class imbalance and noise prevalent in FTP.
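- Scaling with model capacity with diminishing returns: Macro F1 and MCC improve as SLM size grows from 0.6B to 4B, but each step adds only about 0.005 macro F1 and 0.012 to 0.013 MCC even though the second step adds roughly twice as many parameters as the first. Returns per added parameter therefore diminish, suggesting that most predictive semantics are already captured at modest SLM scale.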
- Accuracy as a misleading metric: Due to strong class imbalance, even trivial (majority-only) models superficially show high accuracy, but fail on key class-aware metrics.
- Practicality and efficiency: Relational probing avoids runtime penalties of autoregressive decoding, substantially decreasing inference time compared to full input-plus-generated token regimes (which confer minimal improvement).
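The point about accuracy being misleading can be made concrete with a small worked example. The label counts below are hypothetical and not taken from the paper's dataset; the multiclass MCC follows the standard confusion-matrix generalization, with the usual convention of returning 0 when the denominator vanishes.

```python
import math
from collections import Counter

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred, labels):
    # Unweighted mean of per-class F1; a class with no TP, FP, or FN scores 0.
    f1s = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(labels)

def multiclass_mcc(y_true, y_pred, labels):
    # Multiclass Matthews correlation coefficient.
    s = len(y_true)
    c = sum(t == p for t, p in zip(y_true, y_pred))
    t_k, p_k = Counter(y_true), Counter(y_pred)
    num = c * s - sum(t_k[k] * p_k[k] for k in labels)
    den = math.sqrt((s * s - sum(p_k[k] ** 2 for k in labels))
                    * (s * s - sum(t_k[k] ** 2 for k in labels)))
    return num / den if den else 0.0

labels = ["down", "unchanged", "up"]
# Hypothetical imbalanced day: one class dominates.
y_true = ["unchanged"] * 80 + ["up"] * 10 + ["down"] * 10
y_pred = ["unchanged"] * 100  # trivial majority-only predictor

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred, labels)
mcc = multiclass_mcc(y_true, y_pred, labels)
print(f"accuracy={acc:.4f} macro_f1={mf1:.4f} mcc={mcc:.4f}")
# → accuracy=0.8000 macro_f1=0.2963 mcc=0.0000
```

The trivial predictor reaches 80% accuracy on this toy split while its MCC is exactly zero and its macro F1 stays below 0.30, which is why the class-aware metrics carry the comparison.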
Ablation: Relation Head Architectures
Three RH variants are evaluated:
- Full (full-range attention): Default, allowing all ticker embeddings to attend to all tokens in news text.
- Limited: Restricts attention only to mentioned tickers within each article.
- Pooling: Uses average-pooled LM representations.
Results unambiguously favor full-range attention, with both full and limited outperforming pooling, and full attention achieving the best macro F1 and MCC. Theoretical analysis supports the empirical findings: restricting ticker scope under-exploits available semantic information, while pooling discards relational specificity.
Evaluation of input-only versus input-plus-generated hidden states demonstrates negligible value from including model generations, with significant computational penalty, justifying the default input-only configuration.
Implications, Limitations, and Future Directions
Relational probing introduces a principled method for deriving semantically grounded, dynamic financial graphs directly from LM hidden states, obviating brittle prompt engineering and manual structure enforcement. The authors report that the method consistently outperforms traditional co-occurrence-derived graphs on downstream FTP metrics. The study also illuminates the diminishing returns of scale for SLMs in this context: smaller models, if equipped with task-aligned adapters, can yield competitive downstream results, opening the way for efficient, practical deployment in latency- or resource-constrained scenarios.
The method, however, does not in itself resolve the stubborn issue of class imbalance in real datasets, a challenge the authors acknowledge and propose to address with calibrated loss formulations. While relational probing improves the quality of the relational structure available to the GAT, as reflected in the gains over co-occurrence graphs, ultimate gains in predictive performance may require concurrent advances in class-aware training dynamics.
The formalization of SLMs for financial NLP tasks introduces a useful reproducibility benchmark. Prospective research could extend this modularity to alternative task-aligned heads for generalized structured prediction and to further integration of multimodal and event-based data streams for more holistic market modeling.
Conclusion
This work presents a methodologically rigorous, practically effective approach to unifying LM-driven relation induction and graph-based financial prediction. By mapping LM hidden states directly to structured, GNN-ready graphs optimized for downstream predictive performance, relational probing sets a direction for scalable, efficient, and context-adaptive financial modeling. Continued exploration is warranted into more flexible adapter heads, improved handling of data skew, and broader applications beyond finance, wherever text-driven relational induction and graph reasoning intersect.