Relational Probing: LM-to-Graph Adaptation for Financial Prediction

Published 11 Apr 2026 in cs.CL | (2604.10212v1)

Abstract: LLMs can be used to identify relationships between financial entities in text. However, while structured output mechanisms exist, prompting-based pipelines still incur autoregressive decoding costs and decouple graph construction from downstream optimization. We propose Relational Probing, which replaces the standard language-model head with a relation head that induces a relational graph directly from language-model hidden states and is trained jointly with the downstream task model for stock-trend prediction. This approach both learns semantic representations and preserves the strict structure of the induced relational graph. It enables language-model outputs to go beyond text, allowing them to be reshaped into task-specific formats for downstream models. To enhance reproducibility, we provide an operational definition of small LLMs (SLMs): models that can be fine-tuned end-to-end on a single 24GB GPU under specified batch-size and sequence-length settings. Experiments use Qwen3 backbones (0.6B/1.7B/4B) as upstream SLMs and compare against a co-occurrence baseline. Relational Probing yields consistent performance improvements at competitive inference cost.

Authors (4)

Summary

The paper's main contribution is introducing relational probing, a method that transforms LM hidden states into adjacency matrices for graph-based financial prediction.
It employs a relation head jointly trained with a graph attention network to capture dynamic, news-based stock relations, outperforming traditional co-occurrence methods.
The study demonstrates scalability across SLM sizes and highlights efficiency gains by avoiding autoregressive decoding while addressing class imbalance in financial trend prediction.

Relational Probing for Financial Market Prediction: Direct LM-to-Graph Adaptation

Motivation and Context

Recent financial trend prediction (FTP) research emphasizes the importance of modeling inter-stock relationships. Conventional approaches employ predefined, often static, relational graphs constructed from sector data, ownership structures, or price correlations. These static structures quickly become obsolete in volatile markets and introduce considerable noise, undermining predictive performance. With the proliferation of financial news, dynamic graph construction using real-time news sources has become desirable, yet scaling reliable relationship induction remains a challenge due to noisy and unstructured text.

LLMs have transformed relation extraction (RE), offering flexible, task-agnostic methods for identifying entity interactions from raw text. However, mainstream LLM-based pipelines rely on autoregressive decoding for structured output, such as generating JSON-encoded edges, which incurs high latency, repeated parsing, and detachment from downstream optimization. Furthermore, prompt-based approaches are brittle under complex, multi-level constraints, failing to enforce strict relational structures needed by graph-based models.

Proposed Method: Relational Probing

The paper introduces Relational Probing, an LM-to-graph adaptation paradigm that directly couples an LLM with a graph attention network (GAT) for end-to-end FTP. The key technical innovation is a relation head: a lightweight, differentiable module that transforms LM hidden states into adjacency matrices, inducing relational graphs without constrained decoding or post hoc parsing. The relation head is trained jointly with the GAT on the downstream prediction loss, so the induced relations are optimized for their predictive utility for stock movement.

Architecture Overview

News Encoding: Each news article is tokenized and encoded using Qwen3 SLM backbones, producing contextual token representations.
Relation Head (RH): Ticker embeddings attend to hidden states via scaled dot-product attention, producing news-conditioned ticker representations. Pairwise ticker interactions are scored using a bilinear map, resulting in a real-valued interaction matrix.
Daily Graph Aggregation: News-level interaction matrices are aggregated and thresholded to form daily, weighted, sparse adjacency matrices over the ticker set.
Downstream Prediction: Node features (time-series price/volume signals encoded via LSTM) are processed by a GAT using the induced graphs to predict next-step stock trends.
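
The relation head and daily aggregation steps listed above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`attend`, `bilinear`, `relation_head`, `daily_graph`) are hypothetical, the bilinear matrix `W` is set to the identity for the toy example, and the aggregation rule (mean over articles, absolute-value threshold, no self-loops) is assumed because the summary only says that matrices are "aggregated and thresholded".

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, hidden):
    """Scaled dot-product attention of one ticker query over token hidden states."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, h) / scale for h in hidden])
    dim = len(hidden[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]

def bilinear(u, W, v):
    """Score u^T W v for one ordered ticker pair."""
    return sum(u[i] * W[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v)))

def relation_head(ticker_emb, hidden, W):
    """Real-valued ticker-by-ticker interaction matrix for one article."""
    reps = [attend(q, hidden) for q in ticker_emb]
    return [[bilinear(u, W, v) for v in reps] for u in reps]

def daily_graph(matrices, threshold):
    """Assumed rule: average article matrices, drop self-loops and weak edges."""
    n = len(matrices[0])
    count = len(matrices)
    avg = [[sum(m[i][j] for m in matrices) / count for j in range(n)]
           for i in range(n)]
    return [[avg[i][j] if i != j and abs(avg[i][j]) >= threshold else 0.0
             for j in range(n)] for i in range(n)]

# Toy example: 3 tickers, 2-dimensional hidden states, two articles on one day.
ticker_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]  # identity stands in for a learned matrix
article_a = [[0.5, 0.1], [0.2, 0.9], [0.4, 0.4]]
article_b = [[0.9, 0.3], [0.1, 0.2]]
per_article = [relation_head(ticker_emb, h, W) for h in (article_a, article_b)]
adjacency = daily_graph(per_article, threshold=0.2)
```

In the paper's setup the ticker embeddings and `W` are learned, and every step stays differentiable so that the GAT's loss can reach them; the pure-Python version here only shows the data flow and shapes.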

The joint training setup promotes alignment between relation induction and predictive performance, propagating supervisory gradients from the GAT through the RH and indirectly into the SLM.

Experimental Methodology

The authors operationalize small LLMs (SLMs) as models that can be fine-tuned end-to-end (backbone, RH, and GAT together) on a single 24GB GPU under specified batch-size and sequence-length settings, which aids reproducibility and ties results to a defined compute budget. The Qwen3 family (0.6B, 1.7B, 4B) is used so that the architecture stays the same across model scales.

Experiments are conducted on a financial news dataset with labeled tickers, adopting a three-class prediction regime (down, unchanged, up), in alignment with standard investment risk management. To ensure fair comparison, all runs fix the downstream GAT, optimization settings, and data splits.
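
To make the three-class regime concrete, the sketch below maps a price move to a down/unchanged/up label. The summary does not state how the classes are defined, so the return-based rule, the function name `trend_label`, and the 0.5% band are all assumptions chosen only for illustration.

```python
DOWN, UNCHANGED, UP = 0, 1, 2

def trend_label(prev_close, next_close, band=0.005):
    """Assumed labeling rule: classify the simple return against a symmetric band."""
    ret = (next_close - prev_close) / prev_close
    if ret > band:
        return UP
    if ret < -band:
        return DOWN
    return UNCHANGED

# Hypothetical closes for one ticker over consecutive days.
closes = [100.0, 100.2, 101.5, 100.1, 100.0]
labels = [trend_label(a, b) for a, b in zip(closes, closes[1:])]
```

The width of the band controls the class balance: a wide band pushes most days into one class, which is one way the imbalance discussed in the results can arise.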

Numerical Results and Analysis

Relational probing consistently outperforms a news co-occurrence baseline used for graph induction. Results for the baseline and the three Qwen3 backbones are:

Method	Macro F1	MCC	AUC
Co-occurrence	0.2831	0.0143	0.5232
Qwen3-0.6B	0.3171	0.0312	0.5488
Qwen3-1.7B	0.3221	0.0435	0.5513
Qwen3-4B	0.3272	0.0562	0.5571

Noteworthy findings:

Consistent, task-relevant improvements: Relational probing increases macro F1 and MCC relative to co-occurrence, indicating better minority class discrimination and overall relational signal quality, despite small absolute values due to class imbalance and noise prevalent in FTP.
Scaling with model capacity with diminishing returns: Macro F1 and MCC improve as SLM size grows (from 0.6B to 4B), though gains decelerate at the higher end, suggesting that most predictive semantics are already captured with modest SLM scale.
Accuracy as a misleading metric: Due to strong class imbalance, even trivial (majority-only) models superficially show high accuracy, but fail on key class-aware metrics.
Practicality and efficiency: Relational probing avoids runtime penalties of autoregressive decoding, substantially decreasing inference time compared to full input-plus-generated token regimes (which confer minimal improvement).
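
The point about accuracy being misleading can be checked with a small, self-contained example. The sketch below computes accuracy, macro F1, and multiclass MCC (in its standard confusion-matrix form) for a majority-only predictor on made-up labels; the data and function names are illustrative and are not taken from the paper.

```python
import math

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1; a class with no hits scores 0."""
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def mcc(y_true, y_pred, classes):
    """Multiclass Matthews correlation; returns 0 when the denominator vanishes."""
    s = len(y_true)
    c = sum(t == p for t, p in zip(y_true, y_pred))
    t_k = [sum(t == k for t in y_true) for k in classes]
    p_k = [sum(p == k for p in y_pred) for k in classes]
    num = c * s - sum(t * p for t, p in zip(t_k, p_k))
    den = math.sqrt((s * s - sum(p * p for p in p_k)) *
                    (s * s - sum(t * t for t in t_k)))
    return num / den if den else 0.0

classes = [0, 1, 2]                # down, unchanged, up
y_true = [1] * 8 + [0, 2]          # illustrative, heavily imbalanced labels
y_majority = [1] * 10              # always predict the majority class

acc = accuracy(y_true, y_majority)
f1 = macro_f1(y_true, y_majority, classes)
corr = mcc(y_true, y_majority, classes)
```

On this toy data the majority-only predictor reaches 0.8 accuracy, yet its macro F1 is about 0.30 and its MCC is 0, which is why the class-aware metrics are the ones worth comparing.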

Ablation: Relation Head Architectures

Three RH variants are evaluated:

Full (full-range attention): Default, allowing all ticker embeddings to attend to all tokens in news text.
Limited: Restricts attention only to mentioned tickers within each article.
Pooling: Uses average-pooled LM representations.

Results unambiguously favor full-range attention, with both full and limited outperforming pooling, and full attention achieving the best macro F1 and MCC. Theoretical analysis supports the empirical findings: restricting ticker scope under-exploits available semantic information, while pooling discards relational specificity.

Evaluation of input-only versus input-plus-generated hidden states demonstrates negligible value from including model generations, with significant computational penalty, justifying the default input-only configuration.

Implications, Limitations, and Future Directions

Relational probing introduces a principled method for deriving semantically grounded, dynamic financial graphs directly from LM hidden states, obviating brittle prompt engineering and manual structure enforcement. The authors report that the method consistently outperforms traditional co-occurrence-derived graphs on downstream FTP metrics. The study also illuminates the diminishing returns of scale for SLMs in this context: smaller models, if equipped with task-aligned adapters, can yield competitive downstream results, opening the way for efficient, practical deployment in latency- or resource-constrained scenarios.

The method, however, does not in itself resolve the persistent issue of class imbalance in real datasets, a challenge the authors acknowledge and propose to address with calibrated loss formulations. While relational probing improves the quality of the relational structure available to the GAT, further gains in predictive performance may require concurrent advances in class-aware training dynamics.

The formalization of SLMs for financial NLP tasks introduces a useful reproducibility benchmark. Prospective research could extend this modularity to alternative task-aligned heads for generalized structured prediction and to further integration of multimodal and event-based data streams for more holistic market modeling.

Conclusion

This work presents a methodologically rigorous, practically effective approach to unifying LM-driven relation induction and graph-based financial prediction. By constraining LM outputs to deliver structured, GNN-ready graphs optimized for downstream predictive performance, relational probing sets a direction for scalable, efficient, and context-adaptive financial modeling. Continued exploration is warranted into more flexible adapter heads, improved handling of data skew, and broader applications beyond finance, wherever text-driven relational induction and graph reasoning intersect.