Unmasking Hallucinations: A Causal Graph-Attention Perspective on Factual Reliability in Large Language Models

Published 5 Apr 2026 in cs.CL and cs.LG | (2604.04020v1)

Abstract: This paper focuses on hallucinations in large language models (LLMs). LLMs have shown extraordinary language understanding and generation capabilities. Still, they have a major disadvantage: hallucinations, that is, outputs that are factually incorrect, misleading, or unsupported by the input data. These hallucinations cause serious problems in scenarios such as medical diagnosis or legal reasoning. In this work, we propose a causal graph attention network (GCAN) framework that reduces hallucinations by interpreting the internal attention flow within a transformer architecture, constructing token-level graphs that combine self-attention weights and gradient-based influence scores. Our method quantifies each token's factual dependency using a new metric called the Causal Contribution Score (CCS). We further introduce a fact-anchored graph reweighting layer that dynamically reduces the influence of hallucination-prone nodes during generation. Experiments on standard benchmarks such as TruthfulQA and HotpotQA show a 27.8 percent reduction in hallucination rate and a 16.4 percent improvement in factual accuracy over baseline retrieval-augmented generation (RAG) models. This work contributes to the interpretability, robustness, and factual reliability of future LLM architectures.


Authors (3)

Summary

The paper introduces a Causal Graph-Attention Network (C-GAN) to model token-level causal dependencies and mitigate hallucinations in LLMs.
It employs causal contribution scores and graph re-weighting that achieve a 27.8% reduction in hallucination rate and a 16.4% increase in factual accuracy over baseline methods.
The methodology offers fine-grained interpretability and robust error analysis, enhancing LLM reliability for safety-critical applications.

A Causal Graph-Attention Approach for Factual Reliability in LLMs

Introduction

"Unmasking Hallucinations: A Causal Graph-Attention Perspective on Factual Reliability in LLMs" (2604.04020) addresses the persistent problem of hallucinations in LLMs, reframing hallucination not as a mere behavioral anomaly but as an artifact of unstable or spurious causal dependencies within transformer architectures. The work situates itself at the intersection of causal inference, graph neural networks (GNNs), and attention analysis, introducing a unified Causal Graph-Attention Network (C-GAN) for both interpretability and mitigation of hallucinations.

Hallucinations: Structural Causes and Existing Mitigations

The paper distinguishes between intrinsic hallucinations, arising from misalignments in learned representations, and extrinsic hallucinations, which result from insufficient factual grounding. Prior approaches, specifically RAG and reinforcement learning from human feedback, have been effective at tempering hallucinations but do not expose interpretable, token-level causal structure. The authors argue for a causal perspective that fuses attention flow and functional attribution, positing that hallucinations originate in the internal structure of the model and are not merely surface behaviors to be filtered after the fact.

Methodology: Causal Graph-Attention Network (C-GAN)

Model Architecture

C-GAN models the dependencies between input and output tokens as a directed graph where each node corresponds to a token and edges encode both self-attention weights and gradient-based attribution scores. By treating both attention and attribution as causal signals, the model constructs token-level causal graphs to localize factual support.
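The graph construction described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `build_token_graph`, the `min_attention` pruning threshold, and the toy matrices are all assumptions introduced here, and the attribution values stand in for whatever gradient-based scores the paper computes.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (source token index, target token index)


def build_token_graph(
    attention: List[List[float]],
    attribution: List[List[float]],
    min_attention: float = 0.05,  # hypothetical pruning threshold
) -> Dict[Edge, Tuple[float, float]]:
    """Return directed edges src -> dst with (attention, |attribution|) features.

    attention[dst][src] is how much target token dst attends to source token src;
    attribution[dst][src] is a gradient-based influence score for the same pair.
    """
    edges: Dict[Edge, Tuple[float, float]] = {}
    for dst, row in enumerate(attention):
        for src, weight in enumerate(row):
            # Autoregressive masking: a token depends only on itself or earlier tokens.
            if src > dst:
                continue
            # Drop near-zero attention links to keep the graph sparse.
            if weight >= min_attention:
                edges[(src, dst)] = (weight, abs(attribution[dst][src]))
    return edges


# Toy 3-token example (values are made up for illustration).
attention = [
    [1.00, 0.00, 0.00],
    [0.30, 0.70, 0.00],
    [0.02, 0.58, 0.40],
]
attribution = [
    [0.9, 0.0, 0.0],
    [-0.4, 0.2, 0.0],
    [0.1, 0.6, -0.3],
]
graph = build_token_graph(attention, attribution)
for (src, dst), (attn, grad) in sorted(graph.items()):
    print(f"{src} -> {dst}: attention={attn:.2f}, |attribution|={grad:.2f}")
```

Each edge carries both signals side by side, so a downstream scoring step can decide how to weigh them.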

Causal Contribution Score (CCS)

The Causal Contribution Score (CCS) is introduced to quantify the strength of factual dependency between input and output tokens. The CCS combines both attention and gradient signals, creating a more direct measurement of which inputs genuinely induce particular outputs. This enables empirical identification of unsupported generation chains within the output sequence.
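The summary does not give the CCS formula, so the following is only a sketch under a stated assumption: that the score is a convex combination of normalized attention and normalized gradient magnitude. The mixing weight `alpha` and the function name `causal_contribution_scores` are hypothetical.

```python
from typing import List


def causal_contribution_scores(
    attention_row: List[float],
    gradient_row: List[float],
    alpha: float = 0.5,  # hypothetical mixing weight between the two signals
) -> List[float]:
    """Score each input token's contribution to one output token.

    Assumed form: alpha * normalized attention + (1 - alpha) * normalized |gradient|.
    """
    grad_mag = [abs(g) for g in gradient_row]
    attn_total = sum(attention_row) or 1.0  # guard against an all-zero row
    grad_total = sum(grad_mag) or 1.0
    return [
        alpha * (a / attn_total) + (1.0 - alpha) * (g / grad_total)
        for a, g in zip(attention_row, grad_mag)
    ]


# Token 0 receives the most attention, but token 1 has the largest gradient influence.
scores = causal_contribution_scores([0.6, 0.3, 0.1], [0.1, -0.8, 0.1])
print([round(s, 2) for s in scores])
```

In this toy case the highest-attention token is not the highest-scoring one, which is the kind of disagreement between attention and attribution that a combined score is meant to surface.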

Graph Re-weighting and Regularization

A fact-anchored graph re-weighting scheme is layered atop the transformer's attention. It employs a graph attention network (GAT) that suppresses the influence of nodes (tokens) with low CCS, thereby steering generation toward factually grounded paths. The re-weighting is applied in real time during generation, using information from retrieved external evidence to modulate the attention distributions.
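To make the suppression step concrete, here is a deliberately simplified sketch. It replaces the paper's learned GAT layer with a hard gate: tokens whose CCS falls below a threshold have their attention scaled down, and the row is renormalized. The `threshold` and `damping` values, and the function name, are assumptions for illustration; retrieved evidence is not modeled here.

```python
from typing import List


def reweight_attention(
    attention_row: List[float],
    ccs: List[float],
    threshold: float = 0.2,  # hypothetical CCS cutoff
    damping: float = 0.1,    # hypothetical scale applied to low-CCS tokens
) -> List[float]:
    """Down-weight attention on low-CCS tokens, then renormalize to sum to 1."""
    scaled = [
        a * (1.0 if s >= threshold else damping)
        for a, s in zip(attention_row, ccs)
    ]
    total = sum(scaled)
    if total == 0.0:
        # Nothing left to normalize; fall back to the original distribution.
        return list(attention_row)
    return [w / total for w in scaled]


# Token 0 has the most attention but a very low CCS, so its share shrinks.
new_row = reweight_attention([0.5, 0.3, 0.2], [0.05, 0.60, 0.35])
print([round(w, 3) for w in new_row])
```

A learned layer would produce soft, context-dependent gates instead of a fixed cutoff, but the renormalization step is the same idea: attention mass removed from weakly supported tokens is redistributed to better-supported ones.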

Implementation and Experimental Setup

The architectural augmentations are instantiated in PyTorch using the Hugging Face Transformers library, with the core method built atop GPT-2-medium. Empirical studies are conducted on established factuality benchmarks, TruthfulQA and HotpotQA, using causal regularization and cross-entropy loss. The GNN modules utilize dropout and multiple attention heads to promote robust information propagation in the graph.
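The summary mentions causal regularization alongside cross-entropy loss but does not specify the regularizer. One plausible shape, offered purely as an assumption, is a penalty on attention mass placed on weakly supported tokens, added to the cross-entropy with a weight `lam`. The names `causal_penalty`, `total_loss`, and `lam` are hypothetical, and plain Python is used here instead of PyTorch to keep the sketch dependency-free.

```python
import math
from typing import List


def cross_entropy(probs: List[float], target: int) -> float:
    """Negative log-likelihood of the target token under the predicted distribution."""
    return -math.log(probs[target])


def causal_penalty(attention_row: List[float], ccs: List[float]) -> float:
    """Assumed regularizer: attention mass weighted by (1 - CCS), with CCS in [0, 1]."""
    return sum(a * (1.0 - s) for a, s in zip(attention_row, ccs))


def total_loss(
    probs: List[float],
    target: int,
    attention_row: List[float],
    ccs: List[float],
    lam: float = 0.1,  # hypothetical regularization strength
) -> float:
    """Cross-entropy plus the weighted causal penalty."""
    return cross_entropy(probs, target) + lam * causal_penalty(attention_row, ccs)


probs = [0.7, 0.2, 0.1]
attention_row = [0.5, 0.3, 0.2]
ccs = [0.05, 0.60, 0.35]
loss = total_loss(probs, 0, attention_row, ccs)
print(round(loss, 4))
```

Under this assumed form, the penalty shrinks as attention concentrates on high-CCS tokens, so minimizing it pushes in the same direction as the re-weighting layer.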

Results

Quantitative evaluation reports that C-GAN achieves a 27.8% reduction in hallucination rate and a 16.4% improvement in factual accuracy relative to RAG baselines. Specifically, the hallucination rate dropped from 27.5% (RAG) to 19.7% (C-GAN), a decrease of 7.8 percentage points, and factual accuracy improved from 68.4% to 79.8%, a gain of 11.4 percentage points. The headline figures are therefore best read as relative changes, not absolute ones. These results support explicit token-level causal modeling over conventional retrieval-based or post-hoc filtering approaches.
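A quick arithmetic check helps separate absolute from relative changes in the reported numbers. The snippet below uses only the four rates quoted above; the helper name `relative_change` is introduced here for illustration.

```python
def relative_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after`, relative to `before`."""
    return (after - before) / before * 100.0


# Rates quoted in the Results paragraph (percent).
halluc_rag, halluc_cgan = 27.5, 19.7
acc_rag, acc_cgan = 68.4, 79.8

halluc_abs = halluc_cgan - halluc_rag                   # percentage points
halluc_rel = relative_change(halluc_rag, halluc_cgan)   # percent of baseline
acc_abs = acc_cgan - acc_rag
acc_rel = relative_change(acc_rag, acc_cgan)

print(f"hallucination: {halluc_abs:+.1f} points, {halluc_rel:+.1f}% relative")
print(f"accuracy:      {acc_abs:+.1f} points, {acc_rel:+.1f}% relative")
```

Recomputed this way, the relative changes come out near 28.4% and 16.7%, close to but not identical to the headline 27.8% and 16.4%. The source does not explain the small gap.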

Importantly, the analysis reveals that hallucinations are not random events; instead, they often originate from upper transformer layers that overgeneralize without sufficient evidence, creating unstable or spurious dependencies.

Implications and Limitations

The causal graph approach provides a principled mechanism for both diagnostics and mitigation. For deployment in safety-critical contexts such as clinical NLP, legal reasoning, and education, the combination of automated hallucination suppression and post-hoc interpretability offers a concrete risk reduction strategy. Furthermore, the token-level visualization capabilities facilitate fine-grained error analysis, enabling iterative model improvement.

Principal limitations include increased computational overhead due to graph extraction and attribution computations, sensitivity to retrieval quality, and challenges in generalizing CCS thresholds across diverse LLM architectures. These constraints presently hinder wide-scale, real-time deployment.

Future Directions

The paper outlines several extensions:

Application of the causal graph framework to multimodal models, addressing hallucinations in vision-language systems.
Optimization of causal layers for deployment in latency-sensitive or resource-constrained environments.
Construction of a standardized benchmark for hallucination detection with causal annotations to advance reproducible interpretability studies.

Conclusion

This work presents an interpretable, empirically evaluated approach to hallucination detection and mitigation in LLMs, tracing unsupported outputs to unstable structural causal dependencies inside the model. The C-GAN methodology advances both theoretical understanding and practical robustness of autoregressive language modeling and sets the stage for further developments at the interface of GNNs, causal inference, and trustworthy AI.
