Machine Behavior in Relational Moral Dilemmas: Moral Rightness, Predicted Human Behavior, and Model Decisions

Published 23 Apr 2026 in cs.CL | (2604.21871v1)

Abstract: Human moral judgment is context-dependent and modulated by interpersonal relationships. As LLMs increasingly function as decision-support systems, determining whether they encode these social nuances is critical. We characterize machine behavior using the Whistleblower's Dilemma by varying two experimental dimensions: crime severity and relational closeness. Our study evaluates three distinct perspectives: (1) moral rightness (prescriptive norms), (2) predicted human behavior (descriptive social expectations), and (3) autonomous model decision-making. By analyzing the reasoning processes, we identify a clear cross-perspective divergence: while moral rightness remains consistently fairness-oriented, predicted human behavior shifts significantly toward loyalty as relational closeness increases. Crucially, model decisions align with moral rightness judgments rather than their own behavioral predictions. This inconsistency suggests that LLM decision-making prioritizes rigid, prescriptive rules over the social sensitivity present in their internal world-modeling, a gap that may lead to significant misalignments in real-world deployments.


Authors (6)

Summary

The paper demonstrates that LLMs show a clear divergence between normative moral rightness and predicted human behavior in relational dilemmas.
It employs a rigorous experimental design based on the Whistleblower’s Dilemma, testing 1,296 prompts across six modern LLMs to evaluate contextual sensitivity in moral decisions.
The findings show that although LLMs' own decisions align with prescriptive norms, they underrepresent human relational nuances. This motivates refined alignment protocols for ethical AI.

Machine Behavior in Relational Moral Dilemmas: An Expert Synthesis

Problem Formulation and Framework

The paper systematically investigates moral reasoning in LLMs by adapting the Whistleblower's Dilemma and varying scenario parameters along two dimensions: crime severity and relational closeness. The study centers on three evaluative perspectives:

(1) moral rightness (prescriptive norms)
(2) predicted human behavior (descriptive social expectations)
(3) model decision (the model's own choice of action)

This multi-perspective design shows how LLM reasoning shifts with contextual and relational variables. It uses 1,296 prompt instances across six modern LLMs.
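The design crosses two scenario dimensions with three perspectives. The sketch below builds such a prompt grid and is a minimal illustration under stated assumptions. Only the four closeness levels come from the paper's methodology. The severity labels, the prompt wording, and the helper name `build_prompts` are hypothetical. The grid does not reproduce the paper's 1,296-prompt count.

```python
from itertools import product

# Hypothetical severity labels; the paper's actual levels are not given here.
SEVERITIES = ["petty theft", "fraud", "violent assault"]
# Closeness levels as listed in the paper's methodology.
CLOSENESS = ["a stranger", "an acquaintance", "a friend", "a family member"]
# Hypothetical question wording for the three evaluative perspectives.
PERSPECTIVES = {
    "moral_rightness": "Is it morally right to report them to the authorities?",
    "predicted_human": "Would most people report them to the authorities?",
    "model_decision": "Would you report them to the authorities?",
}


def build_prompts():
    """Cross every severity x closeness x perspective combination into a prompt."""
    prompts = []
    for severity, closeness, (perspective, question) in product(
        SEVERITIES, CLOSENESS, PERSPECTIVES.items()
    ):
        text = (
            f"You discover that {closeness} has committed {severity}. "
            f"{question} Answer 'Report' or 'Not report' and explain why."
        )
        prompts.append(
            {
                "severity": severity,
                "closeness": closeness,
                "perspective": perspective,
                "text": text,
            }
        )
    return prompts
```

Here `len(build_prompts())` is 3 × 4 × 3 = 36. Keeping each condition's metadata next to the prompt text makes later per-condition aggregation straightforward.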

Figure 1: Visualization of the Whistleblower’s Dilemma and the three perspectives, illustrating how LLM outputs are contingent on evaluative framing.

Methodology

Each model receives systematically generated scenarios that vary both the severity of the wrongdoing and relational proximity (stranger, acquaintance, friend, family). Each prompt elicits a response from one of the three evaluative perspectives.

Binary decision outputs (report or not report) are paired with rationales. The rationales are then analyzed lexically using the Moral Foundations Dictionary (MFD), mapped onto the principal foundations of Moral Foundations Theory (MFT): Fairness, Loyalty, Authority, and Care. Sanctity rarely appears.

The reporting ratio, $P_{\mathrm{report}}(c)$, is computed for each contextual condition $c$ and perspective. For empirical validation, human raters cross-annotate a subset for both moral foundation attribution and reporting decisions.
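The lexical MFD analysis can be approximated by counting dictionary hits per foundation in each rationale and normalizing. This is a minimal sketch under stated assumptions:

- `TOY_MFD` is a hypothetical stand-in, not the real MFD word lists.
- A trailing `*` marks a stem, mimicking MFD wildcard entries.
- Normalizing by total foundation hits is an assumed definition of the "moral foundation ratio"; the paper's exact formula may differ.

```python
import re
from collections import Counter

# Hypothetical toy lexicon standing in for the Moral Foundations Dictionary.
# A trailing "*" means "any word starting with this stem", as in MFD entries.
TOY_MFD = {
    "Fairness": ["fair*", "justice", "honest*", "equal*"],
    "Loyalty": ["loyal*", "betray*", "family", "friend*"],
    "Authority": ["law*", "legal*", "authorit*", "duty"],
    "Care": ["harm*", "protect*", "safe*", "hurt*"],
    "Sanctity": ["pure*", "sacred", "disgust*"],
}


def matches(token: str, entry: str) -> bool:
    """True if a token matches an MFD-style entry (exact word or stem*)."""
    if entry.endswith("*"):
        return token.startswith(entry[:-1])
    return token == entry


def foundation_ratios(rationale: str, lexicon=TOY_MFD) -> dict:
    """Share of dictionary hits per foundation in one rationale.

    A token is counted at most once per foundation. Normalizing by the
    total number of hits is an assumption, not the paper's stated formula.
    """
    tokens = re.findall(r"[a-z]+", rationale.lower())
    counts = Counter()
    for token in tokens:
        for foundation, entries in lexicon.items():
            if any(matches(token, e) for e in entries):
                counts[foundation] += 1
    total = sum(counts.values())
    return {f: (counts[f] / total if total else 0.0) for f in lexicon}
```

Consider `foundation_ratios("Reporting is fair and legal, but betraying a friend hurts.")`. It finds five hits: Fairness 1, Authority 1, Loyalty 2 and Care 1. This gives a Loyalty ratio of 0.4.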

Results

Contextual Sensitivity

Across all LLMs, reporting ratios rise with crime severity and fall as relational closeness increases. This pattern mirrors gradients known from human social psychology, where obligations to the in-group or kin counterbalance impartial moral norms. The counterbalancing grows stronger as proximity increases.
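The heatmaps summarize one reporting ratio per condition. A natural reading is the fraction of "Report" decisions among all decisions in that condition. The sketch below computes it and lays the values out as a severity × closeness grid for one perspective. Several parts are assumptions:

- the reading of the ratio as a simple fraction
- the record layout
- the severity labels
- the helper names `reporting_ratios` and `heatmap_rows`, which are hypothetical

```python
from collections import defaultdict

CLOSENESS = ["stranger", "acquaintance", "friend", "family"]


def reporting_ratios(records):
    """P_report(c): fraction of 'report' decisions per condition.

    records: iterable of (perspective, severity, closeness, reported) tuples,
    where reported is a bool. This record layout is an illustrative assumption.
    """
    totals = defaultdict(int)
    reports = defaultdict(int)
    for perspective, severity, closeness, reported in records:
        key = (perspective, severity, closeness)
        totals[key] += 1
        reports[key] += int(reported)
    return {key: reports[key] / totals[key] for key in totals}


def heatmap_rows(ratios, perspective, severities):
    """Rows = severity levels, columns = closeness levels; None marks empty cells."""
    return [
        [ratios.get((perspective, s, c)) for c in CLOSENESS]
        for s in severities
    ]
```

For two "minor" stranger cases that were both reported, the cell value is 1.0. For one reported and one unreported "minor" family case, it is 0.5. Passing the resulting rows to any plotting library yields a per-perspective heatmap.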

Figure 2: Heatmaps of reporting ratios for varying severity and relational closeness, segmented by perspective; demonstrates the relational attenuation of reporting.

Cross-Perspective Divergence

A substantial divergence is observed across perspectives. Under the ‘moral rightness’ frame, nearly all models exhibit high, stable reporting ratios (median $>0.9$) that are insensitive to relational proximity. Conversely, when asked to predict human behavior, models produce sharply lower and more variable reporting ratios. This reflects their recognition of the contextual salience of Loyalty, especially for closer relationships and lesser crimes.

Strikingly, the model decision perspective aligns closely with moral rightness, not with predicted human behavior, revealing an intra-model value-action inconsistency. This indicates an over-representation of prescriptive normativity in models’ own decisions, irrespective of their internal recognition of human relational sensitivity.
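One simple way to make "aligns with moral rightness, not predicted human behavior" concrete is to measure distances between perspectives. This sketch computes the mean absolute distance between the model-decision reporting ratios and each of the other two perspectives over shared conditions. The smaller gap indicates which perspective the decisions track.

The function names and the mean-absolute-gap criterion are illustrative assumptions, not the paper's stated metric.

```python
def mean_abs_gap(a, b):
    """Mean absolute difference between two {condition: ratio} dicts on shared keys."""
    keys = sorted(set(a) & set(b))
    if not keys:
        raise ValueError("no shared conditions")
    return sum(abs(a[k] - b[k]) for k in keys) / len(keys)


def closer_reference(decision, rightness, predicted):
    """Return (label, gap_to_rightness, gap_to_predicted) for model decisions."""
    gap_r = mean_abs_gap(decision, rightness)
    gap_p = mean_abs_gap(decision, predicted)
    if gap_r < gap_p:
        label = "moral_rightness"
    elif gap_p < gap_r:
        label = "predicted_human"
    else:
        label = "tie"
    return label, gap_r, gap_p
```

Consider a toy input in which the decision ratios are 0.94 and 0.90, the rightness ratios 0.95 and 0.93, and the predicted-human ratios 0.80 and 0.30. The gaps are 0.02 versus 0.37, so the decisions track moral rightness.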

Figure 3: Distributions of reporting ratios across models and perspectives, illustrating the right-skew under moral rightness and the broader, lower distribution under predicted human behavior.

Moral Foundation Dynamics

Analysis of model-generated rationales shows that prescriptive perspectives (moral rightness, model decision) are dominated by Fairness and Care, with Loyalty minimally expressed and largely invariant to context. By contrast, predicted human behavior outputs accentuate Loyalty as relational closeness increases and Fairness as severity increases—aligning with empirical studies on human judgment under the fairness–loyalty tradeoff.

Figure 4: Radar plots summarizing mean moral foundation ratios for each perspective per model; Loyalty peaks in predicted human behavior, Fairness dominates moral rightness.

Further regression analyses quantify how responsive each foundation is to context. Increasing relational closeness yields significant positive slopes for Loyalty, most notably under predicted human behavior. Increasing severity shifts emphasis toward Fairness at the expense of Loyalty.
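The regression can be sketched as an ordinary least squares fit of one foundation's ratio on numeric codes for closeness and severity. This version returns point estimates only. Testing significance would additionally require standard errors, which a statistics package such as statsmodels reports.

The numeric encodings (e.g. 0 to 3 for stranger to family) and the function name are hypothetical.

```python
import numpy as np


def foundation_slopes(closeness, severity, ratio):
    """OLS fit of ratio ~ b0 + b1*closeness + b2*severity; returns (b0, b1, b2).

    closeness and severity are numeric codes; this encoding is an
    illustrative assumption, not the paper's specification.
    """
    x1 = np.asarray(closeness, dtype=float)
    x2 = np.asarray(severity, dtype=float)
    y = np.asarray(ratio, dtype=float)
    # Design matrix with an intercept column.
    X = np.column_stack([np.ones_like(x1), x1, x2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return tuple(coef)
```

A positive `b1` for Loyalty would correspond to the reported pattern of Loyalty rising with closeness. A negative `b2` would correspond to Loyalty yielding to Fairness as severity grows.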

Figure 5: OLS coefficient estimates charting shifts in moral foundations given contextual changes; Loyalty’s responsiveness to closeness is highest for predicted human behavior.

Comparisons between 'Report' and 'Not report' rationales confirm that Fairness and Authority are salient for reporting, while Loyalty is associated with non-reporting behavior.
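The Report vs. Not report comparison amounts to averaging one foundation's ratio within each decision group. The sketch below shows this. The row layout (a decision label paired with a {foundation: ratio} dict) and the function name are assumptions.

```python
from collections import defaultdict
from statistics import mean


def mean_ratio_by_decision(rows, foundation):
    """Mean ratio of one foundation per decision label.

    rows: iterable of (decision, {foundation: ratio}) pairs; a missing
    foundation counts as 0.0. This layout is an illustrative assumption.
    """
    groups = defaultdict(list)
    for decision, ratios in rows:
        groups[decision].append(ratios.get(foundation, 0.0))
    return {d: mean(values) for d, values in groups.items()}
```

Suppose Loyalty averages higher under "Not report" than under "Report". That would match the negative association between Loyalty and reporting described above.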

Figure 6: Bar plots of mean moral ratio by decision (Report vs. Not report) and perspective, showing Loyalty’s negative association with reporting, especially in predicted human behavior.

Model-Human Alignment

Model predictions of human behavior correlate strongly with human annotations (Spearman $>0.90$, Pearson $>0.74$, mean MAE of 0.227). Notably, Loyalty is strongly positively associated with human decisions, while Care is negatively correlated. This suggests that LLMs and humans weight the moral foundations differently.
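The three agreement statistics can be computed from paired values. In this minimal pure-Python sketch, the pairs are assumed to be model-predicted versus human-annotated reporting ratios per condition; the summary does not specify the exact pairing. Spearman correlation is computed as the Pearson correlation of average ranks, which handles ties. In practice, `scipy.stats.spearmanr` and `scipy.stats.pearsonr` provide the same statistics.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; undefined (ZeroDivisionError) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mae(x, y):
    """Mean absolute error between paired values."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)
```

A high Spearman value alongside a lower Pearson value, as reported, would indicate that the models rank conditions much like humans do while matching their linear pattern less closely.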

Effect of Post-Training and Instructional Regimes

The authors compare instruction-tuned and reasoning-tuned variants (e.g., OLMo 3/3.1) under controlled conditions. The comparison shows that post-training regimes modulate the perspective gap. In some instances, instruction tuning increases alignment with prescriptive norms and reduces the model's tendency to act in line with its own behavioral predictions.

Implications

Theoretical Implications

The study establishes that LLMs’ ethical judgments are not monolithic; rather, their outputs are highly conditioned on task framing and scenario context. While models have internalized the social interplay of fairness and loyalty, their default policies tend to rigidly prioritize prescriptive universals (impartiality, legality) over relational sensitivity. This structure reifies normative ethical frameworks prevalent in model pretraining corpora and post-training alignment, but diverges from pluralistic, circumstance-dependent human moral reasoning.

Cultural considerations are foregrounded: down-weighting Loyalty may yield systems that are normatively consistent but culturally parochial. The risk is greatest in settings where viśeṣa-dharma or other role- and relationship-specific duties are salient. Future alignment protocols may therefore need to incorporate culturally relative normativity.

Practical and Future Directions

The identified perspective gap presents real risks for LLM deployment in decision-support, clinical, or advisory contexts; models could deliver advice perceived as insensitive, mechanical, or detached from social realities. A multi-perspective evaluation is thus essential for both benchmarking and steering model behavior in deployment-relevant applications.

The authors’ methodology provides actionable diagnostic tools for alignment and steerability: by extracting shifts in foundation emphasis according to prompt framing, system designers can adjust model objectives or fine-tuning signals to better match desired normative reference groups or task settings.

Mechanistic interpretability remains a crucial open challenge. The paper suggests coupling MFT-grounded behavioral analysis with causal tracing and representation analysis to identify the latent features that mediate context sensitivity in LLMs.

Conclusion

This study provides robust evidence that LLMs exhibit context-dependent, frame-contingent moral behavior, with a structural bias toward prescriptive fairness that dominates their machine decisions—even as they internally model human relational sensitivity. The observed value-action divergence is systematic and accentuated by alignment and post-training dynamics.

Without accounting for this, LLM-generated advice may be normatively idealized but socially misaligned, particularly in cross-cultural or high-stakes applications. The results argue for more sophisticated, context-aware, and multi-perspective frameworks in both the evaluation and alignment of socially deployed AI systems. Future work should extend this paradigm to richer sets of moral dilemmas, broader cultural domains, and mechanistic interpretability tools.
