A comparative analysis of machine learning models in SHAP analysis

Published 8 Apr 2026 in cs.LG | (2604.07258v1)

Abstract: In this growing age of data and technology, large black-box models are becoming the norm due to their ability to handle vast amounts of data and learn incredibly complex data patterns. The deficiency of these methods, however, is their inability to explain the prediction process, making them untrustworthy and their use precarious in high-stakes situations. SHapley Additive exPlanations (SHAP) analysis is an explainable AI method growing in popularity for its ability to explain model predictions in terms of the original features. For each sample and feature in the data set, an associated SHAP value quantifies the contribution of that feature to the prediction of that sample. Analysis of these SHAP values provides valuable insight into the model's decision-making process, which can be leveraged to create data-driven solutions. The interpretation of these SHAP values, however, is model-dependent, so there does not exist a universal analysis procedure. To aid in these efforts, we present a detailed investigation of SHAP analysis across various machine learning models and data sets. In uncovering the details and nuance behind SHAP analysis, we hope to empower analysts in this less-explored territory. We also present a novel generalization of the waterfall plot to the multi-classification problem.


Authors (2)

Summary

The paper introduces a high-dimensional waterfall plot to generalize SHAP explanations for multiclass problems.
It reveals that model-specific SHAP values generate distinct clustering patterns, with XGBoost and neural networks achieving higher accuracy than decision trees.
The study demonstrates practical insights for personalized clinical interventions and sets the stage for refining explainable AI methods.

Comparative Analysis of Machine Learning Models in SHAP Analysis

Overview

The paper "A comparative analysis of machine learning models in SHAP analysis" (2604.07258) presents a systematic investigation of how SHapley Additive exPlanations (SHAP) values behave across different machine learning models and datasets. The analysis covers decision trees, XGBoost, and feedforward neural networks (including convolutional variants), comparing their SHAP-based interpretability profiles on both simulated and real-world data. A notable contribution is the high-dimensional waterfall plot, which generalizes SHAP explanations to multiclass classification problems.

SHAP Methodology and Model Dependencies

SHAP, grounded in cooperative game theory, quantifies feature contributions to predictions on a per-sample basis. The paper highlights the mathematical formalism: for a sample $x$, the SHAP values $\phi(f;x)_i$ satisfy $\sum_{i=1}^p \phi(f;x)_i = f(x) - \mathbb{E}[f(X')]$, which describes how the deviation from the baseline prediction is distributed among the $p$ features.
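The additivity identity above can be checked numerically on a small example. The following is a minimal sketch, assuming an interventional value function (features outside the coalition are drawn from background rows) and a hypothetical three-feature toy model `f`; it enumerates all feature subsets, so it is only feasible for small $p$ and is not the paper's implementation.

```python
from itertools import combinations
from math import factorial


def value(f, x, background, subset):
    """Mean of f when features in `subset` come from x and the rest from background rows."""
    total = 0.0
    for row in background:
        mixed = [x[j] if j in subset else row[j] for j in range(len(x))]
        total += f(mixed)
    return total / len(background)


def exact_shap(f, x, background):
    """Exact Shapley values by enumerating every coalition of the other features."""
    p = len(x)
    phi = [0.0] * p
    for i in range(p):
        others = [j for j in range(p) if j != i]
        for size in range(p):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                s = set(subset)
                gain = value(f, x, background, s | {i}) - value(f, x, background, s)
                phi[i] += weight * gain
    return phi


# Hypothetical toy model with one additive term and one interaction term.
def f(z):
    return 2.0 * z[0] + z[1] * z[2]


# Illustrative background rows standing in for samples of X'.
background = [[0.0, 0.0, 0.0], [1.0, 2.0, 1.0], [2.0, 1.0, 3.0], [1.0, 1.0, 0.0]]
x = [3.0, 2.0, 1.0]

phi = exact_shap(f, x, background)
baseline = sum(f(row) for row in background) / len(background)
gap = f(x) - baseline  # right-hand side of the additivity identity

print("SHAP values:", phi)
print("sum of SHAP values:", sum(phi), "f(x) - E[f(X')]:", gap)
```

Because the first feature enters `f` additively, its SHAP value reduces to the coefficient times the feature's deviation from its background mean, while the interaction term is shared between the other two features.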

The practical computation of SHAP varies with model type. TreeSHAP is leveraged for trees and XGBoost, exploiting hierarchical structure for computational efficiency; neural networks, lacking such structure, employ model-agnostic sampling methods (Kernel SHAP). This model dependency results in divergent interpretability, clustering tendencies, and subgroup discovery capabilities, explored in depth across test cases.
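To make the idea of a model-agnostic, sampling-based estimator concrete, here is a minimal sketch that treats the model as a black box. Note the swap: it uses permutation sampling rather than Kernel SHAP itself (Kernel SHAP fits a weighted linear regression over sampled coalitions), and the function name `sampled_shap`, the toy model, and the background data are all hypothetical.

```python
import random


def sampled_shap(f, x, background, n_samples=2000, seed=0):
    """Monte Carlo Shapley estimate: average marginal gains over random feature orderings."""
    rng = random.Random(seed)
    p = len(x)
    phi = [0.0] * p
    for _ in range(n_samples):
        order = list(range(p))
        rng.shuffle(order)                    # random order in which features are revealed
        current = list(rng.choice(background))  # start from a random background row
        prev = f(current)
        for j in order:
            current[j] = x[j]                 # switch feature j to the explained sample's value
            new = f(current)
            phi[j] += new - prev              # marginal contribution of feature j
            prev = new
    return [v / n_samples for v in phi]


# Hypothetical black-box model: the estimator only ever calls it, never inspects it.
def black_box(z):
    return z[0] * z[1] + z[2]


background = [[0.0, 0.0, 0.0]]  # single all-zero reference row, for illustration only
x = [2.0, 3.0, 1.0]

est = sampled_shap(black_box, x, background)
print("estimated SHAP values:", est)
print("sum:", sum(est), "f(x) - f(reference):", black_box(x) - black_box(background[0]))
```

Within each sampled ordering the marginal gains telescope, so the estimates always sum to the gap between the prediction and the mean prediction over the sampled background rows; only the split among interacting features carries sampling noise. TreeSHAP avoids this sampling altogether by exploiting the tree structure.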

Simulation Studies: Subgroup Discovery and Model Clustering

The simulated experiment intentionally embeds heterogeneity in class assignment, enabling SHAP to uncover distinct subpopulation pathways. Three classifiers are examined:

Decision Trees: SHAP values exhibit discrete, sparse clustering, aligning tightly with leaf node assignments and reflecting binary partitioning.
XGBoost: Produces more coherent and contiguous clusters, intermediate between decision trees and neural networks, well-suited for subgroup analysis.
Neural Networks: SHAP values form smoother, less discrete clusters, evidencing maximal model flexibility due to universal approximation properties (Augustine, 2024).
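The contrast between discrete and smooth SHAP clusters can be reproduced on a toy problem. The sketch below is an illustration under stated assumptions, not the paper's experiment: `toy_tree` is a hypothetical hand-written depth-2 tree, `smooth_model` is a hypothetical smooth function standing in for a flexible model, and SHAP values are computed exactly with the closed form for two features.

```python
def shap_2d(f, x, background):
    """Exact two-feature Shapley values, averaging over background rows for absent features."""
    def v(use0, use1):
        vals = [f(x[0] if use0 else b[0], x[1] if use1 else b[1]) for b in background]
        return sum(vals) / len(vals)

    phi0 = 0.5 * ((v(True, False) - v(False, False)) + (v(True, True) - v(False, True)))
    phi1 = 0.5 * ((v(False, True) - v(False, False)) + (v(True, True) - v(True, False)))
    # Round so that floating-point noise does not create spurious distinct vectors.
    return (round(phi0, 6), round(phi1, 6))


def toy_tree(a, b):
    """Hypothetical depth-2 decision tree: piecewise constant with four leaves."""
    if a < 0.5:
        return 0.1 if b < 0.5 else 0.4
    return 0.6 if b < 0.5 else 0.9


def smooth_model(a, b):
    """Hypothetical smooth model with an interaction term."""
    return a * b + 0.5 * a


# A 10 x 10 grid of samples, also used as the background distribution.
grid = [(i / 10 + 0.05, j / 10 + 0.05) for i in range(10) for j in range(10)]

tree_vectors = {shap_2d(toy_tree, x, grid) for x in grid}
smooth_vectors = {shap_2d(smooth_model, x, grid) for x in grid}

print("distinct SHAP vectors, tree:", len(tree_vectors))
print("distinct SHAP vectors, smooth model:", len(smooth_vectors))
```

For the piecewise-constant tree, a sample's SHAP vector depends only on which side of each split it falls, so the 100 samples collapse onto one point per leaf. The smooth model yields many more distinct vectors, which is the kind of continuous spread described for neural networks.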

All three models detect the two intended pathways to the same class, with XGBoost and the neural network reaching higher accuracy (up to 0.96, versus 0.89 for the decision tree). HDBSCAN clustering of the SHAP values corroborates the validity of the subgroups across models. The high-dimensional clustered waterfall plot, novel in this paper, visualizes feature-by-feature contributions for each cluster, enabling subgroup interpretation beyond what is feasible with UMAP embeddings.
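The numbers behind a per-cluster waterfall view can be tabulated in a few lines. The source does not specify how the paper constructs its plot, so the following is only a sketch of one plausible tabulation: the function name `cluster_waterfall`, the ordering by absolute mean contribution, and the toy SHAP values, labels, and baseline are all assumptions made for illustration.

```python
def cluster_waterfall(shap_rows, labels, baseline, feature_names):
    """Per cluster, return ordered (feature, mean_shap, running_total) waterfall steps."""
    clusters = {}
    for row, label in zip(shap_rows, labels):
        clusters.setdefault(label, []).append(row)

    result = {}
    for label, rows in clusters.items():
        # Mean SHAP value of each feature within the cluster.
        means = [sum(col) / len(rows) for col in zip(*rows)]
        # Largest absolute contribution first, as in a standard waterfall plot.
        order = sorted(range(len(means)), key=lambda j: abs(means[j]), reverse=True)
        running = baseline
        steps = []
        for j in order:
            running += means[j]
            steps.append((feature_names[j], means[j], running))
        result[label] = steps
    return result


# Illustrative toy inputs (not results from the paper): SHAP values for one class
# output, cluster labels such as those a clustering step might produce, and a baseline.
shap_rows = [
    [0.30, -0.10, 0.05],
    [0.20, -0.20, 0.05],
    [-0.25, 0.15, 0.00],
    [-0.35, 0.05, 0.10],
]
labels = [0, 0, 1, 1]
baseline = 0.5
feature_names = ["x1", "x2", "x3"]

waterfall = cluster_waterfall(shap_rows, labels, baseline, feature_names)
for label, steps in waterfall.items():
    print("cluster", label)
    for name, contribution, running in steps:
        print(f"  {name}: {contribution:+.2f} -> {running:.2f}")
```

By additivity, the last running total in each cluster equals that cluster's mean model output for the class. In a multiclass setting the same table could be built once per class output, which is one way to read "high-dimensional" here.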

Image Classification (MNIST): SHAP in Structured Data

MNIST serves as a challenging test for SHAP interpretability with spatially correlated features. The results demonstrate:

Decision Trees: Poor precision and recall (accuracy 0.80); SHAP clustering is misaligned with digit structure.
XGBoost: Improved clustering and an accuracy of 0.96; SHAP values align better with digits but still reflect pixel-wise decisions.
CNNs: Highest accuracy (0.98), but SHAP values fail to reflect digit structure cleanly. The additive nature of SHAP is incompatible with spatial invariance and convolutional filters, as digit-defining strokes are captured by shifting pixel sets, defeating direct feature attribution.

The clustered waterfall plot for CNNs nonetheless exposes similarities and confusions between digits (e.g., 5 with 2/3/6, 7 with 8/9), mapping the limitations of additive SHAP explanations in image models.
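The tension between per-pixel additive attribution and shift-invariant features can be seen in a toy setting. This is a minimal sketch, not a CNN: `stroke_detector` is a hypothetical one-dimensional detector that fires when two adjacent pixels are on anywhere in the image, and Shapley values are computed exactly against an assumed all-zero baseline by averaging over every feature ordering.

```python
from itertools import permutations


def shapley_zero_baseline(f, x):
    """Exact Shapley values against an all-zero baseline, averaged over all orderings."""
    p = len(x)
    phi = [0.0] * p
    orderings = list(permutations(range(p)))
    for order in orderings:
        current = [0] * p
        prev = f(current)
        for j in order:
            current[j] = x[j]        # reveal pixel j
            new = f(current)
            phi[j] += new - prev     # marginal contribution of pixel j
            prev = new
    return [v / len(orderings) for v in phi]


def stroke_detector(pixels):
    """Hypothetical shift-invariant feature: 1 if any two adjacent pixels are both on."""
    return max(pixels[i] * pixels[i + 1] for i in range(len(pixels) - 1))


image_a = [1, 1, 0, 0]  # stroke on the left
image_b = [0, 0, 1, 1]  # same stroke shifted to the right

phi_a = shapley_zero_baseline(stroke_detector, image_a)
phi_b = shapley_zero_baseline(stroke_detector, image_b)

print("prediction A:", stroke_detector(image_a), "attribution A:", phi_a)
print("prediction B:", stroke_detector(image_b), "attribution B:", phi_b)
```

Both images receive the same prediction for the same reason, yet their attributions fall on disjoint pixels. Clustering per-pixel SHAP vectors would therefore separate samples that the model treats identically.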

ADNI Data: SHAP for Clinical Subgrouping

In medical applications, interpretability is critical. Models are trained on ADNI data to classify patients (CN, MCI, AD):

All three models achieve strong accuracy (0.91–0.93).
SHAP values consistently highlight CDRSB as most influential, with other features diverging by model.
UMAP embedding and HDBSCAN clustering on SHAP values reveal both within-class and cross-class heterogeneity, identifying distinct cognitive impairment subgroups.

Analysis of SHAP values for clusters corresponding to AD patients reveals feature-level explanations with direct clinical implications: CDRSB and MMSE differentiate clusters, suggesting targeted intervention strategies. This highlights the utility of SHAP-based clustering for data-driven, personalized solutions in medicine.

Theoretical and Practical Implications

The paper establishes that SHAP analysis, when used conscientiously, exposes model-specific decision rationales and subgroup structures unavailable through raw feature clustering. The high-dimensional clustered waterfall plot enhances interpretability in multiclass scenarios with complex data.

Decision trees: SHAP values are efficient to compute and cluster discretely, in line with the tree's partition structure; this suits subgroup discovery but may oversimplify complex relationships.
XGBoost: Balances interpretability and flexibility; SHAP values are less discrete, supporting nuanced clustering and subgroup identification.
Neural networks (including CNNs): Offer the greatest modeling power, but SHAP interpretation is hindered by their non-additive, spatially invariant structure, which can misalign feature attributions for highly structured data such as images.

The limitations of SHAP with CNNs underscore the need for new or adapted explainability methods that can accommodate spatial invariance and convolutional structure, a fertile area for future research.

Future Directions

This investigation suggests several future research avenues:

Developing SHAP-like methods compatible with convolutional and transformer architectures, possibly through attribution paths or spatial aggregation.
Integrating SHAP-based clustering with causal inference to validate subgroup causality.
Extending high-dimensional waterfall plots to temporal or sequential data.
Exploring the impact of model regularization and data augmentation on SHAP clustering patterns.

Conclusion

The comparative analysis provided in this paper demonstrates the substantial model dependence embedded in SHAP explanations, from discrete clustering in trees to nuanced, flexible patterns in neural networks. The novel high-dimensional waterfall plot enhances multiclass interpretability and subgroup discovery, especially in clinical and structured-data settings. Practically, SHAP-based subgrouping supports personalized interventions and data-driven policy. Theoretically, the results motivate the ongoing refinement of explainable AI methods to address limitations encountered in spatially complex models, ensuring robust, meaningful interpretability for increasingly sophisticated ML architectures.
