- The paper introduces STRIKE, a novel stacking framework that leverages semantic feature grouping to enhance credit default prediction.
- It uses a low-capacity additive meta-learner to combine per-group base-model predictions, outperforming monolithic models and conventional full-feature stacking in AUC-ROC.
- Empirical evaluations on the Polish Bankruptcy, LendingClub, and HomeCredit datasets show superior robustness, transparency, and predictive accuracy.
Additive Feature-Group-Aware Stacking for Credit Default Prediction: The STRIKE Framework
Introduction
Credit risk modeling must contend with high-dimensional, heterogeneous, and noisy tabular data, especially in contemporary financial applications. Prevailing machine learning approaches, including tree-based ensembles and neural methods, have achieved notable empirical results, but they struggle with interpretability, robustness under distribution shift, and interference between features. These problems are exacerbated when features from disparate sources (demographics, bureau records, delinquencies, and vintage signals) are pooled into a single monolithic model. The paper "STRIKE: Additive Feature-Group-Aware Stacking Framework for Credit Default Prediction" (2604.17622) instead exploits domain semantics explicitly through feature-grouped stacking, improving predictive performance while providing the transparency that operational and regulatory credit-risk settings require.
Methodological Framework
Structured Feature Grouping
The STRIKE methodology decomposes the feature space into semantically coherent groups (e.g., Demographics, Vintage, Delinquency). The partition can be defined from domain knowledge or constructed by automated clustering (e.g., based on mutual information or correlation). Within each group, diverse base learners (XGBoost, LightGBM, Random Forest, Logistic Regression) are trained, and stratified K-fold cross-validation is used to produce out-of-fold (OOF) predictions, so that each row's prediction comes from a model that was not trained on that row. The decomposition is motivated by an additive log-odds view of binary classification: if the feature groups are approximately conditionally independent given the outcome, the evidence contributed by each group adds up on the logit scale.
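The groupwise OOF step can be sketched with scikit-learn. This is a minimal illustration, not the paper's code: the group names and column indices are hypothetical, the data are synthetic (`make_classification`), only Logistic Regression and Random Forest stand in for the paper's four base learners (to avoid XGBoost/LightGBM dependencies), and keeping one model per group chosen by OOF AUC is an assumed selection rule.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical semantic groups mapped to column indices (illustrative only).
GROUPS = {
    "demographics": [0, 1, 2, 3],
    "vintage": [4, 5, 6, 7],
    "delinquency": [8, 9, 10, 11],
}


def base_learners():
    """Fresh, unfitted candidate models for one group (a subset of the paper's learners)."""
    return {
        "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "rf": RandomForestClassifier(n_estimators=100, min_samples_leaf=5, random_state=0),
    }


def groupwise_oof(X, y, groups, n_splits=5, seed=0):
    """Return {group: (best_model_name, oof_probabilities)} via stratified K-fold OOF."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    selected = {}
    for group, cols in groups.items():
        best_name, best_oof, best_auc = None, None, -np.inf
        for name, model in base_learners().items():
            # Every row is scored by a model fitted on the other folds only.
            oof = cross_val_predict(model, X[:, cols], y, cv=cv, method="predict_proba")[:, 1]
            auc = roc_auc_score(y, oof)
            if auc > best_auc:
                best_name, best_oof, best_auc = name, oof, auc
        selected[group] = (best_name, best_oof)
    return selected


# Synthetic, imbalanced binary data standing in for a credit dataset.
X, y = make_classification(n_samples=2000, n_features=12, n_informative=6,
                           n_redundant=0, weights=[0.9, 0.1], random_state=0)
oof = groupwise_oof(X, y, GROUPS)
meta_X = np.column_stack([probs for _, probs in oof.values()])  # one column per group
print({group: name for group, (name, _) in oof.items()}, meta_X.shape)
```

Because the same OOF predictions are used both to select the model and as meta-features, the selection step carries a mild optimistic bias; nested cross-validation would remove it at extra cost.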
OOF predictions from the top-performing base models in each group are concatenated to form the meta-dataset. A low-capacity additive meta-learner (logistic regression by default) then fuses these groupwise signals while limiting the risk of overfitting. The resulting score is a weighted sum of group-specific logit estimates, consistent with the conditional independence approximation; because the weights are learned from the data, the meta-learner retains some flexibility to absorb residual cross-group dependencies.
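The meta-learning stage can be sketched in the same hedged way. Assumptions, none of them from the paper: synthetic data, hypothetical group names, a single logistic-regression base learner per group to keep the sketch short, and an explicit logit transform of the OOF probabilities (clipped to avoid infinities) so that the meta-learner's output is literally an intercept plus a weighted sum of group logits.

```python
import numpy as np
from scipy.special import logit
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict, cross_val_score

# Hypothetical group layout (column indices), for illustration only.
GROUPS = {"demographics": [0, 1, 2, 3], "vintage": [4, 5, 6, 7], "delinquency": [8, 9, 10, 11]}

X, y = make_classification(n_samples=2000, n_features=12, n_informative=6,
                           n_redundant=0, weights=[0.9, 0.1], random_state=1)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Level 0: one OOF default probability per group.
oof_probs = np.column_stack([
    cross_val_predict(LogisticRegression(max_iter=1000), X[:, cols], y,
                      cv=cv, method="predict_proba")[:, 1]
    for cols in GROUPS.values()
])

# Work on the logit scale so the meta-score is additive in group logits.
eps = 1e-6
meta_X = logit(np.clip(oof_probs, eps, 1 - eps))

# Level 1: low-capacity additive meta-learner.
meta = LogisticRegression(max_iter=1000)
meta_auc = cross_val_score(meta, meta_X, y, cv=cv, scoring="roc_auc").mean()
meta.fit(meta_X, y)
weights = dict(zip(GROUPS, meta.coef_[0]))  # one learned weight per group
print(f"meta AUC = {meta_auc:.3f}", {g: round(float(w), 2) for g, w in weights.items()})
```

The meta AUC here cross-validates only the meta-learner; a fully nested scheme would also re-fit the base learners inside each outer fold.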
Empirical and Theoretical Justification
The theoretical basis for STRIKE is that, under groupwise conditional independence, the Bayes-optimal logit decomposes additively across groups. Empirically, the paper reports low conditional mutual information between groups given the label, supporting the practical validity of this assumption on real-world credit datasets. Imposing additivity thus acts as a beneficial inductive bias: STRIKE avoids fitting the spurious interactions that are prevalent in high-dimensional, noisy tabular data, and it is more robust than monolithic or interaction-heavy models.
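The additive decomposition can be written out in a few lines. The notation is ours, not necessarily the paper's: x = (x_1, ..., x_G) is the partition of the features into G groups, π = P(y = 1) is the base rate, and the assumption is p(x | y) = ∏_g p(x_g | y). The second line applies Bayes' rule within each group.

```latex
% Assumption: p(x \mid y) = \prod_{g=1}^{G} p(x_g \mid y), with \pi = P(y = 1)
\begin{aligned}
\operatorname{logit} P(y=1 \mid x)
  &= \log\frac{\pi}{1-\pi} + \sum_{g=1}^{G} \log\frac{p(x_g \mid y=1)}{p(x_g \mid y=0)} \\
  &= \log\frac{\pi}{1-\pi} + \sum_{g=1}^{G} \Bigl[\operatorname{logit} P(y=1 \mid x_g) - \log\frac{\pi}{1-\pi}\Bigr] \\
  &= \sum_{g=1}^{G} \operatorname{logit} P(y=1 \mid x_g) - (G-1)\log\frac{\pi}{1-\pi}.
\end{aligned}
```

A logistic-regression meta-learner over group logits, w_0 + Σ_g w_g ℓ_g, contains this form as the special case w_g = 1 and w_0 = -(G-1) log(π/(1-π)); learned weights that depart from these values are one way the meta-learner can absorb residual dependence between groups or miscalibration of the base models.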
Experimental Evaluation
Benchmarks and Datasets
STRIKE is evaluated on three benchmark datasets representing diverse credit risk scenarios:
- Polish Bankruptcy: Corporate bankruptcy prediction, high class imbalance.
- LendingClub: Peer-to-peer lending risk, moderate imbalance and heterogeneous features.
- HomeCredit: Large-scale consumer credit data, extreme imbalance, significant sparsity and noise.
Uniform preprocessing is applied, with each dataset’s features partitioned according to meaningful financial groupings.
Comparative Results
STRIKE achieves the highest AUC-ROC on all three datasets, outperforming strong baselines including XGBoost, LightGBM, GBDT, and deep architectures such as DeepFM, DCN-V2, TabNet, and SR1D-CNN. Notably, STRIKE exceeds SR1D-CNN by more than 15 percentage points on the Polish dataset and reaches an AUC-ROC of 0.7661 on HomeCredit, well above both traditional and deep learning baselines. These results are obtained without hyperparameter tuning, which suggests that the gains stem from the architecture rather than from fine-tuning.
STRIKE also outperforms conventional stacking, which aggregates models trained on the full feature space, by roughly 3.4% AUC, corroborating the value of explicit feature-group decomposition.
Ablation and Sensitivity Analyses
STRIKE is robust to how the structured grouping is obtained: manually defined and unsupervised (mutual-information or correlation-based) groupings yield similar results, whereas random grouping degrades performance below that of monolithic learners. This indicates that STRIKE's gains come from capturing coherent signal structure, not merely from more ensembling or greater model complexity.
Alternative additive meta-learners (GAM, EBM) provide modest further AUC gains, but a logistic regression meta-learner already surpasses all monolithic baselines, indicating that additive aggregation is particularly well matched to the credit risk context.
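Correlation-based grouping and the random-grouping control can be sketched as follows. This is a minimal illustration assuming NumPy and SciPy; the distance 1 - |Pearson correlation| and average-linkage hierarchical clustering are our choices, not necessarily the paper's procedure, and the synthetic data (three latent factors, each observed through four noisy columns) are invented for the demonstration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def correlation_groups(X, n_groups):
    """Group columns of X so that strongly correlated features share a group.

    Assumed recipe: distance = 1 - |Pearson correlation|, average-linkage
    hierarchical clustering, tree cut into at most n_groups clusters.
    """
    corr = np.corrcoef(X, rowvar=False)
    dist = 1.0 - np.abs(corr)
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)  # upper triangle as a condensed vector
    labels = fcluster(linkage(condensed, method="average"), t=n_groups, criterion="maxclust")
    return {f"group_{k}": np.flatnonzero(labels == k).tolist() for k in np.unique(labels)}


def random_groups(n_features, n_groups, seed=0):
    """Ablation control: assign columns to groups uniformly at random."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_features)
    return {f"group_{k}": sorted(chunk.tolist())
            for k, chunk in enumerate(np.array_split(perm, n_groups))}


# Synthetic data: 3 latent factors, each observed through 4 noisy columns (12 columns).
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 3))
X = np.hstack([latent[:, [j]] + 0.3 * rng.normal(size=(1000, 4)) for j in range(3)])

groups = correlation_groups(X, n_groups=3)
control = random_groups(X.shape[1], n_groups=3)
print(sorted(groups.values()))  # -> [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
print(control)
```

Feeding `groups` and `control` into the same stacking pipeline is one way to reproduce the structured-versus-random comparison on one's own data.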
Implications and Future Directions
The STRIKE framework addresses major practical and theoretical requirements in credit modeling: robustness to noise and feature redundancy, scalability to high dimensionality, and transparent attribution of predictive signals. By structurally isolating semantically coherent sources of information and fitting targeted base models to each, STRIKE avoids key failure modes of conventional tabular deep learning and ensemble aggregation, including those highlighted by recent literature criticizing the use of convolutional architectures on tabular data, which has no grid or spatial structure.
From a regulatory perspective, STRIKE’s modular predictions facilitate traceability and post-hoc analysis, supporting expectations for model transparency in compliance settings (e.g., Basel III). In operational credit systems, improved accuracy and interpretability can translate into tangible risk mitigation in lending decisions and portfolio management.
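For an additive meta-learner, traceability reduces to reading off one term per group. The helper below is a hypothetical illustration (function name, group names, and numbers are invented, not taken from the paper) of how a single applicant's log-odds can be split into an intercept plus per-group contributions w_g × logit_g, then converted to a probability of default.

```python
import math


def group_contributions(intercept, weights, group_logits):
    """Split an additive meta-learner's log-odds into one term per feature group."""
    contributions = {g: weights[g] * group_logits[g] for g in weights}
    total_logit = intercept + sum(contributions.values())
    prob_default = 1.0 / (1.0 + math.exp(-total_logit))
    return contributions, total_logit, prob_default


# Hypothetical meta-learner parameters and one applicant's group logits (invented numbers).
contrib, z, prob_default = group_contributions(
    intercept=-0.5,
    weights={"demographics": 0.4, "vintage": 0.8, "delinquency": 1.2},
    group_logits={"demographics": -1.0, "vintage": 0.5, "delinquency": 1.5},
)
print({g: round(c, 2) for g, c in contrib.items()}, round(z, 2), round(prob_default, 3))
# -> {'demographics': -0.4, 'vintage': 0.4, 'delinquency': 1.8} 1.3 0.786
```

In this toy case the delinquency group dominates the score, and that statement can be checked directly against the fitted weights, which is the kind of attribution a monolithic model does not expose natively.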
Feature-grouped stacking is broadly applicable and suggests several avenues for further work:
- Automated, data-driven group discovery that is adaptive to evolving data distributions.
- Integration of sparse interaction models at the meta-learning stage to better capture cross-group dependencies while preserving interpretability.
- Extension to other domains characterized by heterogeneous, high-dimensional tabular data (e.g., healthcare risk, fraud detection).
Conclusion
STRIKE represents a significant advance in credit default prediction on structured data: a feature-group-aware additive stacking framework with robust empirical performance and strong theoretical motivation (2604.17622). By combining modular, groupwise specialization with controlled additive aggregation, STRIKE offers a compelling methodological template for predictive problems in noisy, heterogeneous tabular domains.