AMO-ENE: Attention-based Multi-Omics Fusion Model for Outcome Prediction in Extra Nodal Extension and HPV-associated Oropharyngeal Cancer

Published 10 Apr 2026 in eess.IV and cs.CV | (2604.09280v1)

Abstract: Extranodal extension (ENE) is an emerging prognostic factor in human papillomavirus (HPV)-associated oropharyngeal cancer (OPC), although it is currently omitted as a clinical staging criterion. Recent works have advocated for the inclusion of imaging-detected ENE (iENE) as a prognostic marker in HPV-positive OPC staging. However, several practical limitations continue to hinder its clinical integration, including inconsistencies in segmentation, low contrast in the periphery of metastatic lymph nodes on CT imaging, and laborious manual annotations. To address these limitations, we propose a fully automated end-to-end pipeline that uses computed tomography (CT) images with clinical data to assess the status of nodal ENE and predict treatment outcomes. Our approach includes a hierarchical 3D semi-supervised segmentation model designed to detect and delineate relevant iENE from radiotherapy planning CT scans. From these segmentations, a set of radiomics and deep features is extracted to train an iENE grading classifier. The predicted ENE status is then evaluated for its prognostic value and compared with existing staging criteria. Furthermore, we integrate these nodal features with primary tumor characteristics in a multimodal, attention-based outcome prediction model, providing a dynamic framework for outcome prediction. Our method is validated on an internal cohort of 397 HPV-positive OPC patients treated with radiation therapy or chemoradiotherapy between 2009 and 2020. For outcome prediction at the 2-year mark, our pipeline surpassed baseline models, with AUCs of 88.2% (4.8) for metastatic recurrence, 79.2% (7.4) for overall survival, and 78.1% (8.6) for disease-free survival. We also obtain concordance indices of 83.3% (6.5) for metastatic recurrence, 71.3% (8.9) for overall survival, and 70.0% (8.1) for disease-free survival, suggesting that the approach is feasible for clinical decision-making.


Authors (11)

Summary

The paper introduces an automated pipeline that fuses CT-based imaging, radiomics, and deep features for accurate detection and grading of imaging-detected extranodal extension (iENE) in HPV+ OPC.
It employs a SwinUNETRv2 segmentation model and an attention-based fusion architecture, achieving a mean Dice score of up to 83.5 for iENE segmentation and robust 2-year predictions of distant metastasis and survival.
The model outperforms expert consensus in survival stratification, suggesting its potential to standardize iENE assessment and enhance prognostic accuracy in clinical settings.

AMO-ENE: Attention-based Multi-Omics Fusion for Prognosis in Extranodal Extension and HPV-associated Oropharyngeal Cancer

Introduction and Clinical Significance

Extranodal extension (ENE) is a critical prognostic indicator in oropharyngeal carcinoma (OPC); it signifies tumor infiltration beyond the lymph node capsule and correlates with aggressive disease and an increased risk of metastatic dissemination. Although pathologic ENE (pENE) is recognized in current oncology guidelines, ENE is often omitted from the staging of HPV-associated OPC because reproducible imaging-based standards are lacking and neck dissection is largely absent from chemoradiation-first protocols. Imaging-detected ENE (iENE) offers a route to pre-treatment risk stratification but is hampered by inconsistent interpretation, low soft-tissue contrast, and labor-intensive annotation.

To address these deficiencies, AMO-ENE provides an end-to-end pipeline: automated iENE segmentation from CT, radiomics and deep feature extraction, grade classification, and multimodal fusion for 2-year oncologic outcome prediction. The platform combines SwinUNETRv2-based deep learning segmentation, feature fusion through a novel attention-based architecture, and survival modeling, with the aim of improving risk prediction and reproducibility in HPV+ OPC.

Figure 1: Illustration for iENE status grading system; grades 0-3 reflect increasing severity of ENE based on capsule penetration and adjacent tissue involvement.

Framework Overview and Data Cohort

The retrospective study included 397 HPV+ OPC patients (predominantly male, median age 62) treated between 2009 and 2020. Each patient contributed a pre-treatment CT, expert segmentations of the gross tumor volume (GTV) and iENE, and comprehensive clinical metadata (e.g., TNM staging, smoking, chemotherapy). Outcomes comprised overall survival (OS), distant metastasis (DM), and disease-free survival (DFS), with a median follow-up of 44.4 months.

The AMO-ENE workflow is as follows:

Automated segmentation of metastatic nodes/ENE using a 3D SwinUNETRv2-based approach with multi-scale masked autoencoder pre-training.
Radiomics and foundation model (FMCIB) feature extraction from both nodal and primary tumor regions.
Supervised classification of iENE grades using feature selection (Lasso, PCA) and optimized XGBoost classifiers under dichotomized schemes.
Attention-based multi-omics/multimodal fusion for risk prediction, using clinical, nodal, and primary tumor features.
Survival analysis employing both binary outcome prediction (2-year landmarking) and discrete-time survival modeling (MTLR framework).

Figure 2: Overall schematic diagram of the AMO-ENE pipeline, illustrating the modular integration from segmentation through multimodal outcome modeling.

Automated Segmentation of iENE

AMO-ENE implements hierarchical 3D segmentation of nodal structures using a hybrid CNN-Transformer model (SwinUNETRv2), augmented with masked autoencoder pre-training for better data efficiency and generalizability. Compared with nnU-Net, SwinUNETR, and SamMed3D baselines, the AMO-ENE pipeline achieved the best Dice similarity coefficient (DSC; mean 78.4 ± 7.5), and further refinement by component selection raised this to 83.5 ± 4.1. Performance was highest for grade 2 iENE (DSC 83.4 ± 6.4) and lower for the most advanced disease (grade 3, DSC 70.2 ± 4.5), likely reflecting greater annotation ambiguity when ENE extends into surrounding soft tissue.
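The Dice similarity coefficient used throughout this section is DSC = 2|P ∩ G| / (|P| + |G|) for a predicted mask P and a reference mask G. A minimal NumPy sketch follows. The empty-mask convention is an assumption, and the paper's exact scoring procedure (for example, per-lesion versus per-patient averaging) is not described here.

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice similarity coefficient (DSC) between two binary masks of equal shape."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Both masks empty: scored as perfect agreement (an assumed convention).
        return 1.0
    return float(2.0 * np.logical_and(pred, target).sum() / denom)

# Toy 3D example: the prediction covers 8 voxels, the reference 12, overlap 8.
pred = np.zeros((4, 4, 4), dtype=bool)
pred[1:3, 1:3, 1:3] = True
gt = np.zeros((4, 4, 4), dtype=bool)
gt[1:3, 1:3, 1:4] = True
print(dice_coefficient(pred, gt))  # 2*8 / (8 + 12) = 0.8
```

Reported values such as 78.4 ± 7.5 are then a mean and standard deviation of this score across test patients.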

Notably, the generalist prompt-based SamMed3D foundation model substantially underperformed (DSC 39.4–46.4), suggesting that domain adaptation and specialized training remain critical for small, complex structures in head and neck oncologic imaging.

Figure 3: Qualitative results for the iENE segmentation task, contrasting the AMO-ENE model, nnUNet, SwinUNETRv2, SamMed3D, and ground truth for two test patients.

Feature Extraction, iENE Grading, and Interpretability

Radiomics features (PyRadiomics; 100 per lesion) and deep features (FMCIB; 4096 per lesion) were extracted from the predicted masks. iENE grading, posed as dichotomized binary or multi-class problems, achieved its best AUC of 81.7 ± 5.7 for general iENE detection and 89.9 ± 5.0 for high-grade (grade 3) ENE, using XGBoost with Lasso feature selection on combined radiomics and deep features. Deep features alone underperformed (AUC < 72) but improved discrimination when fused with radiomics.
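The grading setup can be sketched as two steps: dichotomizing the 0-3 grades and selecting features with an L1 penalty. The sketch below is a minimal NumPy illustration under stated assumptions. The cut-offs (grade ≥ 1 for iENE+, grade 3 for high-grade) follow the text, but `lasso_select` is a hypothetical helper implementing plain least-squares Lasso by coordinate descent on the 0/1 label; the paper's exact Lasso variant and penalty strength are not given. The data are synthetic, and the deep-feature block is shrunk from 4096 to 20 columns for speed. The selected features would then feed the XGBoost classifier, which is not shown.

```python
import numpy as np

def dichotomize(grades, threshold):
    """Binary label: 1 if iENE grade >= threshold, else 0."""
    return (np.asarray(grades) >= threshold).astype(float)

def lasso_select(X, y, alpha=0.05, n_iter=100):
    """Coordinate-descent Lasso on standardized features (hypothetical helper).

    Minimizes (1/2n) * ||y - X b||^2 + alpha * ||b||_1 and returns the indices of
    features with nonzero coefficients, plus the coefficient vector.
    """
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0                       # constant columns become all-zero and stay unselected
    X = (X - X.mean(axis=0)) / sd           # now (1/n) * x_j @ x_j == 1 for non-constant columns
    y = np.asarray(y, dtype=float) - np.mean(y)
    n, p = X.shape
    b = np.zeros(p)
    r = y.copy()                            # residual y - X b
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]             # remove feature j's current contribution
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0)  # soft-thresholding
            r -= X[:, j] * b[j]
    return np.flatnonzero(b), b

# Synthetic cohort: grades driven by one radiomics and one deep feature.
rng = np.random.default_rng(0)
n = 200
radiomics = rng.normal(size=(n, 100))       # stand-in for 100 radiomics features
deep = rng.normal(size=(n, 20))             # stand-in for deep features (4096 in the paper)
latent = 2.0 * radiomics[:, 0] + deep[:, 0] + 0.5 * rng.normal(size=n)
grades = np.digitize(latent, [-1.0, 1.0, 2.5])  # synthetic grades 0-3
y_any = dichotomize(grades, 1)              # iENE- (grade 0) vs iENE+ (grades 1-3)
y_high = dichotomize(grades, 3)             # grades 0-2 vs grade 3
selected, coef = lasso_select(np.hstack([radiomics, deep]), y_any)
print(selected)
```

Standardizing before the L1 penalty matters here because radiomics and deep features live on very different scales; without it the penalty would favor large-valued features.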

SHAP analyses identified texture non-uniformity and shape descriptors (e.g., elongation) as the most predictive features for ENE, suggesting that imaging heterogeneity and morphologic disruption are closely associated with iENE grade.
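SHAP plots decompose each prediction into additive per-feature contributions around a base value. The paper applies SHAP to XGBoost; the sketch below instead uses the closed-form case of a linear model with independent features, where the SHAP value of feature j is w_j * (x_j - E[x_j]), purely to illustrate that additive property. The model, weights, and data are hypothetical.

```python
import numpy as np

def linear_shap(w, X_background, x):
    """SHAP values for a linear model f(x) = w @ x + b, assuming independent features.

    phi_j = w_j * (x_j - E[x_j]); the base value E[f(X)] plus sum(phi) recovers f(x).
    """
    return np.asarray(w) * (np.asarray(x) - np.asarray(X_background).mean(axis=0))

rng = np.random.default_rng(1)
X_bg = rng.normal(size=(500, 3))          # hypothetical background cohort, 3 features
w, b = np.array([1.5, -0.5, 0.0]), 0.2    # hypothetical fitted linear model
x = np.array([2.0, 1.0, -3.0])            # one patient's feature vector
phi = linear_shap(w, X_bg, x)
base = float((X_bg @ w + b).mean())       # expected model output over the background
print(phi, base + phi.sum(), x @ w + b)   # the last two values agree
```

A feature with zero weight receives zero attribution regardless of its value, which is why SHAP rankings reflect what the model actually uses rather than raw feature magnitude.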

Figure 4: ROC curves for binary classification of dichotomized iENE classes, evaluated with XGBoost and radiomics/foundation features.

Figure 5: SHAP plots for feature importance in iENE− vs iENE+ classification, highlighting the dominance of texture-based radiomics.
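The AUCs behind Figure 4 can be read as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. A minimal NumPy sketch of that rank-based AUC, and of the threshold sweep that produces ROC points, is shown below. It is an illustrative implementation, not the authors' evaluation code, and the scores are toy values.

```python
import numpy as np

def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    diff = scores[y_true][:, None] - scores[~y_true][None, :]   # all positive-negative pairs
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

def roc_curve_points(y_true, scores):
    """(FPR, TPR) pairs obtained by sweeping the threshold down over all observed scores."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    thresholds = np.r_[np.inf, np.unique(scores)[::-1]]
    tpr = np.array([(scores[y_true] >= t).mean() for t in thresholds])
    fpr = np.array([(scores[~y_true] >= t).mean() for t in thresholds])
    return fpr, tpr

y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y, s))          # 3 of 4 positive-negative pairs ordered correctly: 0.75
fpr, tpr = roc_curve_points(y, s)
```

The trapezoidal area under the (FPR, TPR) points equals the rank-based AUC, so either form can be used to report the curves.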

Attention-based Multimodal Fusion

AMO-ENE’s core innovation is its attention-based multimodal fusion architecture. Modality-specific neural encoders project imaging and non-imaging features into a joint latent space; the resulting embeddings are concatenated and fused with multi-head scaled dot-product attention. The fused representation feeds either a binary classification head (2-year outcomes) or a time-dependent risk model (MTLR).
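A forward pass of this kind of fusion can be sketched in NumPy. All of the following are assumptions: each encoder is a single linear layer with ReLU, the three modality embeddings are stacked as a three-token sequence (one reading of "concatenated"), the attended tokens are mean-pooled, and a logistic head outputs a 2-year event probability. Weights are random and untrained; the latent size of 64 and the feature counts are illustrative, while the eight heads match the best-performing head count reported in the ablation.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x, axis=-1):
    z = np.exp(x - x.max(axis=axis, keepdims=True))   # subtract max for numerical stability
    return z / z.sum(axis=axis, keepdims=True)

class ModalityEncoder:
    """Hypothetical modality-specific encoder: linear projection + ReLU into a d-dim space."""
    def __init__(self, in_dim, d):
        self.W = rng.normal(scale=in_dim ** -0.5, size=(in_dim, d))
        self.b = np.zeros(d)

    def __call__(self, x):                  # x: (batch, in_dim)
        return relu(x @ self.W + self.b)    # -> (batch, d)

def multi_head_attention(tokens, Wq, Wk, Wv, Wo, n_heads):
    """Multi-head scaled dot-product self-attention over modality tokens.

    tokens: (batch, n_tokens, d); Wq, Wk, Wv, Wo: (d, d); d divisible by n_heads.
    """
    B, T, d = tokens.shape
    dh = d // n_heads
    def split(x):                           # (B, T, d) -> (B, heads, T, dh)
        return x.reshape(B, T, n_heads, dh).transpose(0, 2, 1, 3)
    q, k, v = split(tokens @ Wq), split(tokens @ Wk), split(tokens @ Wv)
    attn = softmax(q @ k.transpose(0, 1, 3, 2) / np.sqrt(dh))   # (B, heads, T, T)
    out = (attn @ v).transpose(0, 2, 1, 3).reshape(B, T, d)     # concatenate heads
    return out @ Wo, attn

d, n_heads, batch = 64, 8, 5
dims = {"clinical": 12, "primary_gtv": 120, "nodal_iene": 120}  # hypothetical sizes
encoders = {m: ModalityEncoder(k, d) for m, k in dims.items()}
Wq, Wk, Wv, Wo = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(4))
w_out = rng.normal(scale=d ** -0.5, size=d)

inputs = {m: rng.normal(size=(batch, k)) for m, k in dims.items()}
tokens = np.stack([encoders[m](inputs[m]) for m in dims], axis=1)  # (batch, 3, d)
fused, attn = multi_head_attention(tokens, Wq, Wk, Wv, Wo, n_heads)
pooled = fused.mean(axis=1)                                       # (batch, d)
p_event_2y = 1.0 / (1.0 + np.exp(-(pooled @ w_out)))              # binary 2-year head
print(p_event_2y.round(3))
```

Because each modality is a token, the attention weights in `attn` show how much each modality draws on the others per patient. An MTLR head would replace the final sigmoid with one logit per time bin.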

Figure 6: Schematic of the AMO-ENE modality fusion pipeline, showing parallel encoders, concatenated latent features, and attention-based fusion culminating in adaptable outcome heads.

Ablation experiments showed that attention-based fusion substantially outperforms early fusion and simple concatenation-based late fusion, achieving a recall of 75.4% and a specificity of 91.2% for DM at 2 years (AUC 88.2 ± 4.8). Performance peaked at eight attention heads; adding more heads brought no further gain, suggesting that extra capacity beyond this point mainly increases the risk of overfitting.
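Recall and specificity come from thresholding the predicted 2-year probabilities. The sketch below assumes a 0.5 threshold, which the text does not specify; the labels and probabilities are toy values.

```python
import numpy as np

def recall_specificity(y_true, y_prob, threshold=0.5):
    """Recall (sensitivity) and specificity of thresholded probabilities."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_prob, dtype=float) >= threshold
    tp = np.sum(y_pred & y_true)     # events correctly flagged
    fn = np.sum(~y_pred & y_true)    # events missed
    tn = np.sum(~y_pred & ~y_true)   # non-events correctly cleared
    fp = np.sum(y_pred & ~y_true)    # false alarms
    return tp / (tp + fn), tn / (tn + fp)

y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]                           # toy 2-year DM labels
prob = [0.9, 0.8, 0.6, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05, 0.3]   # toy predicted probabilities
recall, specificity = recall_specificity(y, prob)
print(recall, specificity)  # recall 3/4, specificity 5/6
```

With DM being the rarer outcome, a high specificity alongside a moderate recall, as reported above, is what a threshold tuned toward few false alarms would produce.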

Modality ablation showed that nodal (iENE) features drive DM prediction (AUC 83.1% alone, rising to 88.2% when combined with the other modalities), whereas primary tumor radiomics were the most effective predictors of OS. Clinical features alone underperformed in every task.
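A modality ablation grid like the one in Figure 7 can be enumerated as every non-empty subset of the three input groups. The modality names are hypothetical labels, the paper may not have evaluated every subset, and training and evaluation for each subset are left as a placeholder comment.

```python
from itertools import combinations

MODALITIES = ("clinical", "primary_gtv", "nodal_iene")   # hypothetical labels

def modality_subsets(modalities=MODALITIES):
    """Every non-empty combination, from single modalities up to all of them."""
    return [combo
            for r in range(1, len(modalities) + 1)
            for combo in combinations(modalities, r)]

for combo in modality_subsets():
    # Placeholder: train and evaluate the fusion model using only these inputs,
    # e.g. by building the attention token sequence from the selected encoders.
    print(" + ".join(combo))
```

With three modalities this yields seven configurations: three single-modality runs, three pairs, and the full model.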

Figure 7: Modality ablation experiment results at 2 years, comparing combinations of clinical, primary (GTV), and nodal (iENE) features for DM, DFS, and OS prediction.

Survival Modeling and Statistical Validation

Binary prediction at the 2-year endpoint using AMO-ENE achieved AUCs of 88.2% for DM, 79.2% for OS, and 78.1% for DFS. These results surpass established radiomics, logistic regression, and bagged random forest benchmarks (all AUCs ≤80.5%).
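Turning time-to-event data into 2-year binary labels requires a rule for patients censored before the landmark. The sketch below assumes the common convention of excluding them, since their 2-year status is unknown; the text does not state how AMO-ENE handles these cases, and `landmark_labels` is a hypothetical helper.

```python
import numpy as np

def landmark_labels(time_months, event, horizon=24.0):
    """Fixed-horizon labels from (time, event) pairs.

    1: event observed at or before the horizon
    0: followed beyond the horizon without an event by then
    excluded: censored at or before the horizon (status at the horizon unknown)
    Returns (labels for kept patients, boolean keep-mask over all patients).
    """
    t = np.asarray(time_months, dtype=float)
    e = np.asarray(event).astype(bool)
    positive = e & (t <= horizon)
    negative = t > horizon          # event-free at the horizon, whether or not it happened later
    keep = positive | negative
    return positive[keep].astype(int), keep

labels, keep = landmark_labels([10, 30, 15, 40, 24], [1, 1, 0, 0, 1])
print(labels, keep)
```

Excluding early-censored patients avoids mislabeling them as event-free, at the cost of a smaller training set; the MTLR formulation uses them directly instead.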

For time-to-event survival, the multi-bin MTLR head achieved C-indices of 83.3% (DM), 70.0% (DFS), and 71.3% (OS), outperforming Cox and bagged risk estimation baselines. The gain in DM prediction suggests that imaging-based iENE features encode important temporal information about metastatic progression.
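MTLR models survival over K time bins with one logit per bin: the probability that the event falls in interval m is proportional to exp(s_m + ... + s_K), with the "event-free past the last bin" outcome fixed at exp(0). The sketch below turns such logits into a survival curve and scores risk rankings with Harrell's C-index. It is a minimal NumPy illustration; the bin logits, the use of -sum S(t_k) as a scalar risk, and the toy cohort are hypothetical, and the paper's bin count and C-index variant are not specified here.

```python
import numpy as np

def mtlr_survival(bin_logits):
    """Survival curve S(t_1..t_K) from MTLR per-bin logits s_1..s_K (one patient)."""
    s = np.asarray(bin_logits, dtype=float)
    g = np.append(np.cumsum(s[::-1])[::-1], 0.0)   # g_m = s_m + ... + s_K, plus 0 for "past t_K"
    pmf = np.exp(g - g.max())
    pmf /= pmf.sum()                               # P(event falls in interval m), m = 1..K+1
    return 1.0 - np.cumsum(pmf)[:-1]               # S(t_k) = P(T > t_k), k = 1..K

def harrell_c_index(time, event, risk):
    """Fraction of comparable pairs ranked correctly; a pair (i, j) is comparable when
    patient i had an observed event strictly before patient j's time. Risk ties count 1/2."""
    time, event, risk = map(np.asarray, (time, event, risk))
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

S = mtlr_survival([0.5, -1.0, 0.2, 0.0])   # hypothetical logits for 4 time bins
risk = -S.sum()                            # one possible scalar risk: less area under S(t) = higher risk
print(S, risk)
print(harrell_c_index([5, 10, 15, 20], [1, 1, 0, 1], [0.9, 0.5, 0.7, 0.1]))  # 0.8
```

Because the survival curve is built from cumulative probabilities, it is monotone by construction, which is one reason MTLR suits multi-bin outcome heads.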

Clinical Validation and Human Benchmarking

Kaplan-Meier analysis stratified by model-predicted iENE grade showed statistically significant separation for all outcomes (log-rank p < 0.05), reinforcing the value of iENE as an imaging biomarker of risk.
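The Kaplan-Meier estimator and the log-rank test behind this stratification can be sketched as follows. This is a two-group version (for example, predicted iENE+ vs. iENE-); stratifying by all grades would need the k-group generalization. The p-value uses the chi-square distribution with one degree of freedom via `math.erfc`, and the toy survival times are hypothetical.

```python
import math
import numpy as np

def kaplan_meier(time, event):
    """Kaplan-Meier survival estimate at each distinct event time."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event).astype(bool)
    event_times = np.unique(time[event])
    surv, s = [], 1.0
    for t in event_times:
        n_at_risk = np.sum(time >= t)
        d = np.sum((time == t) & event)
        s *= 1.0 - d / n_at_risk
        surv.append(s)
    return event_times, np.array(surv)

def logrank_test(time, event, group):
    """Two-group log-rank test; returns (chi-square statistic, p-value with 1 df)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event).astype(bool)
    group = np.asarray(group).astype(bool)
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event]):
        at_risk = time >= t
        n, n1 = at_risk.sum(), (at_risk & group).sum()
        died = (time == t) & event
        d, d1 = died.sum(), (died & group).sum()
        o_minus_e += d1 - d * n1 / n                # observed minus expected events, group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)  # hypergeometric variance
    stat = o_minus_e ** 2 / var
    return stat, math.erfc(math.sqrt(stat / 2.0))   # chi-square(1) upper tail

time = np.arange(1, 11, dtype=float)                 # toy follow-up times
event = np.ones(10, dtype=bool)                      # every patient has an event
predicted_pos = np.array([True] * 5 + [False] * 5)   # hypothetical predicted iENE+ group
stat, p = logrank_test(time, event, predicted_pos)
print(kaplan_meier(time[predicted_pos], event[predicted_pos]))
print(round(float(stat), 2), p)
```

The same test applied to expert-assigned groups gives the p-values that the human benchmarking compares against.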

In a 55-patient subset annotated by three experts, AMO-ENE predictions yielded lower log-rank p-values (stronger group separation) than the consensus or any individual annotator. Bootstrapped, simulated single-reader performance consistently fell below model performance, supporting the hypothesis that standardized automated iENE grading can outperform subjective manual interpretation and reduce inter-observer variability.

Figure 8: Kaplan-Meier survival curves stratified by ground-truth ENE and predicted iENE status, showing robust separation for DM, DFS, and OS.

Discussion and Implications

AMO-ENE establishes a comprehensive pipeline for automated iENE assessment and outcome prediction, achieving strong segmentation and classification accuracy, offering interpretability through radiomics, deep features, and SHAP, and outperforming consensus expert stratification in survival prediction. The attention-based fusion strategy integrates multi-omics data robustly, allowing nodal and primary tumor features to contribute independently in a scalable, explainable manner.

The finding that model-predicted iENE status separates outcomes better than human rater consensus is supported by multi-fold cross-validation and log-rank benchmarking, although the reader comparison covers only a 55-patient subset. The approach shows substantial improvements in both binary (landmark) and time-to-event (survival) endpoint prediction compared with standard machine learning and classical radiomics baselines.

Limitations include single-center data origin, sex/race imbalance, and lack of external validation, which may restrict real-world generalizability. Annotation variability and limited representation of advanced disease (grade 3) may influence segmentation/classification ceiling performance.

Future Directions

The AMO-ENE paradigm could be extended to other multimodal oncology tasks in which imaging, clinical, and molecular features must be integrated. Further research may involve:

Cross-institutional validation to assess generalizability and robustness under heterogeneous imaging protocols and population differences.
Integration of additional omics (e.g., genomics, pathology, proteomics) for even deeper fusion architectures.
Prospective trials to evaluate clinical decision-making impact, not only as a staging adjunct but also in guiding therapy de-escalation/intensification.
Deployment in community centers to benchmark the reduction in inter-physician variability and its influence on oncologic outcomes.

Conclusion

AMO-ENE provides an automated, reproducible, and interpretable multi-omics fusion pipeline for imaging-based ENE assessment and prognostication in HPV-associated OPC. By outperforming baseline segmentation and machine learning methods, as well as expert annotation-based risk stratification, this approach supports the inclusion of standardized iENE grading in future staging systems and sets a precedent for advanced modality fusion in clinical AI applications (2604.09280).
