- The paper presents a Saliency-Guided Prompt Distillation (SPD) framework that refines noisy clinical prompts by leveraging anatomical priors.
- It employs a two-stage process combining Anatomical Prior Learning and Contextual Prompt Distillation to improve segmentation accuracy; on the TI dataset, consensus prompts alone raise Dice by 14.2% in zero-shot inference with frozen SAM.
- Results on four MRI and CT datasets show SPD outperforming both supervised and SAM-based baselines, with segmentation remaining robust to prompt noise.
Saliency-Guided Prompt Distillation for Robust Medical Image Segmentation with SAM
Motivation and Problem Definition
Robust segmentation of anatomical structures is fundamental for clinical assessment and intervention planning, yet the clinical reality is that precise prompts—such as pixel-perfect points and boxes required by prompt-adapted segmentation foundation models—are rarely available. The Segment Anything Model (SAM) achieves state-of-the-art zero-shot segmentation results with high-quality prompts on natural images, but it fails in clinical workflows where annotation is noisy, coarse, or ambiguous. Centerline points provided by radiologists often follow broad anatomical paths, drifting between neighboring structures and only partially covering the region of interest. This prompt noise degrades SAM's adaptation and limits its reliability in medical deployment.

Figure 1: Examples of noisy clinical prompts—centerline annotations (red points) and ground-truth masks (green) for abdominal MRI—highlighting regions where ambiguous prompts mislead SAM away from ideal segmentation.
Methodology: Saliency-Guided Prompt Distillation Framework
The proposed Saliency-Guided Prompt Distillation (SPD) framework addresses the bottleneck imposed by noisy prompts, emulating expert reasoning through two stages: Anatomical Prior Learning and Prompt-Guided Segmentation.
Effectiveness of Consensus Prompts and Cross-Slice Enrichment
SPD’s Contextual Prompt Distillation (CPD) module first validates prompts locally, then enriches them with context using dual saliency map validation: prompts from neighboring slices are propagated to the current slice if they are anatomically consistent. This process consolidates sparse, noisy inputs into a consensus prompt set that gives SAM reliable guidance.
Figure 3: Visualization of consensus prompt integration—prompts from neighboring slices (yellow crosses) augment sparse local cues (red dots), closely aligning with ground truth regions (green masks).
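The validation and enrichment steps described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes each slice has a saliency map with values in [0, 1] and that prompts are (row, col) points. The function names, the 0.5 threshold, the `n_ctx` parameter, and the reading of "dual saliency map validation" as "salient in both the neighboring slice and the current slice" are all assumptions made for this example.

```python
import numpy as np

def validate_local(points, saliency, thr=0.5):
    """Keep only prompt points that fall on salient pixels of their own slice."""
    return [(r, c) for r, c in points if saliency[r, c] >= thr]

def consensus_prompts(points_by_slice, saliency_by_slice, idx, n_ctx=1, thr=0.5):
    """Build a consensus prompt set for slice `idx` (hypothetical sketch).

    Assumed rule: a neighbor's point is propagated only if it is salient in
    the neighbor's own map AND in the current slice's map.
    """
    current_sal = saliency_by_slice[idx]
    consensus = validate_local(points_by_slice[idx], current_sal, thr)
    lo = max(0, idx - n_ctx)
    hi = min(len(points_by_slice), idx + n_ctx + 1)
    for j in range(lo, hi):
        if j == idx:
            continue
        for p in validate_local(points_by_slice[j], saliency_by_slice[j], thr):
            if current_sal[p] >= thr and p not in consensus:
                consensus.append(p)
    return consensus

# Toy volume: three slices sharing one salient 4x4 region.
sal = np.zeros((3, 8, 8))
sal[:, 2:6, 2:6] = 0.9
points = [
    [(3, 3), (0, 0)],  # slice 0: one good point, one drifted point
    [(7, 7)],          # slice 1: only a drifted point
    [(4, 4), (1, 1)],  # slice 2: one good point, one drifted point
]
print(consensus_prompts(points, sal, idx=1))  # [(3, 3), (4, 4)]
```

In this toy case the middle slice has no valid local prompt, so its consensus set comes entirely from its two neighbors, which is the situation the enrichment step is meant to handle.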
Empirical evaluation shows that consensus prompts far outperform both randomly sampled noisy points and full original centerline sets. On the TI dataset, consensus prompts alone increase Dice by 14.2% and IoU by 13.6% in zero-shot frozen SAM inference.
Figure 4: Comparative zero-shot segmentation accuracy using various prompt sources—consensus prompts yield strong improvements over all baselines, including full noisy annotation sets.
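For reference, the two overlap metrics quoted above can be computed on binary masks as in the sketch below. The function name and the convention of returning 1.0 when both masks are empty are assumptions of this example, not details from the paper.

```python
import numpy as np

def dice_and_iou(pred, gt):
    """Return (Dice, IoU) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    inter = np.logical_and(pred, gt).sum()
    total = pred.sum() + gt.sum()
    union = total - inter
    # Assumed convention: two empty masks count as a perfect match.
    dice = 2.0 * inter / total if total > 0 else 1.0
    iou = inter / union if union > 0 else 1.0
    return float(dice), float(iou)

# Two 4x4 squares that overlap in half of their area.
gt = np.zeros((8, 8), dtype=bool)
gt[2:6, 2:6] = True
pred = np.zeros((8, 8), dtype=bool)
pred[2:6, 4:8] = True

dice, iou = dice_and_iou(pred, gt)
print(round(dice, 3), round(iou, 3))  # 0.5 0.333
```

The two metrics are monotonically related (IoU = Dice / (2 - Dice)), which is why they tend to move together in results such as those on the TI dataset.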
SPD is benchmarked on four medical imaging datasets spanning MRI and CT, including scenarios with both real and synthetically noisy prompts. It consistently outperforms both traditional supervised segmentation models (UNet, TransUNet, nnUNet) and recent SAM-based adaptations (SAM-Tuning, MedSAM, MSA, SAM-Refiner). For example, on the TI dataset, which features genuine clinical prompt noise, SPD achieves 73.58% DSC, surpassing the best baseline (SAM-Tuning) by 11.08% while reducing HD95 by 6.28. Improvements are similarly strong on the Scar, FUMPE, and KiTS datasets.
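HD95, the boundary metric reported alongside DSC, is the 95th percentile of the distances between predicted and ground-truth boundaries. The sketch below is one common way to compute it on 2D binary masks, in pixel units. The helper names, the 4-neighbor boundary definition, and the choice to take the maximum of the two directed percentiles are assumptions of this example; the paper's exact variant and units are not stated here.

```python
import numpy as np

def boundary_points(mask):
    """Coordinates of foreground pixels with at least one 4-neighbor in the background."""
    m = np.asarray(mask, dtype=bool)
    p = np.pad(m, 1, constant_values=False)
    # A pixel is interior if all four of its direct neighbors are foreground.
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return np.argwhere(m & ~interior)

def hd95(pred, gt):
    """95th-percentile symmetric Hausdorff distance between mask boundaries (pixels)."""
    a, b = boundary_points(pred), boundary_points(gt)
    if len(a) == 0 or len(b) == 0:
        return float("nan")  # undefined when either mask is empty
    # Brute-force pairwise Euclidean distances; adequate for small masks.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    a_to_b = d.min(axis=1)
    b_to_a = d.min(axis=0)
    return float(max(np.percentile(a_to_b, 95), np.percentile(b_to_a, 95)))

gt = np.zeros((8, 8), dtype=bool)
gt[2:6, 2:6] = True
pred = np.zeros((8, 8), dtype=bool)
pred[2:6, 4:8] = True  # same square shifted two pixels to the right

print(hd95(gt, gt))    # 0.0
print(hd95(pred, gt))  # 2.0
```

Unlike Dice, this metric is sensitive to where the errors are: a small region segmented far from the target raises HD95 sharply even when overlap stays high.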
Qualitative analysis highlights SPD’s impact: traditional methods under-segment or mis-segment due to annotation scarcity, while SAM-based models are misled by noisy prompts into erroneous regions. SPD’s distilled guidance produces segmentations that are both more accurate and anatomically faithful.
Figure 5: Qualitative comparison demonstrating SPD's robust segmentation in challenging views and diverse datasets relative to competing methods.
Beyond 2D slices, SPD outperforms recent 3D SAM-based approaches (SAM2, MedSAM2) on volumetric metrics such as Average Dice and Volumetric Dice, underscoring SPD’s superior prompt robustness and spatial coherence across volume data.
Figure 6: Performance comparison with 3D SAM-based baselines on the TI dataset, showing consistent superiority in both region and volumetric overlap metrics.
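The difference between the two volume-level metrics can be illustrated with a small sketch. It assumes that "Average Dice" means the mean of per-slice Dice scores and that "Volumetric Dice" means a single Dice score computed over the whole 3D volume; these definitions, the function names, and the convention for empty slices are assumptions of this example rather than definitions given in the paper.

```python
import numpy as np

def dice(pred, gt):
    """Dice overlap of two binary arrays of any dimensionality."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    total = pred.sum() + gt.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty masks match perfectly
    return float(2.0 * np.logical_and(pred, gt).sum() / total)

def average_slice_dice(pred_vol, gt_vol):
    """Mean of per-slice Dice scores (every slice weighted equally)."""
    return float(np.mean([dice(p, g) for p, g in zip(pred_vol, gt_vol)]))

def volumetric_dice(pred_vol, gt_vol):
    """Single Dice score over all voxels (large slices dominate)."""
    return dice(pred_vol, gt_vol)

# Two slices: a large structure segmented perfectly, a small one missed entirely.
gt_vol = np.zeros((2, 8, 8), dtype=bool)
gt_vol[0, 2:6, 2:6] = True   # 16 pixels
gt_vol[1, 3:5, 3:5] = True   # 4 pixels
pred_vol = np.zeros_like(gt_vol)
pred_vol[0, 2:6, 2:6] = True

print(round(average_slice_dice(pred_vol, gt_vol), 3))  # 0.5
print(round(volumetric_dice(pred_vol, gt_vol), 3))     # 0.889
```

Under these assumed definitions, the slice average penalizes failures on small cross-sections much more heavily than the volumetric score, so reporting both gives a fuller picture of consistency across a volume.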
Ablation Studies and Hyperparameter Robustness
SPD’s critical components are validated through ablation:
- Local prompt validation improves DSC by 4.45% over the baseline.
- CPD (without PSC) adds a further 3.37%, confirming the value of contextual refinement.
- PSC raises DSC further (to 73.58%), adding boundary precision and segmentation stability.
Analysis of PSC reveals that bi-directional spatial coherence does not outperform uni-directional enforcement, likely because of conflicting anatomical context. SPD is also robust to hyperparameter choices, including the saliency threshold and the number of contextual slices.
Figure 7: Impact of guidance source in PSC—uni-directional slice consistency yields optimal segmentation; bi-directional enforcement introduces marginal degradation.
Figure 8: Ablation on the number of contextual slices in CPD. Performance improves with moderate context and does not degrade when more slices are included.
SPD’s saliency map is confident and discriminative: predictions separate clearly into background and foreground, with background dominating, which minimizes sensitivity to the saliency threshold and simplifies deployment.
Figure 9: Distribution of predicted saliency values reveals a clear background/foreground separation, indicating robustness to threshold selection.
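The link between a well-separated saliency distribution and threshold robustness can be checked with a short synthetic sketch. The saliency values below are simulated, not taken from the paper: background scores are drawn from [0, 0.1] and foreground scores from [0.9, 1.0], with background making up 90% of pixels. These ranges, the proportions, and the function name are assumptions chosen only to illustrate the argument.

```python
import numpy as np

# Simulated bimodal saliency scores (assumed ranges, for illustration only).
rng = np.random.default_rng(0)
n_bg, n_fg = 9000, 1000
saliency = np.concatenate([
    rng.uniform(0.0, 0.1, n_bg),  # confident background
    rng.uniform(0.9, 1.0, n_fg),  # confident foreground
])

def foreground_fraction(scores, thr):
    """Fraction of pixels labeled foreground at a given threshold."""
    return float((scores >= thr).mean())

# Any threshold between the two modes yields the same binary mask.
fractions = {thr: foreground_fraction(saliency, thr) for thr in (0.3, 0.5, 0.7)}
for thr, frac in fractions.items():
    print(thr, frac)
```

When few pixels have intermediate scores, moving the threshold changes almost none of the binary labels, which is the property that makes the threshold easy to set in practice.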
Practical and Theoretical Implications
SPD presents a practical solution for foundation model deployment in medical environments typified by prompt ambiguity and annotation scarcity. By emulating expert spatial reasoning and leveraging contextual anatomical priors, SPD achieves both prompt robustness and segmentation accuracy, bridging the gap between research-grade models and clinical applicability. The generalizable architecture supports integration with diverse promptable foundation models and can catalyze future research into cross-slice reasoning, uncertainty-aware distillation, and scalable medical model adaptation.
Conclusion
Saliency-Guided Prompt Distillation (SPD) enables robust adaptation of foundation models like SAM to medical image segmentation in the presence of inherently noisy prompts. Through anatomical prior learning, contextual prompt distillation, and spatial slice consistency, SPD achieves consistent improvement over state-of-the-art methods across several modalities and datasets. The effective handling of prompt ambiguity is pivotal for real-world clinical model deployment, and SPD sets a principled foundation for future advances in robust prompt-guided segmentation.