Foundation Model-guided Iteratively Prompting and Pseudo-Labeling for Partially Labeled Medical Image Segmentation

Published 1 Apr 2026 in cs.CV | (2604.01038v1)

Abstract: Automated medical image segmentation has achieved remarkable progress with fully labeled data. However, site-specific clinical priorities and the high cost of manual annotation often yield scans with only a subset of organs labeled, leading to the partially labeled problem that degrades performance. To address this issue, we propose IPnP, an Iteratively Prompting and Pseudo-labeling framework, for partially labeled medical image segmentation. IPnP iteratively generates and refines pseudo-labels for unlabeled organs through collaboration between a trainable segmentation network (specialist) and a frozen foundation model (generalist), progressively recovering full-organ supervision. On the public dataset AMOS with the simulated partial-label setting, IPnP consistently improves segmentation performance over prior methods and approaches the performance of the fully labeled reference. We further evaluate on a private, partially labeled dataset of 210 head-and-neck cancer patients and demonstrate our effectiveness in real-world clinical settings.


Authors (5)

Summary

The paper introduces the IPnP framework, which employs a collaborative specialist and foundation model to recover full supervision from partially labeled data.
It leverages iterative prompting and rigorous pseudo-label filtering, including voxel-level selection loss, to minimize noise and improve anatomical recovery.
Empirical results on abdominal and head-and-neck datasets show that IPnP achieves near fully supervised performance even with only partial annotations.

Foundation Model-Guided Iterative Prompting and Pseudo-Labeling for Partially Labeled Medical Image Segmentation

Problem Statement and Motivation

Automated medical image segmentation has advanced using fully labeled datasets, yet real clinical workflows yield scans in which only a subset of structures is annotated, owing to annotation burden and variable clinical focus. This partial-label scenario restricts effective supervision, degrading model performance and leaving anatomical learning incomplete. Existing remedies either ignore unlabeled regions or generate pseudo-labels from model predictions, but both are limited by noise propagation and the lack of reliable guidance for unlabeled organs.

Figure 1: Illustration of partial-label challenges and comparative segmentation outputs; (a) shows the full reference segmentation, (b) the partial supervision during training, (c) standard nnU-Net output, and (d) the proposed IPnP result.

Methodological Framework: IPnP

The Iterative Prompting and Pseudo-labeling (IPnP) framework addresses partial-label segmentation by pairing a trainable specialist segmentation network with a frozen, promptable foundation model (the generalist) that offers zero-shot segmentation capability. The core process unfolds in four tightly coupled stages: initial specialist training, specialist-guided prompt generation, foundation-model pseudo-labeling with voxel-wise filtering, and iterative re-training with progressively refined supervision. This generalist–specialist interaction lets the system approximately recover full supervision from partially labeled data, improving anatomical coverage and mitigating the error propagation typical of conventional pseudo-labeling.

Figure 2: IPnP system overview: iterative interaction between the specialist and generalist, with pseudo-label generation, refinement, and re-incorporation into training.
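The four stages can be read as a simple control loop. The sketch below is only a structural outline under stated assumptions: every callable (`train_specialist`, `predict`, `make_prompts`, `generalist_segment`, `filter_and_merge`) and the `num_rounds` parameter are hypothetical placeholders, not names from the paper, and the frozen generalist is represented only by a function that is never retrained.

```python
from typing import Any, Callable, Tuple


def run_ipnp(
    train_specialist: Callable[[Any], Any],
    predict: Callable[[Any], Any],
    make_prompts: Callable[[Any], Any],
    generalist_segment: Callable[[Any], Any],
    filter_and_merge: Callable[[Any, Any, Any], Any],
    partial_labels: Any,
    num_rounds: int = 3,
) -> Tuple[Any, Any]:
    """Hypothetical outline of the specialist/generalist loop."""
    supervision = partial_labels
    # Stage 1: initial specialist training on the available labels only.
    model = train_specialist(supervision)
    for _ in range(num_rounds):
        # Stage 2: specialist predictions are turned into prompts.
        coarse = predict(model)
        prompts = make_prompts(coarse)
        # Stage 3: the frozen generalist proposes pseudo-labels, which are
        # filtered before being merged into the supervision.
        candidates = generalist_segment(prompts)
        supervision = filter_and_merge(supervision, candidates, coarse)
        # Stage 4: re-train the specialist on the refined supervision.
        model = train_specialist(supervision)
    return model, supervision
```

Passing the stages in as callables keeps the outline independent of any particular network or foundation model; only the specialist is ever updated inside the loop.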

Specialist–Generalist Collaboration and Prompt Design

Following initial training on available labels, the specialist produces coarse predictions. High-confidence organ regions—even for unlabeled classes—are localized through bounding box extraction using median-plane analysis. These 2D boxes from orthogonal views are provided as prompts to the foundation model, enabling spatially precise and context-sensitive pseudo-labeling for previously unlabeled organs.
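One plausible reading of the median-plane step is sketched below in NumPy: for a thresholded specialist mask of one organ, pick the median occupied slice along an axis and take the tight 2D bounding box of the organ in that slice. The function name `median_plane_box`, the inclusive box convention, and the choice of axes are assumptions made for illustration; the summary does not specify these details.

```python
import numpy as np


def median_plane_box(mask: np.ndarray, axis: int):
    """Return (slice_index, (r0, c0, r1, c1)) for the median occupied slice
    of a 3D boolean mask along `axis`, or None if the mask is empty.
    Box coordinates are inclusive and refer to the two remaining axes."""
    other_axes = tuple(a for a in range(3) if a != axis)
    # Indices of slices along `axis` that contain at least one organ voxel.
    occupied = np.flatnonzero(mask.any(axis=other_axes))
    if occupied.size == 0:
        return None
    idx = int(occupied[occupied.size // 2])  # median occupied slice
    plane = np.take(mask, idx, axis=axis)    # 2D slice through the organ
    rows = np.flatnonzero(plane.any(axis=1))
    cols = np.flatnonzero(plane.any(axis=0))
    return idx, (int(rows[0]), int(cols[0]), int(rows[-1]), int(cols[-1]))


# Toy usage: a box-shaped "organ" inside an 8 x 10 x 12 volume.
toy_mask = np.zeros((8, 10, 12), dtype=bool)
toy_mask[2:5, 3:7, 4:9] = True
box_view_a = median_plane_box(toy_mask, axis=0)  # first view
box_view_b = median_plane_box(toy_mask, axis=2)  # orthogonal view
```

In practice the input mask would come from thresholding the specialist's class probabilities; the threshold value is not given in the text above.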

A multi-constraint scheme ensures pseudo-label reliability:

The class probability threshold filters out low-confidence voxels using a softmax margin.
The ROI constraint ensures spatial congruence by restricting voxels to the 3D regions derived from the dual-view box prompts.
The entropy constraint accepts a pseudo-label update only when prediction uncertainty, measured by voxel-wise entropy, decreases monotonically, thus prioritizing stability over iterations.

Figure 3: Progressive refinement of pseudo-labels generated at different training epochs, reflecting increased organ delineation accuracy and structural recovery through the iterative process.
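The three constraints listed above can be combined as a voxel-wise boolean mask. The NumPy sketch below is a minimal illustration under several assumptions: the margin is taken as the gap between the two largest softmax probabilities, the ROI is supplied as a precomputed boolean volume, and "monotonic reduction" is checked against the entropy map of the previous iteration only. The names `select_pseudo_voxels` and `margin_tau`, and the default threshold, are hypothetical.

```python
import numpy as np


def voxel_entropy(probs: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Shannon entropy per voxel for softmax probs of shape (C, D, H, W)."""
    return -(probs * np.log(probs + eps)).sum(axis=0)


def select_pseudo_voxels(probs, prev_entropy, candidate, roi, margin_tau=0.5):
    """Return (keep_mask, current_entropy).

    probs:        (C, D, H, W) softmax probabilities
    prev_entropy: (D, H, W) entropy map from the previous iteration
    candidate:    (D, H, W) bool mask proposed for one organ
    roi:          (D, H, W) bool region derived from the box prompts
    """
    # Probability constraint: gap between the top-2 class probabilities.
    top2 = np.sort(probs, axis=0)[-2:]
    margin = top2[1] - top2[0]
    # Entropy constraint: accept only where uncertainty has decreased.
    entropy = voxel_entropy(probs)
    keep = candidate & roi & (margin >= margin_tau) & (entropy < prev_entropy)
    return keep, entropy


# Toy usage with 3 classes and 4 voxels (shape 1 x 1 x 4).
toy_probs = np.array([
    [0.90, 0.40, 0.90, 0.90],
    [0.05, 0.35, 0.05, 0.05],
    [0.05, 0.25, 0.05, 0.05],
]).reshape(3, 1, 1, 4)
toy_prev = np.array([1.0, 1.0, 1.0, 0.1]).reshape(1, 1, 4)
toy_cand = np.ones((1, 1, 4), dtype=bool)
toy_roi = np.array([True, True, False, True]).reshape(1, 1, 4)
toy_keep, toy_entropy = select_pseudo_voxels(toy_probs, toy_prev, toy_cand, toy_roi)
```

In the toy example only the first voxel survives: the second fails the margin test, the third lies outside the ROI, and the fourth has higher entropy than in the previous iteration.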

Voxel-Level Selection Loss (VLS)

Training incorporates a VLS loss that, for unlabeled targets, selectively excludes unreliable voxels based on consensus between the prediction and the pseudo-supervision, while maintaining standard supervision for labeled targets. This reduces the impact of noisy pseudo-labels on gradient updates; the comparative ablation shows this component to be critical.
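A minimal NumPy sketch of such a selection loss follows, assuming that "consensus" means the specialist's arg-max class equals the pseudo-label and that the underlying per-voxel loss is cross-entropy. Both are assumptions for illustration, as are the names `vls_loss` and `is_pseudo`; the paper's exact formulation may differ.

```python
import numpy as np


def vls_loss(probs, target, is_pseudo, eps=1e-8):
    """Masked cross-entropy over flattened voxels.

    probs:     (C, N) softmax probabilities from the specialist
    target:    (N,) integer labels (manual labels or pseudo-labels)
    is_pseudo: (N,) bool, True where the target is a pseudo-label
    Returns (mean loss over selected voxels, selection mask).
    """
    pred = probs.argmax(axis=0)
    agree = pred == target
    # Manually labeled voxels are always supervised; pseudo-labeled voxels
    # contribute only where prediction and pseudo-label agree.
    selected = ~is_pseudo | agree
    nll = -np.log(probs[target, np.arange(target.size)] + eps)
    count = max(int(selected.sum()), 1)
    return float((nll * selected).sum() / count), selected


# Toy usage: 2 classes, 4 voxels; the last two targets are pseudo-labels.
toy_p = np.array([[0.8, 0.3, 0.6, 0.1],
                  [0.2, 0.7, 0.4, 0.9]])
toy_t = np.array([0, 0, 1, 1])
toy_is_pseudo = np.array([False, False, True, True])
toy_loss, toy_selected = vls_loss(toy_p, toy_t, toy_is_pseudo)
```

Here the third voxel is dropped because the prediction (class 0) disagrees with its pseudo-label (class 1), while the misclassified second voxel still contributes because its label is manual.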

Empirical Evaluation

IPnP is validated in both simulated and real-world scenarios, using the AMOS abdominal multi-organ dataset with simulated partial labels and a private cohort of 210 head-and-neck cancer patients with highly incomplete annotation. Comparisons include full supervision (as an upper bound), two partial-supervision baselines, and the recent TransDoDNet.

Results consistently show IPnP outperforming all partial-label baselines on the Dice Similarity Coefficient (DSC) and the 95th-percentile Hausdorff Distance (HD95), with average AMOS scores approaching full-label performance even in settings with only 33% of organs labeled. In comparison, naïve partial-label training with background-filling or per-organ losses led to marked drops in anatomical fidelity and contour sharpness.
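For reference, the two metrics can be computed as in the NumPy sketch below. It assumes binary masks for DSC and, for HD95, point sets (for example, surface voxel coordinates already scaled by voxel spacing) small enough for a brute-force distance matrix. HD95 has several variants in common use; this one takes the larger of the two directed 95th-percentile distances, which may not match the evaluation code used in the paper.

```python
import numpy as np


def dice(pred: np.ndarray, ref: np.ndarray) -> float:
    """Dice Similarity Coefficient between two binary masks."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return float(2.0 * np.logical_and(pred, ref).sum() / denom)


def hd95(points_a, points_b) -> float:
    """95th-percentile Hausdorff distance between two non-empty point sets,
    each given as an (N, dims) array of coordinates."""
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    # Pairwise Euclidean distances, shape (len(a), len(b)).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    a_to_b = d.min(axis=1)  # nearest distance from each point of a to b
    b_to_a = d.min(axis=0)  # nearest distance from each point of b to a
    return float(max(np.percentile(a_to_b, 95), np.percentile(b_to_a, 95)))


# Toy usage: two 2 x 2 squares that overlap in half of their pixels.
toy_pred = np.zeros((4, 4), dtype=bool)
toy_ref = np.zeros((4, 4), dtype=bool)
toy_pred[0:2, 0:2] = True
toy_ref[0:2, 1:3] = True
toy_dsc = dice(toy_pred, toy_ref)
```

Higher DSC and lower HD95 indicate better agreement with the reference segmentation.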

Figure 4: Qualitative comparison of segmentation results across methods; IPnP displays substantial improvements in organ coverage and boundaries relative to baselines in partial-label regimes.

Furthermore, IPnP maintains its performance edge on the clinical head-and-neck (HnN) dataset, especially on structures with scarce annotation (e.g., ocular structures), supporting its capacity to exploit weak supervision and bridge annotation gaps.

Figure 5: Organ-specific performance metrics (DSC) in the HnN cohort; IPnP distinctly surpasses alternative approaches, particularly for organs with sparse annotations.

Implications and Future Directions

IPnP establishes an effective paradigm for medical segmentation under realistic, annotation-incomplete conditions by automating pseudo-label generation and refining supervision via a hybrid generalist–specialist scheme and voxel-level reliability analysis. This has practical implications for accelerating annotation-light deployment of segmentation algorithms in diverse clinical sites and organs, potentially enabling scalable population studies and streamlined clinical pipelines.

Conceptually, the approach shows that foundation models can serve not merely as zero-shot predictors but as collaborators in tailored, iterative workflows. This suggests that more sophisticated, adaptive prompting strategies and interaction protocols may further reduce reliance on manual labels.

Potential extensions include expanding prompt modalities, integrating uncertainty-aware selection at deeper architectural levels, and dynamically weighting pseudo-supervision. Evaluation on more diverse pathologies, imaging modalities, and population cohorts will clarify generalizability and limitations.

Conclusion

The IPnP framework offers an iterative solution to the partially labeled medical image segmentation problem, using a promptable foundation model for pseudo-label construction together with rigorous supervision filtering. The reported results indicate that IPnP recovers segmentation performance to near fully supervised levels and outperforms the partial-label baselines it was compared against. The work points to practical value for clinical AI adoption under annotation constraints and to a methodological advance in generalist–specialist collaborative learning pipelines for medical imaging (2604.01038).
