- The paper introduces the IPnP framework, which employs a collaborative specialist and foundation model to recover full supervision from partially labeled data.
- It combines iterative prompting with rigorous pseudo-label filtering, including a voxel-level selection loss, to reduce label noise and improve anatomical recovery.
- Empirical results on abdominal and head-and-neck datasets show that IPnP achieves near fully supervised performance even with only partial annotations.
Foundation Model-Guided Iterative Prompting and Pseudo-Labeling for Partially Labeled Medical Image Segmentation
Problem Statement and Motivation
Automated medical image segmentation has advanced largely on fully labeled datasets, yet real clinical workflows yield scans in which only a subset of structures is annotated, owing to annotation burden and variable clinical focus. This partial-label scenario restricts effective supervision, which degrades model performance and leaves anatomical learning incomplete. Existing remedies either ignore unlabeled regions or generate pseudo-labels from model predictions, but both are limited by noise propagation and the lack of reliable guidance for unlabeled organs.



Figure 1: Illustration of partial-label challenges and comparative segmentation outputs; (a) shows the full reference segmentation, (b) the partial supervision during training, (c) standard nnU-Net output, and (d) the proposed IPnP result.
Methodological Framework: IPnP
The Iterative Prompting and Pseudo-labeling (IPnP) framework addresses partial-label segmentation by combining a trainable specialist segmentation network with a frozen, promptable foundation model (the generalist) that has zero-shot capability. The core process unfolds in four tightly coupled stages: initial specialist training, specialist-guided prompt generation, foundation model pseudo-labeling with rigorous voxel-wise filtering, and iterative re-training with progressively refined supervision. This generalist–specialist interaction lets the system approximately recover full supervision from partially labeled data, improving anatomical coverage and mitigating the error propagation typical of conventional pseudo-labeling.
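The four stages can be read as a simple control loop. The sketch below is only an illustration of that flow, not the authors' implementation: the function name `ipnp_loop`, its callable parameters (`train`, `make_prompts`, `pseudo_label`, `merge`), and the fixed `rounds` count are all hypothetical placeholders, and the actual stopping criterion is not stated in this summary.

```python
from typing import Callable, TypeVar

Model = TypeVar("Model")
Labels = TypeVar("Labels")
Prompts = TypeVar("Prompts")


def ipnp_loop(
    train: Callable[[Labels], Model],
    make_prompts: Callable[[Model], Prompts],
    pseudo_label: Callable[[Prompts], Labels],
    merge: Callable[[Labels, Labels], Labels],
    partial_labels: Labels,
    rounds: int = 3,
) -> Model:
    """Hypothetical control flow for the four stages (all names are assumptions)."""
    # Stage 1: train the specialist on the available partial labels only.
    specialist = train(partial_labels)
    for _ in range(rounds):
        # Stage 2: turn the specialist's predictions into prompts.
        prompts = make_prompts(specialist)
        # Stage 3: the frozen foundation model returns filtered pseudo-labels.
        pseudo = pseudo_label(prompts)
        # Stage 4: retrain on the original labels plus the pseudo-labels.
        specialist = train(merge(partial_labels, pseudo))
    return specialist
```

Passing the stages in as callables keeps the sketch independent of any particular network or foundation model; in practice each callable would wrap a substantial training or inference routine.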
Figure 2: IPnP system overview: iterative interaction between the specialist and generalist, with pseudo-label generation, refinement, and re-incorporation into training.
Specialist–Generalist Collaboration and Prompt Design
Following initial training on available labels, the specialist produces coarse predictions. High-confidence organ regions—even for unlabeled classes—are localized through bounding box extraction using median-plane analysis. These 2D boxes from orthogonal views are provided as prompts to the foundation model, enabling spatially precise and context-sensitive pseudo-labeling for previously unlabeled organs.
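As a rough illustration of the prompt-generation step, the sketch below derives one 2D bounding box per orthogonal view from a binary 3D prediction mask. It assumes that "median-plane analysis" means slicing at the median foreground index along each axis; that reading, the function name `median_plane_boxes`, and the box layout are assumptions rather than details given in the text.

```python
from __future__ import annotations

import numpy as np


def median_plane_boxes(mask: np.ndarray) -> dict[int, tuple[int, int, int, int, int]]:
    """Per axis, return (slice_index, row_min, row_max, col_min, col_max).

    Assumed interpretation: the prompt plane for each axis is the slice at the
    median foreground coordinate along that axis.
    """
    if mask.ndim != 3:
        raise ValueError("expected a 3D binary mask")
    coords = np.argwhere(mask)  # (N, 3) indices of foreground voxels
    if coords.size == 0:
        return {}  # nothing predicted for this organ, so no prompt
    boxes = {}
    for axis in range(3):
        # Median foreground index along this axis selects the plane.
        idx = int(np.median(coords[:, axis]))
        plane = np.take(mask, idx, axis=axis)
        rows, cols = np.nonzero(plane)
        if rows.size == 0:
            continue  # median slice can be empty for disconnected predictions
        boxes[axis] = (
            idx,
            int(rows.min()), int(rows.max()),
            int(cols.min()), int(cols.max()),
        )
    return boxes
```

Each returned tuple could then be handed to a promptable model as a box prompt for the corresponding slice; how the foundation model propagates the 2D prompt through the volume is outside the scope of this sketch.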
Robust Pseudo-Label Refinement
A multi-constraint scheme is used to ensure pseudo-label reliability.
Voxel-Level Selection Loss (VLS)
Training incorporates a VLS loss that excludes unreliable voxels for unlabeled targets, judged by consensus between the prediction and the pseudo-supervision, while keeping standard supervision for labeled targets. This limits the influence of noisy pseudo-labels on gradient updates; the comparative ablation shows this component to be critical.
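One plausible reading of this selection rule is a masked cross-entropy, sketched below with NumPy. The exact consensus criterion is not specified in this summary, so the rule used here (keep a pseudo-labeled voxel only when the network's argmax class matches the pseudo-label) is an assumption, as are the function name `vls_cross_entropy` and its arguments.

```python
import numpy as np


def vls_cross_entropy(probs, target, is_ground_truth):
    """Masked cross-entropy sketch (assumed form of a voxel-level selection loss).

    probs:           (C, ...) softmax outputs of the specialist
    target:          (...) integer labels (ground truth or pseudo-label)
    is_ground_truth: (...) bool, True where the label is a manual annotation
    """
    probs = np.asarray(probs, dtype=np.float64)
    target = np.asarray(target)
    is_ground_truth = np.asarray(is_ground_truth, dtype=bool)
    # Per-voxel cross-entropy against the target class.
    picked = np.take_along_axis(probs, target[None, ...], axis=0)[0]
    ce = -np.log(np.clip(picked, 1e-12, 1.0))
    # Assumed consensus rule: a pseudo-labeled voxel is kept only when the
    # predicted class agrees with its pseudo-label.
    agrees = probs.argmax(axis=0) == target
    keep = is_ground_truth | agrees
    if not keep.any():
        return 0.0
    # Average only over the selected voxels; the rest contribute no gradient.
    return float(ce[keep].mean())
```

Manually annotated voxels always pass the mask, which mirrors the statement that labeled targets keep standard supervision.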
Empirical Evaluation
Simulated and real-world scenarios validate IPnP using the AMOS abdominal multi-organ dataset and a private head-and-neck (HnN) cancer patient cohort with highly incomplete annotation. The comparisons include full supervision (upper bound), two partial-supervision baselines, and the recent TransDoDNet.
Results consistently show IPnP outperforming all partial-label baselines on Dice Similarity Coefficient (DSC) and 95th-percentile Hausdorff Distance (HD95), with average AMOS scores approaching full-label performance even in settings with only 33% of organs labeled. In comparison, naïve partial-label training with background-filling or per-organ losses caused significant drops in anatomical fidelity and contour sharpness.
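For reference, DSC between a predicted mask A and a reference mask B is 2|A∩B| / (|A| + |B|). A minimal NumPy sketch follows; the function name `dice_coefficient` and the convention of returning 1.0 when both masks are empty are choices made here, not details from the paper, and HD95 is not covered.

```python
import numpy as np


def dice_coefficient(pred, ref):
    """Dice Similarity Coefficient for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    overlap = np.logical_and(pred, ref).sum()
    return float(2.0 * overlap / denom)
```

For multi-organ evaluation, this would typically be computed per organ label and then averaged.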
Figure 4: Qualitative comparison of segmentation results across methods; IPnP displays substantial improvements in organ coverage and boundaries relative to baselines in partial-label regimes.
Furthermore, IPnP maintains its performance edge in the clinical HnN dataset, especially on structures with scarce annotation (e.g., ocular structures), validating its capacity to exploit weak supervision and bridge annotation gaps.
Figure 5: Organ-specific performance metrics (DSC) in the HnN cohort; IPnP distinctly surpasses alternative approaches, particularly for organs with sparse annotations.
Implications and Future Directions
IPnP establishes an effective paradigm for medical segmentation under realistic, annotation-incomplete conditions by automating pseudo-label generation and refining supervision via a hybrid generalist–specialist scheme and voxel-level reliability analysis. This has practical implications for accelerating annotation-light deployment of segmentation algorithms in diverse clinical sites and organs, potentially enabling scalable population studies and streamlined clinical pipelines.
Theoretically, the approach demonstrates the potential of foundation models not merely as zero-shot predictors, but as synthesis partners in tailored, iterative workflows—suggesting more sophisticated, adaptive prompting strategies and interaction protocols may further reduce reliance on manual labels.
Potential extensions include expanding prompt modalities, integrating uncertainty-aware selection at deeper architectural levels, or dynamically weighting pseudo-supervision. Evaluation on further diverse pathologies, imaging modalities, and population cohorts will inform generalizability and limitations.
Conclusion
The IPnP framework embodies a robust, iterative solution to the partially labeled medical image segmentation problem by leveraging promptable foundation models for pseudo-label construction and rigorous supervision filtering. Empirical evidence demonstrates that IPnP recovers segmentation performance to near fully supervised levels, outperforming state-of-the-art partial-label segmentation baselines. The work demonstrates both strong practical impact for clinical AI adoption under annotation constraints and a clear methodological advance in generalist–specialist collaborative learning pipelines for medical imaging (2604.01038).