- The paper introduces a novel pixel-level, threshold-free framework driven by a category-agnostic change head (CACH) for detecting semantic changes in remote sensing images.
- It fuses bi-temporal feature extraction with attention-based difference calibration to outperform state-of-the-art methods on both binary and multi-class change benchmarks.
- The new CA-CDD dataset and the reductions in GPU memory use and runtime highlight the framework's practical benefits for open-vocabulary, domain-adaptive monitoring.
Seg2Change: Adapting Open-Vocabulary Semantic Segmentation Models for Remote Sensing Change Detection
Introduction and Motivation
Change detection in remote sensing is essential for quantifying anthropogenic and ecological land-cover dynamics. Conventional methods are constrained by their reliance on the closed-set categories available in labeled datasets, which hinders adaptation to open-world scenarios where the categories of interest are not predefined. Despite advances in vision-language models (VLMs) and open-vocabulary semantic segmentation (OVSS) frameworks, adapting these models to open-vocabulary change detection (OVCD), that is, detecting arbitrary semantic changes between remote sensing images acquired at different times, remains unsolved. Prevailing OVCD paradigms, such as M-C-I and I-M-C, inherit inefficiencies and error cascades from segmentation mask proposal modules and require hand-engineered thresholds for change determination. This paper proposes the Seg2Change framework, which departs from instance-oriented segmentation cascades and instead introduces a category-agnostic, pixel-level change reasoning paradigm. A principal component is the category-agnostic change head (CACH), supported by a new dataset (CA-CDD) annotated specifically for category-agnostic change detection.
Figure 1: Visual comparison between predefined change category datasets (a) and the proposed category-agnostic change detection dataset (b); the latter supports unconstrained semantic diversity.
Methodology
Paradigm Shift: From Instance-Based to Category-Agnostic Change Reasoning
Traditional OVCD pipelines, which rely on segmentation proposals as units of comparison, are fundamentally limited by mask fragmentation and error propagation. Change determination via fixed or globally engineered thresholds further limits the flexibility, generalizability, and accuracy of these systems, particularly under domain shift or novel-category settings. Seg2Change circumvents these limitations by pioneering a pixel-level, threshold-free approach. The framework decouples change map prediction from instance segmentation, leveraging the high spatial and semantic generalization capacity of OVSS models.
Figure 2: Annotation process in CA-CDD: annotators delineate change only where semantic categories differ between bi-temporal images, without restricting to a closed set.
Category-Agnostic Change Head (CACH)
CACH transforms open-vocabulary semantic segmentation outputs into robust change maps agnostic to specific class semantics. The architecture is staged as follows:
- Feature extraction using DINOv2 on input image pairs;
- Multiscale feature modulation to align intermediate representations;
- Bi-temporal Difference Fusion Module (BDFM) to enhance discrepancy regions with convolutional feature activations and attention modulation over absolute bi-temporal differences;
- Effective Difference Query Attention (EDQA) that calibrates difference features using guidance aggregated from cross-temporal relationships, operationalized with a sliding-window attention mechanism and Mixture-of-Experts (MoE) for adaptive, nonstationary domain mixing.
CACH aggregates these modules via a progressive residual upsampling path (ResUp), culminating in dense, threshold-free agnostic change maps.
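The difference-fusion idea behind BDFM can be illustrated with a toy sketch. It assumes features are given as a list of pixels, each a list of channel values, and it replaces the learned convolutions and attention of the actual module with a parameter-free sigmoid gate. The function name `difference_fusion` and the gate are hypothetical stand-ins, not the authors' implementation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def difference_fusion(feat_t1, feat_t2):
    """Toy stand-in for attention-modulated absolute differences.

    feat_t1, feat_t2: lists of pixels, each pixel a list of channel values.
    Returns the per-pixel absolute difference scaled by a scalar gate.
    """
    fused = []
    for pixel_t1, pixel_t2 in zip(feat_t1, feat_t2):
        # Absolute bi-temporal difference, channel by channel.
        diff = [abs(a - b) for a, b in zip(pixel_t1, pixel_t2)]
        # Hypothetical gate: sigmoid of the mean difference magnitude.
        # A trained module would learn this weighting instead.
        gate = sigmoid(sum(diff) / len(diff))
        fused.append([gate * d for d in diff])
    return fused


features_t1 = [[0.2, 0.4], [1.0, 0.0]]
features_t2 = [[0.2, 0.4], [0.0, 2.0]]
fused = difference_fusion(features_t1, features_t2)
```

In this sketch, pixels whose features are identical at both times produce all-zero outputs, while pixels with large discrepancies pass through with a gate closer to 1.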
Figure 3: The CACH pipeline fuses bi-temporal features, calibrates difference signals, and upscales to a dense change map.
Figure 4: The BDFM and EDQA modules, core to CACH, highlight guided difference extraction and robust calibration under domain shift.
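For readers unfamiliar with Mixture-of-Experts, the generic mechanism can be sketched as follows. This is a dense, softmax-gated MoE over a single feature vector, written under the assumption that each expert is a plain function. The summary does not specify how EDQA configures its experts or gate, so the names `mixture_of_experts` and `gate_weights` are illustrative only.

```python
import math


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_experts(x, experts, gate_weights):
    """Dense softmax-gated mixture of experts (generic sketch).

    x: feature vector (list of floats).
    experts: list of callables mapping a vector to a vector.
    gate_weights: one weight vector per expert, used for a linear gate.
    """
    # Linear gate: one logit per expert.
    logits = [sum(w * v for w, v in zip(ws, x)) for ws in gate_weights]
    probs = softmax(logits)
    outputs = [expert(x) for expert in experts]
    # Probability-weighted sum of the expert outputs.
    mixed = [
        sum(p * out[i] for p, out in zip(probs, outputs))
        for i in range(len(outputs[0]))
    ]
    return mixed, probs


experts = [
    lambda v: [2.0 * a for a in v],  # hypothetical expert 1
    lambda v: [-a for a in v],       # hypothetical expert 2
]
mixed, probs = mixture_of_experts([1.0, 2.0], experts, [[0.0, 0.0], [0.0, 0.0]])
```

With all-zero gate weights, both experts receive equal probability, so the output is the plain average of the two expert outputs. A trained gate would shift that weighting per input, which is the property that makes MoE attractive under domain shift.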
Integrating OVSS for Semantic Change Indexing
In parallel, the same bi-temporal images and user-specified textual prompts are input into an OVSS model (e.g., SegEarth-OV3), producing time-resolved semantic segmentation maps. Final semantic change maps are yielded by gating (elementwise multiplication) the semantic predictions with the agnostic change map, thus ensuring semantic pixel assignments are propagated only in regions of confidently detected change. This architecture supports both binary and multi-class OVCD scenarios.
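The gating step can be sketched in a few lines. The example assumes that the OVSS outputs are 2D grids of integer class IDs starting at 1, that the agnostic change map is binary (1 = change), and that 0 is reserved for "unchanged" in the result. The function name `gate_semantic_change` and this ID convention are assumptions made for illustration.

```python
def gate_semantic_change(sem_t1, sem_t2, change_map):
    """Elementwise multiplication of semantic maps by a binary change map.

    sem_t1, sem_t2: 2D lists of class IDs (1, 2, ...) for each time step.
    change_map: 2D list of 0/1 values from the category-agnostic head.
    Returns the two gated maps; unchanged pixels become 0.
    """
    gated_t1 = [
        [label * change for label, change in zip(sem_row, change_row)]
        for sem_row, change_row in zip(sem_t1, change_map)
    ]
    gated_t2 = [
        [label * change for label, change in zip(sem_row, change_row)]
        for sem_row, change_row in zip(sem_t2, change_map)
    ]
    return gated_t1, gated_t2


sem_t1 = [[1, 1], [2, 2]]
sem_t2 = [[1, 3], [2, 1]]
change_map = [[0, 1], [0, 1]]
gated_t1, gated_t2 = gate_semantic_change(sem_t1, sem_t2, change_map)
# gated_t1 == [[0, 1], [0, 2]]
# gated_t2 == [[0, 3], [0, 1]]
```

Reading the two gated maps together gives the "from-to" transition at each changed pixel, for example class 1 to class 3 at the top-right position.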
Figure 5: Overview of the Seg2Change pipeline: bi-temporal images are processed via CACH and OVSS; outputs are fused to yield open-vocabulary change maps.
Dataset Construction: CA-CDD
A substantial obstacle for category-agnostic change detection is the lack of suitable benchmarks. The authors annotate and refine CA-CDD, which extends prior datasets (e.g., SECOND, JL1-CD, CNAM-CD) by relabeling change regions without imposing fixed semantic constraints. The dataset supports both densely annotated multi-class change regions and broad, unconstrained detection scenarios. This enables supervised training as well as fair evaluation of models under open-set shifts.
Figure 6: Label refinement in CA-CDD enables fine-grained, open-category change identification compared to original coarse, category-limited labels.
Experimental Results
Seg2Change was evaluated on six remote sensing benchmarks (WHU-CD, LEVIR-CD, DSIFN, CLCD, SC-SCD, and SECOND), covering binary building/land-cover change detection and semantic change detection. Results demonstrate:
- On binary change benchmarks (e.g., WHU-CD), Seg2Change achieves IoUc=75.72%, outperforming the previous best method by 9.52 points.
- On semantic change benchmarks (e.g., SECOND), Seg2Change delivers an absolute improvement in mIoUc of 5.50 points over prior art.
- The approach simultaneously reduces GPU memory requirement and runtime by 36% and 53%, respectively, due to parallelizable, proposal-free inference.
- Ablation studies confirm the additive value of each module in CACH, with the sliding window attention and MoE mechanisms essential for adapting to domain shifts in bi-temporal input.
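The IoU figures above can be read against the standard intersection-over-union definition, IoU = TP / (TP + FP + FN). The sketch below assumes that IoUc denotes the IoU of the change class and that predictions and labels are binary 2D grids. The function name `change_iou` and the convention of returning 0.0 for an empty union are choices made here, not details taken from the paper.

```python
def change_iou(pred, target):
    """IoU of the change class (label 1) for binary 2D maps."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == 1 and t == 1:
                tp += 1  # change correctly detected
            elif p == 1 and t == 0:
                fp += 1  # false alarm
            elif p == 0 and t == 1:
                fn += 1  # missed change
    union = tp + fp + fn
    # Convention chosen here: no change in either map gives 0.0.
    return tp / union if union else 0.0


pred = [[1, 1, 0], [0, 1, 0]]
target = [[1, 0, 0], [0, 1, 1]]
iou = change_iou(pred, target)  # 2 TP, 1 FP, 1 FN -> 0.5
```

A multi-class mIoUc would then be the mean of this quantity over the change classes, under the same assumption about the notation.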
Figure 7: Seg2Change enables high-fidelity open-vocabulary building and land-cover change localization, even in previously unannotated categories.
Figure 8: Seg2Change produces highly discriminative semantic change maps across multiple classes with reliable boundary localization.
Figure 9: Qualitative comparison on WHU-CD, LEVIR-CD, and DSIFN; Seg2Change reduces both false positives and false negatives compared to proposal-based SOTA.
Figure 10: Direct semantic change detection exemplified on SC-SCD, visualizing class-specific transitions within agnostic change regions.
Implications and Future Outlook
Seg2Change directly addresses primary limitations in current OVCD systems: (1) error accumulation from proposal-based instance segmentation, (2) inflexible, non-generalizable threshold tuning, (3) lack of annotated data for open-category evaluation, and (4) inefficiency in pipeline integration and inference. By proposing a category-agnostic strategy that is compatible with advances in OVSS and VLMs, CACH establishes a bridge for scaling open-vocabulary change detection toward larger, more diverse, and real-world datasets. Key implications are:
- Practical: Efficient, threshold-free adaptation of foundation models for monitoring dynamic environments, urban expansion, environmental disasters, etc., in open-set conditions.
- Theoretical: The paradigm supports semantic compositionality and domain adaptation, mitigating annotation bottlenecks and enabling robust foundation model fine-tuning with weak supervision.
- Future Directions: Logical next steps include extending the framework to multi-temporal (sequence) change detection, continual learning for persistent environments, joint geolocalized prompt conditioning, and plug-and-play integration with emerging VLM architectures (e.g., multi-modal LLMs with temporal modeling).
Conclusion
Seg2Change defines a new paradigm for open-vocabulary change detection in remote sensing by decoupling category-agnostic change localization from semantic segmentation and fusing the strengths of robust feature encoders and large-scale vision-language models. Supported by a newly constructed category-agnostic dataset and rigorous evaluation, Seg2Change improves both accuracy and resource efficiency, demonstrating the viability and necessity of threshold-free, compositional approaches for open-world geospatial monitoring. The public release of code and datasets underpins future research in scalable and generalizable remote sensing change detection.
(2604.11231)