ResGuard: Enhancing Robustness Against Known Original Attacks in Deep Watermarking

Published 4 Apr 2026 in cs.CV | (2604.03693v1)

Abstract: Deep learning-based image watermarking commonly adopts an "Encoder-Noise Layer-Decoder" (END) architecture to improve robustness against random channel distortions, yet it often overlooks intentional manipulations introduced by adversaries with additional knowledge. In this paper, we revisit this paradigm and expose a critical yet underexplored vulnerability: the Known Original Attack (KOA), where an adversary has access to multiple original-watermarked image pairs, enabling various targeted suppression strategies. We show that even a simple residual-based removal approach, namely estimating an embedding residual from known pairs and subtracting it from unseen watermarked images, can almost completely remove the watermark while preserving visual quality. This vulnerability stems from the insufficient image dependency of residuals produced by END frameworks, which makes them transferable across images. To address this, we propose ResGuard, a plug-and-play module that enhances KOA robustness by enforcing image-dependent embedding. Its core lies in a residual specificity enhancement loss, which encourages residuals to be tightly coupled with their host images and thus improves image dependency. Furthermore, an auxiliary KOA noise layer injects residual-style perturbations during training, allowing the decoder to remain reliable under stronger embedding inconsistencies. Integrated into existing frameworks, ResGuard boosts KOA robustness, improving average watermark extraction accuracy from 59.87% to 99.81%.


Authors (5)

Summary

The paper introduces ResGuard to counter Known Original Attacks (KOA) by enforcing image-specific watermark embedding through a novel residual specificity enhancement loss and an auxiliary KOA noise layer.
Experimental results show that ResGuard raises average watermark extraction accuracy under KOA from 59.87% to 99.81% across five baseline models.
The method maintains watermark imperceptibility and robustness to common channel distortions, ensuring minimal visual quality degradation.


Introduction and Motivation

Deep learning-based image watermarking, commonly built on the Encoder–Noise Layer–Decoder (END) architecture, has achieved significant robustness against random channel distortions such as JPEG compression, Gaussian noise, and other forms of signal degradation. However, these architectures remain insufficiently protected against adversaries with additional knowledge, particularly under the Known Original Attack (KOA). In a KOA, the adversary has access to multiple host–watermarked image pairs and can estimate the embedding residual (the difference between a watermarked image and its host) from them. Because residuals produced by END frameworks are not sufficiently image-dependent, the estimate transfers across images: subtracting it from unseen watermarked images removes the watermark with minimal impact on visual quality, exposing a critical security vulnerability.

Figure 1: KOA illustration—residual extracted from a single host–watermarked pair suppresses watermark decoding across other images.
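
The residual-based removal described above can be illustrated with a minimal NumPy sketch. It is not the authors' code: the function names `estimate_residual` and `koa_remove`, the toy image sizes, and the fully image-independent residual are assumptions chosen to show the worst case for the defender.

```python
import numpy as np

def estimate_residual(hosts, watermarked):
    """Average the per-pair residuals (watermarked - host) over N known pairs."""
    hosts = np.asarray(hosts, dtype=np.float64)
    watermarked = np.asarray(watermarked, dtype=np.float64)
    return (watermarked - hosts).mean(axis=0)

def koa_remove(target_watermarked, residual_estimate):
    """Subtract the estimated residual from an unseen watermarked image."""
    attacked = np.asarray(target_watermarked, dtype=np.float64) - residual_estimate
    return np.clip(attacked, 0.0, 1.0)

# Toy data (hypothetical): every image receives the same residual,
# i.e. the embedding has no image dependency at all.
rng = np.random.default_rng(0)
shared_residual = 0.01 * rng.standard_normal((8, 8, 3))
hosts = rng.uniform(0.2, 0.8, size=(4, 8, 8, 3))
watermarked = hosts + shared_residual

unseen_host = rng.uniform(0.2, 0.8, size=(8, 8, 3))
unseen_watermarked = unseen_host + shared_residual

estimate = estimate_residual(hosts, watermarked)
attacked = koa_remove(unseen_watermarked, estimate)

# Largest pixel difference between the attacked image and the clean host.
print(np.abs(attacked - unseen_host).max())
```

In this toy setting the attacked image matches the clean host up to floating-point error, which is why an image-independent residual is so easy to strip. The more the residual depends on the host, the less of it the averaged estimate captures.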

This paper identifies the core issue underlying the vulnerability: residuals generated by END-based watermarking frameworks lack strong image-dependency and are highly consistent across different host images. Addressing this, the authors propose ResGuard, a plug-and-play module focused on enforcing highly image-dependent embedding through a novel residual specificity enhancement loss and an auxiliary KOA noise layer.

Methodology: ResGuard Framework and Loss Functions

ResGuard’s approach is anchored in enhancing image-specificity of embedding residuals and suppressing cross-image transferability. The framework is structured as follows:

Residual Specificity Enhancement (RSE) Loss: Encourages residuals from different host images with the same message to diverge, while pulling together residuals from the same host image with varying messages. This contrastive loss formulation stabilizes optimization and emphasizes the dominance of host image content in shaping the residual.
KOA Noise Layer: During training, simulates residual-based perturbations by applying residuals from a known host–watermarked pair to another image, forcing the decoder to maintain accurate watermark extraction even under such manipulations.
Standard Loss Components: Image loss ($\mathcal{L}_{img}$) preserves imperceptibility; message loss ($\mathcal{L}_{mes}$) ensures robust extraction under both channel distortions and KOA.

Figure 2: ResGuard architecture: encoder, noise layers (combined and KOA), residual specificity module, decoder, and loss functions.
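
The two ResGuard-specific components listed above can be sketched roughly as follows. The summary does not give the exact RSE formulation, so the cosine-based push/pull terms, their equal weighting, and the names `rse_style_loss` and `koa_noise_layer` are assumptions. A real training setup would use a differentiable framework rather than plain NumPy.

```python
import numpy as np

def _cosine(a, b, eps=1e-12):
    """Cosine similarity between two residuals, flattened to vectors."""
    a, b = np.ravel(a), np.ravel(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def rse_style_loss(res_same_msg, res_other_msg):
    """Contrastive-style penalty on residuals (assumed form, needs >= 2 hosts).

    res_same_msg[i]:  residual of host i embedded with message m.
    res_other_msg[i]: residual of host i embedded with a different message m'.
    """
    n = len(res_same_msg)
    # Push apart: different hosts, same message should NOT look alike.
    push = [_cosine(res_same_msg[i], res_same_msg[j])
            for i in range(n) for j in range(i + 1, n)]
    # Pull together: same host, different messages SHOULD look alike.
    pull = [1.0 - _cosine(res_same_msg[i], res_other_msg[i]) for i in range(n)]
    return float(np.mean(push) + np.mean(pull))

def koa_noise_layer(watermarked, known_host, known_watermarked):
    """Simulate KOA during training: subtract the residual of a known
    host-watermarked pair from a different watermarked image."""
    residual = np.asarray(known_watermarked) - np.asarray(known_host)
    return np.clip(np.asarray(watermarked) - residual, 0.0, 1.0)
```

Under this sketch, residuals that are identical across hosts give a high loss, while residuals that differ across hosts but stay stable across messages give a low one. The decoder would then be trained to recover the message from the output of `koa_noise_layer` as well as from the ordinary distortion layers.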

Experimental Results: Robustness to KOA and Channel Distortions

Evaluation across five deep learning-based watermarking baselines—HiDDeN, MBRS, CIN, RoSteALS, and InvisMark—shows that traditional END-based watermarking models experience sharp declines in bitwise extraction accuracy under KOA as the number of available host–watermarked pairs increases. Even $N=1$ (minimal attacker knowledge) results in significant suppression of watermark extraction. In all cases, models enhanced with ResGuard maintain extraction accuracy near 100%, showing strong robustness irrespective of $N$.

Figure 3: Bitwise extraction accuracy under KOA, with ResGuard-enhanced models sustaining high robustness as $N$ increases.
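
For reference, bitwise extraction accuracy is the fraction of message bits the decoder recovers correctly. The sketch below assumes the decoder emits one real-valued score per bit, thresholded at zero. The function name `bit_accuracy`, the threshold, and the example values are illustrative, not taken from the paper.

```python
import numpy as np

def bit_accuracy(decoded_scores, message_bits, threshold=0.0):
    """Fraction of bits recovered correctly after thresholding decoder scores."""
    decoded_bits = (np.asarray(decoded_scores) > threshold).astype(np.int64)
    return float((decoded_bits == np.asarray(message_bits)).mean())

# Hypothetical 8-bit message and decoder scores; two bits are decoded wrongly.
message = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [2.1, -0.3, 0.8, -1.2, -0.5, 0.4, 1.7, -2.0]
print(bit_accuracy(scores, message))  # 6 of 8 bits match -> 0.75
```

For uniformly random binary messages, an accuracy near 50% corresponds to chance-level guessing, so the lower a baseline falls toward that level under KOA, the closer the watermark is to being unrecoverable.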

Robustness persists even under diverse real-world channel distortions (JPEG, Gaussian noise, salt-and-pepper, Gaussian blur, median filtering), with negligible impact on watermark extraction accuracy for ResGuard-enabled models.

Figure 4: Representative distortions used in evaluation—JPEG, Gaussian noise, salt-and-pepper, Gaussian blur, median filter.
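
Two of the listed distortions are simple enough to sketch directly. The strengths `sigma` and `ratio` below are hypothetical parameters, since the summary does not state the settings used in the evaluation, and images are assumed to be float arrays in [0, 1].

```python
import numpy as np

def gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise and clip back to the valid range."""
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)

def salt_and_pepper(image, ratio, rng):
    """Set roughly `ratio` of the pixels to black (pepper) or white (salt)."""
    out = image.copy()
    mask = rng.random(image.shape[:2])   # one draw per pixel, shared by channels
    out[mask < ratio / 2] = 0.0          # pepper
    out[mask > 1.0 - ratio / 2] = 1.0    # salt
    return out

rng = np.random.default_rng(0)
image = np.full((32, 32, 3), 0.5)
noisy = gaussian_noise(image, sigma=0.05, rng=rng)
speckled = salt_and_pepper(image, ratio=0.1, rng=rng)
```

In an END-style pipeline, distortions like these sit between the encoder and the decoder during training so that the decoder learns to extract the message from degraded images.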

Residual Image-Specificity and Imperceptibility

Quantitative assessment using pairwise cosine similarity between residuals shows that ResGuard significantly reduces inter-image residual similarity (a 45.22% reduction). Qualitative visualization further shows that ResGuard produces content-adaptive, diverse, and highly image-specific residual patterns, effectively breaking cross-image generalization.

Figure 5: Inter-image residual similarity drops post-ResGuard across all baselines, indicating enhanced image-specificity.
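
One plausible way to compute such an inter-image similarity score is to average the cosine similarity over all distinct pairs of residuals. This is a sketch under that assumption. The paper's exact protocol (for example, whether absolute values are taken) is not stated here, and `mean_pairwise_cosine` is an illustrative name.

```python
import numpy as np

def mean_pairwise_cosine(residuals, eps=1e-12):
    """Mean cosine similarity over all distinct pairs of flattened residuals."""
    flat = np.asarray(residuals, dtype=np.float64)
    flat = flat.reshape(len(flat), -1)
    unit = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + eps)
    similarity = unit @ unit.T
    upper = np.triu_indices(len(flat), k=1)  # each unordered pair once
    return float(similarity[upper].mean())
```

A value close to 1 means residuals are nearly interchangeable across images, which is the condition KOA exploits. Values near 0 indicate that each residual is specific to its host.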

Figure 6: Visual comparison of baseline vs. ResGuard-enhanced residuals; content-driven diversity is apparent.

Despite enhanced robustness, imperceptibility of watermarked images remains uncompromised. PSNR, SSIM, and LPIPS metrics consistently show negligible deviations between original and ResGuard-integrated models. KOA does not introduce visible artifacts; attacked images remain indistinguishable from watermarked images.

Figure 7: Host, watermarked, KOA-attacked images, and respective residuals—ResGuard enhances robustness while maintaining visual quality.
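
Of the three imperceptibility metrics, PSNR is the simplest to state in code. The sketch below uses the standard definition for images scaled to [0, 1]. SSIM and LPIPS are omitted because they need considerably more machinery.

```python
import numpy as np

def psnr(reference, test, max_value=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    reference = np.asarray(reference, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))

# A uniform error of 0.1 gives MSE = 0.01, i.e. about 20 dB.
host = np.zeros((8, 8, 3))
print(psnr(host, host + 0.1))
```

Comparing PSNR between the host and the watermarked image, with and without ResGuard, is the kind of check the imperceptibility claim rests on.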

Ablation Studies and Analysis of Message Variability

Applied individually, the RSE loss and the KOA noise layer each increase KOA robustness, but their combination delivers near-perfect defense across all baselines. The ablation confirms that their effects are complementary: the RSE loss regularizes embedding residuals toward image dependency, while the KOA noise layer exposes the decoder to residual-based perturbations during training.

Message variability analysis shows that KOA remains effective even when watermark messages differ across images: baseline accuracy drops from ~100% to ~70%, while ResGuard preserves performance. In this different-message scenario, increasing $N$ lessens attack potency because message-dependent residual components cancel during averaging, which highlights the centrality of cross-image residual similarity in KOA vulnerability.

Figure 8: Bitwise extraction accuracy under KOA across baselines with message variability; ResGuard maintains high defense.
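
The cancellation effect can be seen in a toy model, which is an assumption rather than the paper's analysis. Each residual is modeled as a component shared across all pairs plus an independent zero-mean component standing in for the message-dependent part. Averaging over $N$ pairs keeps the shared component and shrinks the message-dependent one roughly as $1/\sqrt{N}$.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4096
shared = rng.standard_normal(dim)  # component common to all pairs (toy)

def averaged_residual(n_pairs):
    """Average n_pairs residuals, each with its own message-dependent part."""
    # Independent zero-mean vectors are a hypothetical stand-in for the
    # message-dependent residual components of pairs with different messages.
    message_parts = rng.standard_normal((n_pairs, dim))
    return (shared + message_parts).mean(axis=0)

def leftover_message_norm(n_pairs):
    """Norm of what remains of the message-dependent part after averaging."""
    return float(np.linalg.norm(averaged_residual(n_pairs) - shared))

for n in (1, 4, 16, 64):
    print(n, round(leftover_message_norm(n), 2))
```

The printed norms fall as `n` grows, so the averaged estimate converges to the part of the residual that is shared across images. That is the component the RSE loss is meant to suppress.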

Implications and Future Directions

Theoretical implications are clear: robustness to random channel distortions does not guarantee robustness to informed adversarial attacks. The paper establishes the necessity of enforcing image-dependent embedding in watermarking to prevent residual transferability. Practically, ResGuard’s plug-and-play design offers immediate utility for upgrading security in deployed watermarking systems across content provenance, copyright, and AI-generated image authentication.

Future work may include extending image-dependency enforcement mechanisms to video watermarking, exploring adversarial attack models beyond KOA, and developing architectures capable of simultaneously optimizing for imperceptibility, robustness, and security. Additionally, incorporating generative or inversion-based perturbation layers could further improve defense against more complex attacks.

Conclusion

ResGuard addresses a fundamental vulnerability in deep learning-based image watermarking by enforcing content-adaptive, non-transferable embedding residuals through a residual specificity enhancement loss and KOA noise layer. This approach delivers substantial improvements—average watermark extraction accuracy under KOA rises from 59.87% to 99.81% across diverse baselines, without compromising imperceptibility or robustness to channel distortions. The findings presented underscore the importance of image-specific embedding for secure watermarking and pave the way for more informed defenses against adversarial manipulation in future AI-driven visual authentication systems.

