- The paper proposes the Max-Mahalanobis Center (MMC) loss, which explicitly induces high sample density in feature space and thereby significantly improves adversarial robustness.
- It leverages preset, optimally dispersed class centers to cluster features and improve intra-class compactness.
- Empirical evaluations on MNIST, CIFAR-10, and CIFAR-100 show that MMC yields higher robustness against strong adaptive attacks with minimal extra computation.
Deep neural networks trained with the standard Softmax Cross-Entropy (SCE) loss are known to be vulnerable to adversarial attacks. Existing work suggests that adversarially robust generalization requires significantly more data than standard training, implying that commonly used datasets might be insufficient for training robust models. Instead of collecting more data, this paper explores the strategy of better utilizing existing data by manipulating the local sample distribution in the feature space to induce regions of high sample density, which could provide sufficient local samples for robust learning.
The authors first analyze the SCE loss and its variants (termed generalized SCE or g-SCE) and show that these losses, due to the softmax function, provide supervisory signals that encourage learned features to spread sparsely in the feature space, particularly when the loss value is minimized towards zero. This sparsity leads to low sample density around feature points, which is hypothesized to be detrimental to robust learning. The analysis shows that the loss contours of g-SCE losses are generally hyperspheres or hyperplanes, and that minimizing the loss tends to push features away from the center of these contours towards infinity, so the learned feature points end up sparsely distributed.
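The tendency of softmax-based losses to reward ever-larger features can be illustrated with a small sketch. It assumes a toy linear logit layer (the matrix `W` and the feature `z` are made up for illustration and are not taken from the paper) and simply scales a correctly classified feature away from the origin, recording the SCE loss at each scale.

```python
import math

def sce_loss(logits, y):
    """Softmax cross-entropy for one sample, using a stable log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[y]

def linear_logits(W, z):
    """Logits of a bias-free linear layer: one dot product per class."""
    return [sum(w * v for w, v in zip(row, z)) for row in W]

# Hypothetical 3-class weights and a 2-D feature that class 0 already wins.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
z = [1.0, 0.2]

# Scale the feature outward and record the loss for the true class 0.
scales = (1, 2, 4, 8, 16)
losses = [sce_loss(linear_logits(W, [s * v for v in z]), 0) for s in scales]
```

In this toy setting the loss keeps shrinking as the feature norm grows, so nothing in the objective pulls samples of a class towards a common location.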
To address this, the paper proposes the Max-Mahalanobis Center (MMC) loss, defined as $\mathcal{L}_{\text{MMC}}(Z(x), y) = \frac{1}{2}\|z - \mu_y^*\|_2^2$, where $z = Z(x)$ is the feature of input $x$, $y$ is the true label, and $\mu_y^*$ are preset, untrainable class centers. These centers are generated according to a criterion that maximizes the minimal angle between any two centers, providing optimal inter-class dispersion. The MMC loss is based on minimizing the squared distance between the feature vector and the corresponding class center, framing training as a regression problem towards these fixed centers.
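A minimal sketch of this definition follows. The center construction is a standard regular-simplex construction (equal-norm vectors with pairwise cosine $-1/(L-1)$ for $L$ classes) and is an assumption standing in for the paper's own generation procedure, which this summary does not spell out. The names `simplex_centers`, `mmc_loss` and the `scale` parameter are hypothetical.

```python
import math

def simplex_centers(num_classes, scale=10.0):
    """Equal-norm centers in R^L with pairwise cosine -1/(L-1).

    Assumed construction: center the standard basis vectors and rescale.
    This is not necessarily the algorithm used in the paper.
    """
    L = num_classes
    centers = []
    for i in range(L):
        v = [(1.0 if j == i else 0.0) - 1.0 / L for j in range(L)]
        norm = math.sqrt(sum(x * x for x in v))
        centers.append([scale * x / norm for x in v])
    return centers

def mmc_loss(z, y, centers):
    """0.5 * squared Euclidean distance from feature z to the fixed center of class y."""
    mu = centers[y]
    return 0.5 * sum((a - b) ** 2 for a, b in zip(z, mu))

# Usage: the centers are built once and never trained.
centers = simplex_centers(num_classes=4, scale=2.0)
feature = [0.5, -0.1, 0.0, 0.3]  # made-up network output
loss = mmc_loss(feature, y=0, centers=centers)
```

Because the centers are fixed, the only way to reduce the loss is to move features of class $y$ closer to $\mu_y^*$.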
The theoretical analysis of the MMC loss shows that it explicitly induces high-density regions in the feature space. The sample density near a feature point is proportional to the number of samples of that class ($N_k$) and inversely proportional to a power of the loss value $C$, specifically $\propto N_k \cdot p_k(C) / C^{(d-1)/2}$, where $d$ is the dimension of the feature space. As the loss $C$ is minimized towards zero, the sample density around the center $\mu_y^*$ therefore grows rapidly, as a power of $1/C$ whose exponent increases with $d$. This property ensures that feature points of the same class gather compactly around their respective centers, creating locally sufficient samples for robust classification.
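To get a feel for this scaling, the worked example below compares the density at two loss values $C_1 > C_2$. It treats $N_k$ and $p_k(C)$ as fixed, which is a simplification made only for illustration, and the values $d = 11$ and $C_1 / C_2 = 4$ are hypothetical.

```latex
\[
\frac{\mathrm{SD}(C_2)}{\mathrm{SD}(C_1)}
  \approx \left(\frac{C_1}{C_2}\right)^{(d-1)/2},
\qquad
d = 11,\; \frac{C_1}{C_2} = 4
\;\Longrightarrow\;
4^{5} = 1024 .
\]
```

Under these assumptions, reducing the loss by a factor of four raises the local density by about three orders of magnitude, and the effect becomes stronger as the feature dimension grows.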
The MMC loss offers several practical advantages:
- Induces High Sample Density: By encouraging features to cluster around preset centers, it creates high-density regions beneficial for robustness, unlike g-SCE losses which promote sparsity.
- Structured Representations: It leads to more structured and orderly feature distributions.
- Better Exploits Model Capacity: The network can focus on improving intra-class compactness because inter-class dispersion is already fixed by the preset centers; Center loss, by contrast, has to balance the two objectives.
- Faster Convergence: It generally converges faster than SCE and its variants.
- Little Extra Computation: Training with MMC adds minimal computational overhead compared to standard SCE.
- Compatibility: It can be combined with existing defenses like adversarial training to further improve robustness.
Empirical evaluations are conducted on MNIST, CIFAR-10, and CIFAR-100 datasets using various adaptive attacks, including white-box $\ell_\infty$ PGD, $\ell_2$ C&W, black-box transfer-based MIM, and gradient-free SPSA attacks. Adaptive attacks are crucial for robust evaluation and are designed specifically against the MMC objective. The results demonstrate that models trained with MMC loss achieve significantly better robustness against these strong adaptive attacks compared to baselines like SCE, Center loss, MMLDA, and L-GM, often requiring much larger perturbations to fool the network. Importantly, MMC maintains clean accuracy comparable to SCE and is also shown to be more robust to general transformations like Gaussian noise and rotation compared to standard adversarial training methods. Ablation studies confirm that the use of optimally dispersed centers contributes to the improved robustness. The paper also shows that MMC can better leverage the capacity of deeper network architectures compared to SCE.
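As a rough illustration of what an attack adapted to the MMC objective could look like, the sketch below runs untargeted $\ell_\infty$ PGD that ascends the squared distance to the true class center. It is not the paper's attack code: the linear feature map `A`, the function names, and the step settings are all assumptions chosen so that the gradient can be written by hand, and clipping to a valid pixel range is omitted.

```python
def matvec(A, x):
    """Toy feature extractor z = A x (a stand-in for a real network)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def mmc_objective(z, mu):
    """0.5 * squared Euclidean distance between feature z and center mu."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(z, mu))

def pgd_linf_mmc(A, x, mu_true, eps=0.1, step=0.02, iters=20):
    """Untargeted l_inf PGD that increases the MMC objective of the true class."""
    x_adv = list(x)
    for _ in range(iters):
        z = matvec(A, x_adv)
        residual = [a - b for a, b in zip(z, mu_true)]
        # Gradient of 0.5 * ||A x - mu||^2 with respect to x is A^T (A x - mu).
        grad = [sum(A[i][j] * residual[i] for i in range(len(A)))
                for j in range(len(x))]
        # Signed ascent step, then projection back onto the eps-ball around x.
        x_adv = [xa + step * ((g > 0) - (g < 0)) for xa, g in zip(x_adv, grad)]
        x_adv = [min(max(xa, xo - eps), xo + eps) for xa, xo in zip(x_adv, x)]
    return x_adv
```

With a real network the hand-written gradient would be replaced by automatic differentiation; the signed step and the projection onto the perturbation ball stay the same.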
The paper includes technical details on generating the Max-Mahalanobis centers and discusses the choice of the squared-error form for the loss in the adversarial setting. It also proposes potential variants of the MMC loss, such as Elastic MMC (EMC) and Hierarchical MM centers, to enhance adaptability for more complex tasks or datasets.
In conclusion, the paper argues that SCE loss induces feature sparsity detrimental to robustness. It proposes the MMC loss as a regression-based alternative using preset, optimally dispersed centers to explicitly induce high-density feature regions and learn structured representations. Extensive experiments with adaptive attacks demonstrate that MMC significantly improves adversarial robustness with minimal computational overhead, while maintaining high clean accuracy, making it a practical and effective defense mechanism.