UniSD: Towards a Unified Self-Distillation Framework for Large Language Models

Published 7 May 2026 in cs.CL, cs.AI, and cs.LG | (2605.06597v1)

Abstract: Self-distillation (SD) offers a promising path for adapting LLMs without relying on stronger external teachers. However, SD in autoregressive LLMs remains challenging because self-generated trajectories are free-form, correctness is task-dependent, and plausible rationales can still provide unstable or unreliable supervision. Existing methods mainly examine isolated design choices, leaving their effectiveness, roles, and interactions unclear. In this paper, we propose UniSD, a unified framework to systematically study self-distillation. UniSD integrates complementary mechanisms that address supervision reliability, representation alignment, and training stability, including multi-teacher agreement, EMA teacher stabilization, token-level contrastive learning, feature matching, and divergence clipping. Across six benchmarks and six models from three model families, UniSD reveals when self-distillation improves over static imitation, which components drive the gains, and how these components interact across tasks. Guided by these insights, we construct UniSDfull, an integrated pipeline that combines complementary components and achieves the strongest overall performance, improving over the base model by +5.4 points and the strongest baseline by +2.8 points. Extensive evaluation highlights self-distillation as a practical and steerable approach for efficient LLM adaptation without stronger external teachers.


Authors (10)

Summary

- The paper presents UniSD, a reliability-aware self-distillation framework that leverages multi-teacher agreement, token-level contrast, and EMA stabilization to improve LLM performance by up to +5.4 points.
- The paper shows that UniSD maintains generative distribution fidelity, reducing catastrophic forgetting with retention perplexity close to that of the base model and lower token-level divergence.
- The paper demonstrates a scalable, cost-efficient approach that eliminates the need for external teachers while ensuring robustness across diverse LLM architectures and complex tasks.

UniSD: A Unified Self-Distillation Framework for LLMs

Introduction and Motivation

UniSD addresses the lack of systematic understanding and reliable methodology in self-distillation (SD) for autoregressive LLMs (2605.06597). Traditional LLM adaptation typically relies on stronger, external teacher models, which presents cost, privacy, and accessibility limitations. Instead, UniSD investigates whether LLMs can achieve reliable self-improvement by leveraging only self-derived supervision signals. Challenges in this paradigm include the open-endedness and variability of LLM generations, inherently unstable and noisy supervision, and the absence of best practices for robust SD in this setting.

UniSD Framework: Design and Mechanisms

UniSD formulates self-distillation as a reliability-aware self-correction framework that integrates complementary strategies along three axes: supervision reliability, representation alignment, and training stability.

Supervision Reliability is addressed through:
- Multi-Teacher Agreement: By producing multiple auxiliary teacher views via context variation or task-preserving perturbations, UniSD estimates reliability through cross-view agreement. Local (token-level) and global (sequence-level) disagreement are measured, allowing adaptive down-weighting of unreliable self-derived supervision.
- Token-Level Contrastive Learning: The model is encouraged to move closer to correct self-generated rationales and away from plausible but incorrect alternatives, constructed via candidate generation, corruption, or paraphrasing.
Representation Alignment encompasses:
- Feature Matching: The student is regularized towards teacher intermediate representations (e.g., hidden states), adding structural coherence beyond standard output-level KL or JSD constraints. This can be instantiated as matching final-layer hidden states or other internal signals.
Training Stability is improved by:
- EMA Teacher: An exponential moving average (EMA) of the student weights acts as a temporal ensemble-based teacher, smoothing supervision and reducing the deleterious propagation of transient model errors.
- Divergence Clipping: Outlier tokens with high divergence in local distributions are clipped to avoid destabilizing the update.
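
The two stability mechanisms above can be illustrated with a minimal sketch. It assumes flat lists of weights and per-token probability lists; the names `ema_update`, `clipped_distill_loss`, `decay`, and `clip` are hypothetical and not taken from the paper, and the choice of KL as the per-token divergence is an assumption.

```python
import math

def ema_update(teacher, student, decay=0.99):
    """Move teacher weights toward the student: t <- decay * t + (1 - decay) * s."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]

def token_kl(p, q, eps=1e-12):
    """KL(p || q) between two next-token distributions given as probability lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def clipped_distill_loss(teacher_probs, student_probs, clip=1.0):
    """Mean per-token KL, with each token's divergence capped at `clip` (assumed rule)."""
    per_token = [min(token_kl(p, q), clip) for p, q in zip(teacher_probs, student_probs)]
    return sum(per_token) / len(per_token)

# Toy usage: two token positions over a vocabulary of size 3.
teacher_probs = [[0.7, 0.2, 0.1], [0.98, 0.01, 0.01]]
student_probs = [[0.6, 0.3, 0.1], [0.01, 0.01, 0.98]]  # second token is an outlier
print(clipped_distill_loss(teacher_probs, student_probs, clip=1.0))
print(ema_update([1.0, 2.0], [0.0, 4.0], decay=0.9))
```

With the cap in place, the outlier token contributes at most `clip` to the loss, so a single badly mismatched position cannot dominate the average.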

The full pipeline, UniSD* (referred to as UniSDfull in the abstract), jointly integrates these modules into a single, extensible framework. Each component serves a clear and isolated function with well-defined interactions, enabling controlled studies of both individual effects and synergistic behaviors.
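
As one illustration of how a reliability signal could plug into a distillation loss, the sketch below scores token-level disagreement across teacher views and uses it to down-weight the per-token loss. This is a sketch under stated assumptions, not the paper's formulation: the generalized Jensen-Shannon disagreement measure, the exponential weighting, and the names `reliability_weights`, `weighted_distill_loss`, and `beta` are all hypothetical.

```python
import math

def entropy(p, eps=1e-12):
    """Shannon entropy of a probability list."""
    return -sum(pi * math.log(pi + eps) for pi in p)

def mean_dist(views):
    """Average several distributions defined over the same vocabulary."""
    k = len(views)
    return [sum(v[i] for v in views) / k for i in range(len(views[0]))]

def disagreement(views):
    """Generalized Jensen-Shannon divergence across views (0 when all views agree)."""
    return entropy(mean_dist(views)) - sum(entropy(v) for v in views) / len(views)

def reliability_weights(views_per_token, beta=5.0):
    """Map token-level disagreement to a weight in (0, 1]; beta is an assumed sharpness."""
    return [math.exp(-beta * max(disagreement(v), 0.0)) for v in views_per_token]

def kl(p, q, eps=1e-12):
    """KL(p || q) between two probability lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def weighted_distill_loss(views_per_token, student_probs, beta=5.0):
    """Per-token KL to the averaged teacher view, scaled by the reliability weight."""
    weights = reliability_weights(views_per_token, beta)
    losses = [kl(mean_dist(v), q) for v, q in zip(views_per_token, student_probs)]
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)

# Toy usage: the teacher views agree on token 0 and conflict on token 1.
views = [
    [[0.8, 0.1, 0.1], [0.8, 0.1, 0.1]],
    [[0.9, 0.05, 0.05], [0.05, 0.05, 0.9]],
]
student = [[0.8, 0.1, 0.1], [0.4, 0.2, 0.4]]
print(reliability_weights(views))
print(weighted_distill_loss(views, student))
```

A sequence-level (global) variant could average `disagreement` over all positions and apply one weight to the whole sequence instead of one per token.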

Empirical Evaluation

Comprehensive benchmarking is conducted across six datasets covering scientific QA, code generation, tool usage, and commonsense reasoning, alongside six LLMs from three model families (Qwen2.5, Llama-3.1, Gemma-3).

Core Results:

- UniSD* achieves the highest overall accuracy, outperforming both static imitation and state-of-the-art self-distillation baselines. For Qwen2.5-7B, UniSD* improves by +5.4 points over the base model and +2.8 over the prior best baseline.
- Gains are consistent across instruction-following, multi-turn tool use, and multimodal output settings, and generalize to OOD benchmarks.
- Among components, EMA teacher and multi-teacher agreement provide the strongest individual gains. Token-level agreement typically yields higher peak accuracy, while sequence-level agreement improves baseline robustness and OOD generalization.
- Agreement sensitivity analysis shows that additional auxiliary teacher contexts do not always improve performance. Diversity and construction of contexts (retrieved, random, induced) modulate the value of agreement supervision.
- Joint feature/logit matching proves more beneficial than representation alignment in isolation.
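
To make the joint feature/logit matching idea concrete, here is a minimal sketch for a single token position. It assumes an output-level KL term plus a mean-squared-error term on hidden states; the function name `joint_matching_loss` and the weight `feat_weight` are hypothetical, and the paper's actual loss form and weighting may differ.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def kl(p, q, eps=1e-12):
    """KL(p || q) between two probability lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def joint_matching_loss(t_logits, s_logits, t_hidden, s_hidden, feat_weight=0.1):
    """Output-level KL plus a weighted hidden-state matching term (assumed form)."""
    logit_term = kl(softmax(t_logits), softmax(s_logits))
    feature_term = mse(t_hidden, s_hidden)
    return logit_term + feat_weight * feature_term

# Toy usage: identical logits, but the student's hidden state has drifted.
print(joint_matching_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [0.5, -0.5], [1.5, -0.5]))
```

Setting `feat_weight=0.0` recovers pure logit matching, which gives a simple way to compare the joint objective against output-only alignment in an ablation.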

Distribution Retention and Generalization

A central claim is that UniSD preserves the base model's generative distribution (measured through retention perplexity and KL divergence to the base model) much better than supervised fine-tuning (SFT) or other conventional fine-tuning, thereby mitigating catastrophic forgetting. This is critical for deployment in continual learning or real-world scenarios where model behaviors must remain consistent post-adaptation.

Retention Results:

- For Qwen2.5-7B, the EMA, agreement, and contrastive UniSD variants consistently yield retention perplexity near that of the base model (e.g., 1.09–1.13 versus 1.14 for the base model), while SFT substantially worsens it (1.68).
- UniSD* also yields a lower token-level Jensen-Shannon divergence (JSD) between the adapted and base model distributions than SFT does.
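
The two retention measures can be sketched as follows. This assumes perplexity is the exponential of the mean negative log-likelihood over reference tokens and that token-level JSD is averaged over positions; which reference text the paper scores is not stated in this summary, and the names `perplexity`, `jsd`, and `mean_token_jsd` are hypothetical.

```python
import math

def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood over a sequence of token log-probs."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

def kl(p, q, eps=1e-12):
    """KL(p || q) between two probability lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def jsd(p, q):
    """Jensen-Shannon divergence: symmetric and bounded above by log(2)."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def mean_token_jsd(adapted_probs, base_probs):
    """Average JSD between adapted and base next-token distributions over positions."""
    pairs = list(zip(adapted_probs, base_probs))
    return sum(jsd(p, q) for p, q in pairs) / len(pairs)

# Toy usage: log-probs the adapted model assigns to reference tokens,
# and next-token distributions from the adapted and base models.
print(perplexity([math.log(0.9), math.log(0.95), math.log(0.85)]))
print(mean_token_jsd([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]],
                     [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]]))
```

Under this reading, a perplexity close to 1 means the scored tokens receive probability close to 1, and a JSD close to 0 means the adapted model's next-token distributions stay close to the base model's.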

Cross-architecture experiments demonstrate that UniSD* transfers gains to Llama-3.1 and Gemma-3, with strong improvements in 15 out of 18 model-dataset settings, evidencing the generality of reliability-weighted SD beyond a particular LLM backbone.

Resource Efficiency and Practical Implications

UniSD variants are analyzed for energy, memory, and throughput requirements. Agreement-based reliability mechanisms impose the highest overhead due to repeated scoring, while single-teacher stabilizers (EMA, clipping, contrast, feature matching) are more efficient. This informs an adaptive reliability-cost trade-off, where compute-intensive agreement can be applied selectively to uncertain examples, and lightweight stabilizers can be applied by default. UniSD thereby provides a foundation for cost-aware, scalable LLM adaptation.
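
The selective strategy described above could be sketched as a simple router that sends only uncertain examples to the costly agreement scoring. Using mean token entropy as the uncertainty signal is an assumption made for illustration, as are the names `route_examples` and `threshold`; the paper does not specify the selection rule here.

```python
import math

def entropy(p, eps=1e-12):
    """Shannon entropy of a probability list."""
    return -sum(pi * math.log(pi + eps) for pi in p)

def mean_token_entropy(token_probs):
    """Average next-token entropy over the positions of one example."""
    return sum(entropy(p) for p in token_probs) / len(token_probs)

def route_examples(batch, threshold=0.5):
    """Split example ids into (needs agreement scoring, lightweight stabilizers only)."""
    needs_agreement, lightweight = [], []
    for example_id, token_probs in batch.items():
        if mean_token_entropy(token_probs) > threshold:
            needs_agreement.append(example_id)
        else:
            lightweight.append(example_id)
    return needs_agreement, lightweight

# Toy usage: one confident example and one uncertain example.
batch = {
    "confident": [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01]],
    "uncertain": [[0.4, 0.3, 0.3], [0.34, 0.33, 0.33]],
}
print(route_examples(batch, threshold=0.5))
```

The threshold then acts as the knob for the reliability-cost trade-off: raising it routes fewer examples through multi-view scoring and lowers overhead.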

From a practical perspective, UniSD reduces or eliminates the dependence on external teachers, lowering both economic and carbon costs of post-training, and making adaptation workflows more accessible for constrained research settings.

Theoretical and Methodological Implications

UniSD demonstrates that self-distillation for autoregressive LLMs is most effective when supervision signals are reliability-weighted, auxiliary views are leveraged for stability, and both representation and output alignment are incorporated. The unified modular framework allows systematic isolation and analysis of these effects, clarifying previously confounded interactions among SD strategies.

The results support the theoretical premise that suitably calibrated self-derived signals, filtered for reliability, are sufficient for substantial model improvement in the absence of stronger external teachers, even for complex, open-ended generative tasks. Furthermore, the findings reinforce the known limitations of static imitation (e.g., SFT) and highlight the advantage of on-policy, reliability-aware adaptation.

Limitations and Future Directions

UniSD primarily targets single-turn tasks. Extension to long-horizon or agentic settings, where feedback is sparser and temporally delayed, will require new reliability metrics and possibly novel forms of temporal stability control. Additionally, finer-grained evaluation protocols and richer contrastive objectives could further characterize and enhance the efficacy of self-derived supervision. Prompt optimization, richer context induction, and selective reliability estimation are promising future directions.

Ethical considerations remain: UniSD cannot overcome the limitations inherited from the base model, such as bias, hallucination, or unsafe behaviors. Reliability-weighting and divergence clipping mitigate, but do not eliminate, such risks.

Conclusion

UniSD delivers a unified and extensible framework for self-distillation in LLMs, systematically integrating reliability-aware supervision, feature-level alignment, and stabilization of the adaptation process. The empirical evidence shows consistent improvements over baselines, effective distribution retention, practical resource usage, and generality across architectures. UniSD thus provides a principled and practical basis for teacher-free LLM adaptation, with implications for efficient, privacy-preserving, and accessible post-training workflows.
