A journal of IEEE and CAA, publishes high-quality papers in English on original theoretical/experimental research and development in all areas of automation
Volume 11 Issue 11
Nov.  2024

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 15.3, Top 1 (SCI Q1)
    CiteScore: 23.5, Top 2% (Q1)
    Google Scholar h5-index: 77, TOP 5
Z. Yin, J. Pu, Y. Zhou, and X. Xue, “Two-stage approach for targeted knowledge transfer in self-knowledge distillation,” IEEE/CAA J. Autom. Sinica, vol. 11, no. 11, pp. 2270–2283, Nov. 2024. doi: 10.1109/JAS.2024.124629
Two-Stage Approach for Targeted Knowledge Transfer in Self-Knowledge Distillation

doi: 10.1109/JAS.2024.124629
Funds: This work was supported by the National Natural Science Foundation of China (62176061).
  • Knowledge distillation (KD) enhances student network generalization by transferring dark knowledge from a complex teacher network. To optimize computational expenditure and memory utilization, self-knowledge distillation (SKD) extracts dark knowledge from the model itself rather than an external teacher network. However, previous SKD methods performed distillation indiscriminately on full datasets, overlooking the analysis of representative samples. In this work, we present a novel two-stage approach to providing targeted knowledge on specific samples, named two-stage approach self-knowledge distillation (TOAST). We first soften the hard targets using class medoids generated based on logit vectors per class. Then, we iteratively distill the under-trained data with past predictions of half the batch size. The two-stage knowledge is linearly combined, efficiently enhancing model performance. Extensive experiments conducted on five backbone architectures show our method is model-agnostic and achieves the best generalization performance. Besides, TOAST is strongly compatible with existing augmentation-based regularization methods. Our method also obtains a speedup of up to 2.95x compared with a recent state-of-the-art method.


    • Propose a novel two-stage self-knowledge distillation approach for selective dark knowledge transfer
    • Generate class medoids from logit vectors to represent typical samples per class
    • Distill under-trained data using past predictions on half batch size
    • Experiments show wide compatibility and state-of-the-art generalization
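
As a rough illustration of the first stage only, the sketch below picks a medoid per class from logit vectors and uses it to soften a one-hot target. It is not the authors' implementation: the abstract does not give the distance metric or the softening rule, so Euclidean distance, the mixing weight `alpha`, the `temperature`, and the function names `class_medoids` and `soften_target` are all assumptions made for this example. The only fixed point is the standard definition of a medoid, the set member with the smallest total distance to the other members.

```python
import math


def class_medoids(logits, labels):
    """Return {class: medoid logit vector}.

    The medoid of a class is the member whose summed distance to all other
    members of that class is smallest (Euclidean distance is assumed here).
    """
    by_class = {}
    for vec, label in zip(logits, labels):
        by_class.setdefault(label, []).append(vec)

    medoids = {}
    for label, members in by_class.items():
        medoids[label] = min(
            members,
            key=lambda cand: sum(math.dist(cand, other) for other in members),
        )
    return medoids


def softmax(vec, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [v / temperature for v in vec]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soften_target(label, medoid, num_classes, alpha=0.3, temperature=4.0):
    """Blend a one-hot target with the softmax of the class medoid.

    ASSUMPTION: this linear blend is a hypothetical softening rule chosen
    for illustration; the paper's actual rule may differ.
    """
    one_hot = [1.0 if k == label else 0.0 for k in range(num_classes)]
    soft = softmax(medoid, temperature)
    return [(1 - alpha) * h + alpha * s for h, s in zip(one_hot, soft)]


# Toy data: three logit vectors for class 0 (one of them an outlier), one for class 1.
logits = [[2.0, 0.0], [2.2, 0.1], [5.0, -1.0], [0.0, 3.0]]
labels = [0, 0, 0, 1]

medoids = class_medoids(logits, labels)
target = soften_target(0, medoids[0], num_classes=2)
```

Because a medoid is always an actual member of the class, the outlier `[5.0, -1.0]` can never shift it to a point that no sample produced, which is one reason a medoid can serve as a "typical sample" where a mean would be pulled toward the outlier. The second stage (distilling under-trained data with past predictions) is not sketched here because the abstract gives too little detail to do so without guessing.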


