Knowledge distillation based on projector integration and classifier sharing

Guanpeng Zuo, Chenlu Zhang, Zhe Zheng, Wu Zhang, Ruiqing Wang, Jingqi Lu, Xiu Jin, Zhaohui Jiang, Yuan Rao

Research output: Contribution to journal › Article › peer-review

Abstract

Knowledge distillation transfers knowledge from a pre-trained teacher model to a student model, thereby effectively accomplishing model compression. Previous studies have carefully crafted knowledge representations, loss function designs, and distillation location selection, but few have examined the role of the classifier in distillation. Prior experience shows that a model's final classifier plays an essential role in inference, so this paper attempts to narrow the performance gap between models by having the student model directly use the teacher's classifier for final inference; this requires an additional projector to match the features of the student encoder to the teacher's classifier. However, a single projector cannot fully align the features, and integrating multiple projectors may yield better performance. Balancing projector size against performance, we determine suitable projector sizes for different network combinations through experiments and propose a simple method for projector integration. In this way, the student model projects its features and then uses the teacher's classifier for inference, achieving performance close to the teacher's. Through extensive experiments on the CIFAR-100 and Tiny-ImageNet datasets, we show that our approach applies simply and effectively to various teacher–student frameworks.
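The inference pipeline the abstract describes can be illustrated with a minimal sketch: student features pass through several projectors, the projected features are fused, and the teacher's classifier produces the logits. The linear projectors, the averaging-based fusion, and all shapes below are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of classifier sharing with an integrated projector ensemble.
# Linear projectors and average fusion are assumptions for illustration only.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in W]

def project_and_classify(student_feat, projectors, teacher_classifier):
    """Map student features through each projector, average the projected
    features, then reuse the teacher's classifier for final inference."""
    projected = [matvec(P, student_feat) for P in projectors]
    fused = [sum(vals) / len(projected) for vals in zip(*projected)]
    return matvec(teacher_classifier, fused)  # teacher logits on student features

# Toy example: 2-D student feature, two 3x2 projectors into the teacher's
# 3-D feature space, and a 2x3 teacher classifier (two classes).
student_feat = [1.0, 2.0]
projectors = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]],
]
teacher_classifier = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
logits = project_and_classify(student_feat, projectors, teacher_classifier)
```

The key point the sketch captures is that only the projectors are student-specific; the classifier weights come unchanged from the teacher, which is what lets the student approach the teacher's decision behavior once the projected features are well aligned.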

Original language: English
Pages (from-to): 4521-4533
Number of pages: 13
Journal: Complex and Intelligent Systems
Volume: 10
Issue number: 3
DOIs
State: Published - 1 Jun 2024
Externally published: Yes

Keywords

  • Deep neural network
  • Features projection
  • Knowledge distillation
  • Model compression

ASJC Scopus subject areas

  • Information Systems
  • Engineering (miscellaneous)
  • Computational Mathematics
  • Artificial Intelligence
