Journal Article

Collaborative multi-knowledge distillation under the influence of softmax regression representation

2024
Overview
Knowledge distillation transfers knowledge from a powerful yet cumbersome teacher model to a student model with fewer parameters, thereby achieving model compression. Existing knowledge distillation methods have focused mainly on which knowledge to transfer and where to distill it, which makes the resulting models harder to interpret, and few works have examined the role of the teacher classifier in distillation. In this study, we propose a novel collaborative multi-knowledge distillation method under the influence of softmax regression representation. First, we propose a stage-wise logit knowledge distillation, in which the teacher classifier serves as an auxiliary structure to align the features of the student and teacher models. By leveraging the teacher classifier, the student features are aligned with the teacher features in the logit space, eliminating the need for a complex feature projector that requires extensive computation to match the features of the two models. Second, considering the teacher classifier's adaptability to classification features, we introduce a stage-wise feature knowledge distillation. This mechanism maps the features of the student model to a latent space with the same dimensions as the features of the teacher model, and guides the student's features to align with the teacher's final features using a mean squared error (MSE) loss. Finally, we propose a pseudo-teacher knowledge distillation loss to better model the deformation relationship between the student and teacher features. This loss provides additional gradient information for optimizing the parameters of the feature projector. Extensive experiments on the CIFAR-100 and ImageNet datasets demonstrate the superiority of the proposed method over state-of-the-art methods. The code is available at https://github.com/chenKP/CMKD.git
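The two stage-wise losses described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). It assumes a single linear projector, a linear teacher classifier, a temperature-scaled KL divergence for the logit term, and hypothetical function names and dimensions chosen only for illustration. The pseudo-teacher loss is not sketched because the overview does not describe its form.

```python
import numpy as np


def softmax(z, temperature=1.0):
    """Temperature-scaled softmax over the last axis (numerically stable)."""
    z = z / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def kl_divergence(p, q, eps=1e-12):
    """Mean KL(p || q) over the batch."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))


def logit_kd_loss(student_feat, teacher_logits, projector, teacher_classifier,
                  temperature=4.0):
    """Logit-space alignment through the (frozen) teacher classifier.

    Assumption: student features are linearly projected to the teacher's
    feature dimension, then fed to the teacher's linear classifier (W_c, b_c).
    """
    W_c, b_c = teacher_classifier
    student_logits = (student_feat @ projector) @ W_c + b_c
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # The T^2 factor is the usual scaling in temperature-based distillation.
    return temperature ** 2 * kl_divergence(p_teacher, p_student)


def feature_kd_loss(student_feat, teacher_feat, projector):
    """MSE between projected student features and the teacher's final features."""
    projected = student_feat @ projector
    return float(np.mean((projected - teacher_feat) ** 2))


# Toy data: all shapes below are illustrative assumptions.
rng = np.random.default_rng(0)
batch, d_student, d_teacher, n_classes = 8, 16, 32, 10
student_feat = rng.normal(size=(batch, d_student))
teacher_feat = rng.normal(size=(batch, d_teacher))
projector = rng.normal(size=(d_student, d_teacher)) * 0.1
W_c = rng.normal(size=(d_teacher, n_classes)) * 0.1
b_c = np.zeros(n_classes)
teacher_logits = teacher_feat @ W_c + b_c

total = (logit_kd_loss(student_feat, teacher_logits, projector, (W_c, b_c))
         + feature_kd_loss(student_feat, teacher_feat, projector))
print("combined distillation loss:", total)
```

In a real training loop these terms would be added to the student's ordinary classification loss, with weights that the overview does not specify, and applied at each stage of the student network rather than once.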