Residual Knowledge Distillation

Knowledge distillation (KD) is one of the most potent approaches to model compression. The key idea is to transfer knowledge from a deep teacher model (T) to a shallower student (S). However, existing methods suffer from performance degradation due to the substantial gap between the learning capacities of S and T. To remedy this problem, this work proposes Residual Knowledge Distillation (RKD), which further distills the knowledge by introducing an assistant (A). Specifically, S is trained to mimic the feature maps of T, and A aids this process by learning the residual error between them. In this way, S and A complement each other to obtain better knowledge from T. Furthermore, we devise an effective method to derive S and A from a given model without increasing the total computational cost. Extensive experiments show that our approach achieves appealing results on popular classification datasets, CIFAR-100 and ImageNet, surpassing state-of-the-art methods.
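
The residual idea in the abstract (S mimics T, then A learns the error that S leaves behind) can be illustrated with a toy sketch. This is not the paper's architecture or training procedure: scalar functions stand in for feature maps, a closed-form least-squares fit stands in for gradient-based training, and every name below (`fit_linear`, `teacher`, `student`, `assistant`) is a hypothetical placeholder chosen for illustration.

```python
# Toy illustration of residual distillation, under simplifying assumptions:
# scalar "features" replace feature maps, and least squares replaces SGD.

def mse(pred, target):
    """Mean squared error between two equal-length lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def fit_linear(inputs, targets):
    """Closed-form least-squares fit of targets ~ w * inputs + b."""
    n = len(inputs)
    mean_x = sum(inputs) / n
    mean_y = sum(targets) / n
    var_x = sum((x - mean_x) ** 2 for x in inputs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, targets))
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b

# Hypothetical inputs and teacher "features" (a cubic the student cannot represent).
xs = [i / 10 for i in range(-10, 11)]
teacher = [x ** 3 + x for x in xs]

# Student S: a low-capacity (linear) model trained to mimic the teacher.
w_s, b_s = fit_linear(xs, teacher)
student = [w_s * x + b_s for x in xs]

# Assistant A: trained on the residual error between teacher and student.
residual = [t - s for t, s in zip(teacher, student)]
cubed = [x ** 3 for x in xs]  # assumed extra feature available to the assistant
w_a, b_a = fit_linear(cubed, residual)
assistant = [w_a * c + b_a for c in cubed]

# Combined output: student plus assistant correction.
combined = [s + a for s, a in zip(student, assistant)]

loss_student = mse(student, teacher)
loss_combined = mse(combined, teacher)
print(f"student-only MSE: {loss_student:.6f}")
print(f"student+assistant MSE: {loss_combined:.6f}")
```

In this toy setting the combined error cannot exceed the student-only error, because the assistant's least-squares fit could always fall back to predicting zero. How S and A are derived from a given model without extra computational cost is specific to the paper and is not modeled here.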

Authors: Mengya Gao, Yujun Shen, Quanquan Li, Chen Change Loy
Topics: Machine Learning, Computer Vision

preprint / 2020
