Deep Knowledge Distillation using Trainable Dense Attention

Bharat Bhusan Sau, Soumya Roy, Vinay P. Namboodiri, Raghu Sesha Iyengar

Research output: Contribution to conference › Paper › peer-review

Abstract

Knowledge distillation based deep model compression has been actively pursued to improve the performance of a specified student architecture by distilling knowledge from a deeper network. Among various methods, attention-based knowledge distillation has shown great promise on large datasets. However, this approach is limited by hand-designed attention functions such as the absolute sum. We address this shortcoming by proposing trainable attention methods that yield improved performance when distilling knowledge from teacher to student. We also show that efficiently using dense connections between attention modules further improves the student's performance. When applied to a ResNet50 (teacher) and MobileNetV1 (student) pair on the ImageNet dataset, our approach reduces the Top-1 error rate by 9.6% relative to the previous state-of-the-art method.
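
To make the distinction concrete, the sketch below contrasts the hand-designed absolute-sum attention map named in the abstract with a learnable attention module matched between teacher and student. It is a minimal PyTorch illustration under stated assumptions: the TrainableAttention module (a 1x1 convolution producing a spatial map) and the MSE matching loss are hypothetical stand-ins, not the paper's exact trainable attention or its dense-connection design.

    # Minimal sketch of attention-based knowledge distillation (PyTorch assumed).
    # abs_sum_attention is the hand-designed baseline mentioned in the abstract;
    # TrainableAttention is a hypothetical learnable attention function and is not
    # the paper's exact design, which also densely connects attention modules.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def abs_sum_attention(feat: torch.Tensor) -> torch.Tensor:
        # Hand-designed attention: sum of absolute activations over channels,
        # flattened and L2-normalised per sample.
        attn = feat.abs().sum(dim=1)                # (N, H, W)
        return F.normalize(attn.flatten(1), dim=1)

    class TrainableAttention(nn.Module):
        # Hypothetical trainable attention: a 1x1 conv maps C-channel features
        # to a single spatial map whose parameters are learned during distillation.
        def __init__(self, channels: int):
            super().__init__()
            self.proj = nn.Conv2d(channels, 1, kernel_size=1)

        def forward(self, feat: torch.Tensor) -> torch.Tensor:
            attn = self.proj(feat).squeeze(1)        # (N, H, W)
            return F.normalize(attn.flatten(1), dim=1)

    def attention_distillation_loss(s_feat, t_feat, s_attn, t_attn):
        # Match attention maps computed from student and teacher features.
        return F.mse_loss(s_attn(s_feat), t_attn(t_feat))

    if __name__ == "__main__":
        # Toy intermediate feature maps standing in for one layer of each network.
        t_feat = torch.randn(4, 256, 14, 14)         # teacher features
        s_feat = torch.randn(4, 128, 14, 14)         # student features

        # Baseline: hand-designed absolute-sum attention transfer.
        baseline = F.mse_loss(abs_sum_attention(s_feat), abs_sum_attention(t_feat))

        # Trainable attention: the attention modules themselves receive gradients.
        loss = attention_distillation_loss(
            s_feat, t_feat, TrainableAttention(128), TrainableAttention(256)
        )
        loss.backward()
        print(float(baseline), float(loss))

In this reading, the attention modules are trained jointly with the student, so the distillation signal itself adapts rather than being fixed by a hand-picked pooling function.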

Original language: English
Publication status: Published - 25 Nov 2021
Event: 32nd British Machine Vision Conference, BMVC 2021 - Virtual, Online
Duration: 22 Nov 2021 - 25 Nov 2021

Conference

Conference: 32nd British Machine Vision Conference, BMVC 2021
City: Virtual, Online
Period: 22/11/21 - 25/11/21

Bibliographical note

Publisher Copyright:
© 2021. The copyright of this document resides with its authors.

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
