TY - GEN
T1 - Multi-layer pruning framework for compressing Single Shot MultiBox detector
AU - Singh, Pravendra
AU - Manikandan, R.
AU - Matiyali, Neeraj
AU - Namboodiri, Vinay P.
PY - 2019/3/7
Y1 - 2019/3/7
N2 - We propose a framework for compressing the state-of-the-art Single Shot MultiBox Detector (SSD). The framework performs compression in three stages: Sparsity Induction, Filter Selection, and Filter Pruning. In the Sparsity Induction stage, the object detector model is sparsified via an improved global threshold. In the Filter Selection and Filter Pruning stages, we select and remove filters using sparsity statistics of the filter weights in two consecutive convolutional layers. This results in a model that is smaller than most existing compact architectures. We evaluate our framework on multiple datasets and compare it against multiple methods. Experimental results show that our method achieves state-of-the-art compression of 6.7X and 4.9X on the PASCAL VOC dataset for the SSD300 and SSD512 models, respectively. We further show that the method achieves a maximum compression of 26X with SSD512 on the German Traffic Sign Detection Benchmark (GTSDB). We also show empirically that our method adapts to the classification architecture VGG16 on the CIFAR and German Traffic Sign Recognition Benchmark (GTSRB) datasets, achieving compression rates of 125X and 200X with FLOPs reductions of 90.50% and 96.6%, respectively, with no loss of accuracy. Finally, the resulting compressed models do not require any special libraries or hardware support.
UR - http://www.scopus.com/inward/record.url?scp=85063585263&partnerID=8YFLogxK
U2 - 10.1109/WACV.2019.00145
DO - 10.1109/WACV.2019.00145
M3 - Chapter in a published conference proceeding
AN - SCOPUS:85063585263
T3 - Proceedings - 2019 IEEE Winter Conference on Applications of Computer Vision, WACV 2019
SP - 1318
EP - 1327
BT - Proceedings - 2019 IEEE Winter Conference on Applications of Computer Vision, WACV 2019
PB - IEEE
T2 - 19th IEEE Winter Conference on Applications of Computer Vision, WACV 2019
Y2 - 7 January 2019 through 11 January 2019
ER -