TQCompressor: Improving Tensor Decomposition Methods in Neural Networks Via Permutations

Vadim Abronin, Aleksei Naumov, Denis Mazur, Dmitriy Bystrov, Katerina Tsarova, Artem Melnikov, Sergey Dolgov, Reuben Brasher, Michael Perelshein

Research output: Chapter or section in a book/report/conference proceeding › Chapter in a published conference proceeding

Abstract

We introduce TQCompressor, a neural network model compression method using enhanced tensor decompositions. We propose a permutation-based improvement to Kronecker decomposition, reducing the loss in model expressivity typically associated with compression. Applied to GPT-2small, this yields the TQCompressedGPT-2 model with 81 million parameters, down from 124 million. Enhanced through multi-step knowledge distillation on 3.1% of the OpenWebText dataset, TQCompressedGPT-2 outperforms DistilGPT-2 and KnGPT-2. We have made TQCompressedGPT-2 publicly available.
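The abstract describes compressing weight matrices with a permutation-enhanced Kronecker decomposition. The snippet below is a minimal, self-contained sketch of the general idea only, not the authors' implementation: it computes the nearest Kronecker product of a matrix via Van Loan's rearrangement trick (rank-1 SVD of a block-rearranged matrix) and applies an illustrative random row permutation before decomposing; in the paper the permutation is chosen to preserve expressivity rather than drawn at random. All names and shapes here are hypothetical.

```python
import numpy as np

def nearest_kronecker(W, shape_a, shape_b):
    """Find A (shape_a) and B (shape_b) minimizing ||W - A kron B||_F.

    Uses Van Loan's rearrangement: each (m2 x n2) block of W becomes one
    row of a rearranged matrix, whose best rank-1 factorisation gives
    vec(A) and vec(B).
    """
    m1, n1 = shape_a
    m2, n2 = shape_b
    # Rearrange W (m1*m2 x n1*n2) so that row (i1, j1) holds block W_{i1, j1} flattened.
    R = W.reshape(m1, m2, n1, n2).transpose(0, 2, 1, 3).reshape(m1 * n1, m2 * n2)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    A = np.sqrt(s[0]) * U[:, 0].reshape(m1, n1)
    B = np.sqrt(s[0]) * Vt[0].reshape(m2, n2)
    return A, B

# Illustrative permutation step: reorder the rows of W before decomposing,
# then undo the permutation when reconstructing the weight.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
perm = rng.permutation(W.shape[0])           # stand-in for the learned/searched permutation
A, B = nearest_kronecker(W[perm], (2, 2), (4, 4))
W_hat = np.kron(A, B)[np.argsort(perm)]      # inverse permutation restores the original row order
print("relative error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))
```

In this toy setting the 8x8 matrix is replaced by a 2x2 and a 4x4 factor (20 parameters instead of 64), which mirrors how the parameter count of GPT-2 layers can be reduced, though the actual factor shapes and permutation search in TQCompressedGPT-2 differ.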

Original language: English
Title of host publication: Proceedings - 2024 IEEE 7th International Conference on Multimedia Information Processing and Retrieval, MIPR 2024
Place of Publication: U.S.A.
Publisher: IEEE
Pages: 503-506
Number of pages: 4
ISBN (Electronic): 9798350351422
ISBN (Print): 9798350351439
DOIs
Publication status: Published - 15 Oct 2024
Event: 7th IEEE International Conference on Multimedia Information Processing and Retrieval, MIPR 2024 - San Jose, United States
Duration: 7 Aug 2024 - 9 Aug 2024

Conference

Conference: 7th IEEE International Conference on Multimedia Information Processing and Retrieval, MIPR 2024
Country/Territory: United States
City: San Jose
Period: 7/08/24 - 9/08/24

Keywords

  • GPT-2
  • Knowledge distillation
  • Kronecker decomposition
  • Neural network compression
  • Tensor decomposition

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Information Systems
  • Media Technology
