OVER-PARAMETERIZED MODEL OPTIMIZATION WITH POLYAK-ŁOJASIEWICZ CONDITION

Y. Chen, Y. Shi, M. Dong, X. Yang, D. Li, Y. Wang, R.P. Dick, Qin Lv, Y. Zhao, F. Yang, N. Gu, L. Shang

Research output: Chapter in a published conference proceeding


Abstract

This work pursues the optimization of over-parameterized deep models for superior training efficiency and test performance. We first theoretically emphasize the importance of two properties of over-parameterized models: the convergence gap and the generalization gap. Subsequent analyses reveal that both gaps can be upper-bounded by the ratio of the Lipschitz constant to the Polyak-Łojasiewicz (PL) constant, a crucial quantity termed the condition number. These findings lead to a structured pruning method with a novel pruning criterion: we devise a gating network that dynamically detects and masks out poorly behaved nodes of a deep model during training. The gating network is learned by minimizing the condition number of the target model, a process that can be implemented as an extra regularization loss term. Experimental studies demonstrate that the proposed method outperforms the baselines in both training efficiency and test performance, exhibiting the potential to generalize to a variety of deep network architectures and tasks.
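
The scheme described above can be made concrete with a small sketch. The following is a minimal, hypothetical PyTorch illustration, not the authors' implementation: the GatedLinear module, the per-node sigmoid gate, and the condition_number_surrogate function (a spectral-norm to smallest-singular-value ratio standing in for the Lipschitz-to-PL ratio) are all assumed names and design choices introduced here for illustration.

# Illustrative sketch only (not the paper's code): a learned gate masks output
# nodes, and an extra loss term penalizes a condition-number-style surrogate.
import torch
import torch.nn as nn

class GatedLinear(nn.Module):
    """Linear layer whose output nodes are masked by a learned gate (hypothetical design)."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # One learnable gating logit per output node.
        self.gate_logits = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        gates = torch.sigmoid(self.gate_logits)   # soft mask in (0, 1)
        return self.linear(x) * gates             # poorly behaved nodes are driven toward gate ~ 0

def condition_number_surrogate(model):
    """Crude stand-in for the Lipschitz / PL ratio: largest over smallest
    singular value of each gated weight matrix (an assumption, not the paper's bound)."""
    kappa = 0.0
    for m in model.modules():
        if isinstance(m, GatedLinear):
            # Scale rows by their gates so masked nodes no longer inflate the ratio.
            w = m.linear.weight * torch.sigmoid(m.gate_logits).unsqueeze(1)
            s = torch.linalg.svdvals(w)
            kappa = kappa + s.max() / (s.min() + 1e-8)
    return kappa

# Training step: task loss plus the extra regularization term mentioned in the abstract.
model = nn.Sequential(GatedLinear(784, 256), nn.ReLU(), GatedLinear(256, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
optimizer.zero_grad()
loss = nn.functional.cross_entropy(model(x), y) + 1e-3 * condition_number_surrogate(model)
loss.backward()
optimizer.step()

In this sketch the gate shrinks the weight-matrix rows of poorly behaved nodes, so minimizing the surrogate term simultaneously prunes nodes and lowers the effective condition number; the weighting 1e-3 is an arbitrary placeholder.
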
Original language: English
Title of host publication: 11th International Conference on Learning Representations, ICLR 2023
Publication status: Published - 1 Feb 2023
