Abstract
Loss functions with a large number of saddle points are one of the major obstacles to training modern machine learning (ML) models efficiently. First-order methods such as gradient descent (GD) are usually the methods of choice for training ML models, but for certain choices of initial guess they converge to saddle points. In this paper, we propose a modification of the recently proposed Laplacian smoothing gradient descent (LSGD) [Osher et al., arXiv:1806.06317], called modified LSGD (mLSGD), and demonstrate its potential to avoid saddle points without sacrificing the convergence rate. Our analysis is based on the attraction region, the set of all starting points for which the numerical scheme under consideration converges to a saddle point. We investigate the dimension of the attraction region both analytically and numerically. For a canonical class of quadratic functions, we show that the attraction region of mLSGD has dimension ⌊(n − 1)/2⌋, and is hence significantly smaller than that of GD, whose attraction region has dimension n − 1.
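For intuition, the baseline LSGD update from [Osher et al., arXiv:1806.06317] replaces the gradient by a Laplacian-smoothed gradient, x ← x − η (I − σL)⁻¹ ∇f(x), where L is the one-dimensional discrete Laplacian with periodic boundary conditions. Below is a minimal NumPy sketch of that baseline scheme; the step size `eta`, smoothing parameter `sigma`, helper names, and the quadratic test problem are illustrative assumptions, and the precise modification defining mLSGD is specified in the paper itself.

```python
# Minimal sketch of the baseline LSGD update (Osher et al., arXiv:1806.06317).
# All names, parameter values, and the test problem are illustrative; the exact
# modification that defines mLSGD is described in the paper.
import numpy as np

def laplacian_smoothed_grad(grad, sigma):
    """Solve (I - sigma * L) d = grad via FFT, where L is the 1D discrete
    Laplacian with periodic boundary conditions (a circulant matrix)."""
    n = grad.shape[0]
    c = np.zeros(n)
    c[0] = 1.0 + 2.0 * sigma      # first column of I - sigma * L
    if n > 1:
        c[1] = -sigma
        c[-1] = -sigma
    # A circulant system diagonalises in the Fourier basis.
    return np.real(np.fft.ifft(np.fft.fft(grad) / np.fft.fft(c)))

def lsgd(grad_f, x0, eta=0.05, sigma=1.0, iters=500):
    """Iterate x <- x - eta * (I - sigma * L)^{-1} grad_f(x)."""
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        x = x - eta * laplacian_smoothed_grad(grad_f(x), sigma)
    return x

# Illustrative quadratic f(x) = 0.5 * x^T D x with a saddle point at the
# origin (one negative eigenvalue), the canonical setting analysed in the paper.
D = np.diag([1.0, 2.0, -1.0, 3.0])
x_final = lsgd(lambda x: D @ x, x0=[1.0, 1.0, 1e-3, 1.0])
print(x_final)
```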
Original language | English |
---|---|
Pages (from-to) | 738-757 |
Number of pages | 20 |
Journal | European Journal of Applied Mathematics |
Volume | 34 |
Issue number | 4 |
Early online date | 9 Nov 2022 |
DOIs | |
Publication status | Published - 31 Aug 2023 |
Bibliographical note
This material is based on research sponsored by the National Science Foundation under grant numbers DMS-1924935, DMS-1952339, DMS-1554564 (STROBE), DMS-2152762 and DMS-2208361, the Air Force Research Laboratory under grant numbers FA9550-18-0167 and MURI FA9550-18-1-0502, the Office of Naval Research under grant number N00014-18-1-2527, and the Department of Energy under grant numbers DE-SC0021142 and DE-SC0002722. LMK acknowledges support from the German National Academic Foundation (Studienstiftung des Deutschen Volkes), the European Union Horizon 2020 research and innovation programmes under Marie Skłodowska-Curie grant agreements No. 777826 (NoMADS) and No. 691070 (CHiPS), the Cantab Capital Institute for the Mathematics of Information and Magdalene College, Cambridge (Nevile Research Fellowship). SJO was partially funded by the Office of Naval Research under grant numbers N00014-18-20-1-2093 and N00014-20-1-2787.
Funding
Funders | Funder number |
---|---|
CHiPS | |
Cantab Capital Institute for the Mathematics of Information and Magdalene College, Cambridge | |
National Science Foundation | DMS-2208361, DMS-1952339, DMS-1924935, DMS-2152762, DMS-1554564 |
Office of Naval Research | N00014-18-1-2527, N00014-18-20-1-2093, N00014-20-1-2787 |
U.S. Department of Energy | DE-SC0002722, DE-SC0021142 |
Air Force Research Laboratory | FA9550-18-0167 |
H2020 Marie Skłodowska-Curie Actions | 691070, 777826 |
Multidisciplinary University Research Initiative | FA9550-18-1-0502 |
Studienstiftung des Deutschen Volkes | |
Horizon 2020 | |
Keywords
- cs.LG
- cs.NA
- math.DS
- math.NA
- stat.ML
- attraction region
- saddle points
- gradient-based methods
- Laplacian smoothing
- deterministic algorithm
ASJC Scopus subject areas
- Applied Mathematics