A Deterministic Gradient-Based Approach to Avoid Saddle Points

Lisa Maria Kreusser, Stanley J. Osher, Bao Wang

Research output: Contribution to journal › Article › peer-review


Abstract

Loss functions with a large number of saddle points are one of the major obstacles to training modern machine learning (ML) models efficiently. First-order methods such as gradient descent (GD) are usually the methods of choice for training ML models, but for certain initial guesses they converge to saddle points. In this paper, we propose a modification of the recently proposed Laplacian smoothing gradient descent (LSGD) [Osher et al., arXiv:1806.06317], called modified LSGD (mLSGD), and demonstrate its potential to avoid saddle points without sacrificing the convergence rate. Our analysis is based on the attraction region, the set of all starting points from which the numerical scheme under consideration converges to a saddle point. We investigate the dimension of the attraction region both analytically and numerically. For a canonical class of quadratic functions, we show that the attraction region of mLSGD has dimension ⌊(n − 1)/2⌋, significantly smaller than that of GD, whose attraction region has dimension n − 1.
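
The LSGD scheme referenced in the abstract replaces the gradient ∇f(x) with the smoothed direction (I − σL)⁻¹∇f(x), where L is a discrete Laplacian. The sketch below is a minimal illustration of this mechanism on a canonical quadratic saddle f(x) = ½xᵀDx, not the paper's mLSGD algorithm; the step size η, smoothing parameter σ, dimension n, and the choice of the 1D periodic discrete Laplacian are all illustrative assumptions. It shows that a starting point inside GD's attraction region (the span of the positive-curvature directions) need not lie in the smoothed scheme's attraction region, because the Laplacian couples the coordinates.

```python
import numpy as np

def lsgd_direction(g, sigma):
    """Laplacian-smoothed gradient: solve (I - sigma * L) v = g by FFT,
    where L is the 1D periodic discrete Laplacian. The matrix is
    circulant, so the linear system diagonalises in Fourier space."""
    n = g.shape[0]
    # Eigenvalues of I - sigma * L: 1 + 2*sigma*(1 - cos(2*pi*k/n)).
    eig = 1.0 + 2.0 * sigma * (1.0 - np.cos(2.0 * np.pi * np.arange(n) / n))
    return np.fft.ifft(np.fft.fft(g) / eig).real

# Canonical quadratic with a saddle at the origin:
# f(x) = 0.5 * x^T D x, D = diag(1, ..., 1, -1).
n = 8
d = np.ones(n)
d[-1] = -1.0                       # one direction of negative curvature
grad = lambda x: d * x

eta, sigma = 0.1, 1.0              # illustrative step size and smoothing
x_gd = np.ones(n)
x_gd[-1] = 0.0                     # start inside GD's attraction region
x_ls = x_gd.copy()

for _ in range(300):
    x_gd = x_gd - eta * grad(x_gd)                          # plain GD
    x_ls = x_ls - eta * lsgd_direction(grad(x_ls), sigma)   # LSGD

print(f"GD   distance to saddle: {np.linalg.norm(x_gd):.2e}")  # ~ 0: trapped
print(f"LSGD distance to saddle: {np.linalg.norm(x_ls):.2e}")  # grows: escapes
```

Because I − σL is circulant, the smoothing solve costs only O(n log n) per iteration via the FFT, which is why Laplacian smoothing adds little overhead to GD.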

Original language: English
Pages (from-to): 738–757
Number of pages: 20
Journal: European Journal of Applied Mathematics
Volume: 34
Issue number: 4
Early online date: 9 Nov 2022
Publication status: Published - 31 Aug 2023

Bibliographical note

This material is based on research sponsored by the National Science Foundation under grant numbers DMS-1924935, DMS-1952339, DMS-1554564 (STROBE), DMS-2152762 and DMS-2208361, the Air Force Research Laboratory under grant numbers FA9550-18-0167 and MURI FA9550-18-1-0502, the Office of Naval Research under grant number N00014-18-1-2527, and the Department of Energy under grant numbers DE-SC0021142 and DE-SC0002722. LMK acknowledges support from the German National Academic Foundation (Studienstiftung des Deutschen Volkes), the European Union Horizon 2020 research and innovation programme under Marie Skłodowska-Curie grant agreements No. 777826 (NoMADS) and No. 691070 (CHiPS), the Cantab Capital Institute for the Mathematics of Information, and Magdalene College, Cambridge (Nevile Research Fellowship). SJO was partially funded by the Office of Naval Research under grant numbers N00014-18-20-1-2093 and N00014-20-1-2787.

Funding


Funders and funder numbers

CHiPS
Cantab Capital Institute for the Mathematics of Information and Magdalene College, Cambridge
National Science Foundation: DMS-1554564, DMS-1924935, DMS-1952339, DMS-2152762, DMS-2208361
Office of Naval Research: N00014-18-1-2527, N00014-18-20-1-2093, N00014-20-1-2787
U.S. Department of Energy: DE-SC0002722, DE-SC0021142
Air Force Research Laboratory: FA9550-18-0167
H2020 Marie Skłodowska-Curie Actions: 691070, 777826
Multidisciplinary University Research Initiative: FA9550-18-1-0502
Studienstiftung des Deutschen Volkes
Horizon 2020

Keywords

• cs.LG
• cs.NA
• math.DS
• math.NA
• stat.ML
• attraction region
• saddle points
• gradient-based methods
• Laplacian smoothing
• deterministic algorithm

ASJC Scopus subject areas

• Applied Mathematics
