A Deterministic Gradient-Based Approach to Avoid Saddle Points

Lisa Maria Kreusser, Stanley J. Osher, Bao Wang

Research output: Contribution to journalArticlepeer-review

30 Downloads (Pure)

Abstract

Loss functions with a large number of saddle points are one of the major obstacles for training modern machine learning models efficiently. First-order methods such as gradient descent are usually the methods of choice for training machine learning models. However, these methods converge to saddle points for certain choices of initial guesses. In this paper, we propose a modification of the recently proposed Laplacian smoothing gradient descent [Osher et al., arXiv:1806.06317], called modified Laplacian smoothing gradient descent (mLSGD), and demonstrate its potential to avoid saddle points without sacrificing the convergence rate. Our analysis is based on the attraction region, formed by all starting points for which the considered numerical scheme converges to a saddle point. We investigate the attraction region's dimension both analytically and numerically. For a canonical class of quadratic functions, we show that the dimension of the attraction region for mLSGD is floor((n-1)/2), and hence it is significantly smaller than that of the gradient descent whose dimension is n-1.
Original languageEnglish
Pages (from-to)738-757
Number of pages20
JournalEuropean Journal of Applied Mathematics
Volume34
Issue number4
Early online date9 Nov 2022
DOIs
Publication statusPublished - 31 Aug 2023

Bibliographical note

This material is based on research sponsored by the National Science Foundation under grant numbers DMS-1924935, DMS-1952339, DMS-1554564 (STROBE), DMS-2152762 and DMS-2208361, the Air Force Research Laboratory under grant numbers FA9550-18-0167 and MURI FA9550-18-1-0502, the Office of Naval Research under the grant number N00014-18-1-2527 and the Department of Energy under the grant number DE-SC0021142 and DE-SC0002722. LMK acknowledges support from the German National Academic Foundation (Studienstiftung des Deutschen Volkes), the European Union Horizon 2020 research and innovation programmes under the Marie Skłodowska-Curie grant agreement No. 777826 (NoMADS) and No. 691070 (CHiPS), the Cantab Capital Institute for the Mathematics of Information and Magdalene College, Cambridge (Nevile Research Fellowship). SJO was partially funded by the Office of Naval Research under the grant numbers N00014-18-20-1-2093 and N00014-20-1-2787.

Keywords

  • cs.LG
  • cs.NA
  • math.DS
  • math.NA
  • stat.ML

Fingerprint

Dive into the research topics of 'A Deterministic Gradient-Based Approach to Avoid Saddle Points'. Together they form a unique fingerprint.

Cite this