TY - GEN
T1 - Bilevel Learning with Inexact Stochastic Gradients
AU - Salehi, Mohammad Sadegh
AU - Mukherjee, Subhadip
AU - Roberts, Lindon
AU - Ehrhardt, Matthias J.
PY - 2025/5/17
Y1 - 2025/5/17
N2 - Bilevel learning has gained prominence in machine learning, inverse problems, and imaging applications, including hyperparameter optimization, learning data-adaptive regularizers, and optimizing forward operators. The large-scale nature of these problems has led to the development of inexact and computationally efficient methods. Existing adaptive methods predominantly rely on deterministic formulations, while stochastic approaches often adopt a doubly-stochastic framework with impractical variance assumptions, enforce a fixed number of lower-level iterations, and require extensive tuning. In this work, we focus on bilevel learning with strongly convex lower-level problems and a nonconvex sum-of-functions in the upper level. Stochasticity arises from data sampling in the upper level, which leads to inexact stochastic hypergradients. We establish their connection to state-of-the-art stochastic optimization theory for nonconvex objectives. Furthermore, we prove the convergence of inexact stochastic bilevel optimization under mild assumptions. Our empirical results highlight significant speed-ups and improved generalization in imaging tasks such as image denoising and deblurring in comparison with adaptive deterministic bilevel methods.
AB - Bilevel learning has gained prominence in machine learning, inverse problems, and imaging applications, including hyperparameter optimization, learning data-adaptive regularizers, and optimizing forward operators. The large-scale nature of these problems has led to the development of inexact and computationally efficient methods. Existing adaptive methods predominantly rely on deterministic formulations, while stochastic approaches often adopt a doubly-stochastic framework with impractical variance assumptions, enforce a fixed number of lower-level iterations, and require extensive tuning. In this work, we focus on bilevel learning with strongly convex lower-level problems and a nonconvex sum-of-functions in the upper level. Stochasticity arises from data sampling in the upper level, which leads to inexact stochastic hypergradients. We establish their connection to state-of-the-art stochastic optimization theory for nonconvex objectives. Furthermore, we prove the convergence of inexact stochastic bilevel optimization under mild assumptions. Our empirical results highlight significant speed-ups and improved generalization in imaging tasks such as image denoising and deblurring in comparison with adaptive deterministic bilevel methods.
KW - Bilevel Learning
KW - Learning Regularizers
KW - Machine Learning
KW - Stochastic Bilevel Optimization
KW - Variational Regularization
UR - http://www.scopus.com/inward/record.url?scp=105006691052&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-92366-1_27
DO - 10.1007/978-3-031-92366-1_27
M3 - Chapter in a published conference proceeding
AN - SCOPUS:105006691052
SN - 9783031923654
T3 - Lecture Notes in Computer Science
SP - 347
EP - 359
BT - Scale Space and Variational Methods in Computer Vision - 10th International Conference, SSVM 2025, Proceedings
A2 - Bubba, Tatiana A.
A2 - Gaburro, Romina
A2 - Gazzola, Silvia
A2 - Papafitsoros, Kostas
A2 - Pereyra, Marcelo
A2 - Schönlieb, Carola-Bibiane
PB - Springer
CY - Cham, Switzerland
T2 - 10th International Conference on Scale Space and Variational Methods in Computer Vision, SSVM 2025
Y2 - 18 May 2025 through 22 May 2025
ER -