Analyzing inexact hypergradients for bilevel learning

Matthias J. Ehrhardt, Lindon Roberts

Research output: Contribution to journal › Article › peer-review


Abstract

Estimating hyperparameters has been a long-standing problem in machine learning. We consider the case where the task at hand is modeled as the solution to an optimization problem. Here the exact gradient with respect to the hyperparameters cannot be feasibly computed and approximate strategies are required. We introduce a unified framework for computing hypergradients that generalizes existing methods based on the implicit function theorem and automatic differentiation/backpropagation, showing that these two seemingly disparate approaches are actually tightly connected. Our framework is extremely flexible, allowing its subproblems to be solved with any suitable method, to any degree of accuracy. We derive a priori and computable a posteriori error bounds for all our methods, and numerically show that our a posteriori bounds are usually more accurate. Our numerical results also show that, surprisingly, for efficient bilevel optimization, the choice of hypergradient algorithm is at least as important as the choice of lower-level solver.
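The abstract describes computing inexact hypergradients when the lower-level task is an optimization problem. As a purely illustrative sketch (not the paper's code or algorithm), the snippet below shows one implicit-function-theorem-style hypergradient for a hypothetical ridge-regression lower-level problem, with both the lower-level solve and the linear system solved only approximately; the gradient-descent and conjugate-gradient solvers here are assumed stand-ins for "any suitable method, to any degree of accuracy".

```python
# Hedged sketch: inexact IFT hypergradient for a ridge-regression lower-level
# problem g(x, theta) = 0.5||Ax - b||^2 + 0.5*theta*||x||^2, with upper-level
# loss l(x) = 0.5||x - x_val||^2. All names and solver choices are illustrative.
import numpy as np

def lower_level_solve(A, b, theta, n_iters=200):
    """Approximately minimise g(x, theta) by gradient descent."""
    x = np.zeros(A.shape[1])
    L = np.linalg.norm(A, 2) ** 2 + theta        # Lipschitz constant of grad_x g
    for _ in range(n_iters):
        grad = A.T @ (A @ x - b) + theta * x     # grad_x g(x, theta)
        x -= grad / L
    return x

def conjugate_gradient(H, rhs, n_iters=50, tol=1e-8):
    """Approximately solve H q = rhs for symmetric positive definite H."""
    q = np.zeros_like(rhs)
    r = rhs - H @ q
    p = r.copy()
    for _ in range(n_iters):
        Hp = H @ p
        alpha = (r @ r) / (p @ Hp)
        q += alpha * p
        r_new = r - alpha * Hp
        if np.linalg.norm(r_new) < tol:
            break
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return q

def inexact_hypergradient(A, b, x_val, theta):
    """IFT hypergradient of l(x*(theta)) with inexact subproblem solves."""
    x = lower_level_solve(A, b, theta)            # inexact lower-level solution
    H = A.T @ A + theta * np.eye(A.shape[1])      # Hessian of g w.r.t. x
    q = conjugate_gradient(H, x - x_val)          # inexact solve of H q = grad l
    # d/dtheta of grad_x g is x, so the hypergradient is -x^T q
    return -x @ q

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
x_val = rng.standard_normal(10)
print(inexact_hypergradient(A, b, x_val, theta=0.5))
```

In this sketch, loosening the iteration budgets of either solver makes the hypergradient less accurate, which is the kind of trade-off the error bounds in the paper are designed to quantify.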
Original language: English
Pages (from-to): 254-278
Number of pages: 25
Journal: IMA Journal of Applied Mathematics
Volume: 89
Issue number: 1
Early online date: 30 Nov 2023
DOIs
Publication status: Published - 31 Jan 2024

Funding

This work is supported in part by funds from EPSRC (EP/S026045/1, EP/T026693/1, EP/V026259/1) and the Leverhulme Trust (ECF-2019-478).

Funders and funder numbers:

  • EPSRC: EP/S026045/1, EP/T026693/1, EP/V026259/1
  • The Leverhulme Trust: ECF-2019-478

Keywords

  • automatic differentiation
  • bilevel optimization
  • hyperparameter optimization

ASJC Scopus subject areas

  • Applied Mathematics
