Abstract
Sparsity-promoting regularizers are widely used to impose low-complexity structure (e.g., the $\ell_1$-norm for sparsity) on the regression coefficients in supervised learning. In the realm of deterministic optimization, the sequences generated by iterative algorithms (such as proximal gradient descent) exhibit the "finite activity identification" property; that is, they identify the low-complexity structure of the solution in a finite number of iterations. However, many online algorithms (such as proximal stochastic gradient descent) do not have this property, owing to the vanishing step size and nonvanishing variance. In this article, by combining online algorithms with a screening rule, we show how to eliminate useless features from the iterates they generate, and thereby enforce finite sparsity identification. One advantage of our scheme is that, when combined with any convergent online algorithm, the sparsity imposed by the regularizer can be exploited to improve computational efficiency. Numerical experiments show that significant acceleration can be obtained.
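The abstract does not give implementation details, but the general idea can be illustrated with a minimal sketch. The code below is not the authors' algorithm: it combines proximal stochastic gradient descent for the lasso with a periodic gap-safe-style screening pass that permanently discards features certified to be zero at the optimum. All function names (`prox_sgd_lasso_screened`, `gap_safe_keep_mask`) and parameter choices (step size, screening frequency) are illustrative assumptions.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (component-wise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def gap_safe_keep_mask(X, y, w, lam):
    """Gap-safe screening for 0.5*||y - X w||^2 + lam*||w||_1.
    Returns a boolean mask of features that cannot (yet) be certified zero."""
    r = y - X @ w
    # Rescale the residual to obtain a dual-feasible point.
    theta = r / max(lam, np.max(np.abs(X.T @ r)))
    primal = 0.5 * r @ r + lam * np.abs(w).sum()
    dual = 0.5 * y @ y - 0.5 * lam**2 * np.sum((theta - y / lam) ** 2)
    radius = np.sqrt(2.0 * max(primal - dual, 0.0)) / lam
    scores = np.abs(X.T @ theta) + radius * np.linalg.norm(X, axis=0)
    return scores >= 1.0  # score < 1  =>  coefficient is zero at the optimum


def prox_sgd_lasso_screened(X, y, lam, n_epochs=50, step0=0.1, screen_every=5, seed=0):
    """Proximal SGD for (1/(2n))*||y - X w||^2 + lam*||w||_1, interleaved with
    a screening pass that permanently removes certified-zero features."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    active = np.arange(p)          # columns still present in the reduced problem
    Xa, wa = X, np.zeros(p)
    t = 0
    for epoch in range(n_epochs):
        for i in rng.permutation(n):
            t += 1
            step = step0 / np.sqrt(t)                    # decaying step size
            g = (Xa[i] @ wa - y[i]) * Xa[i]              # stochastic gradient of the loss
            wa = soft_threshold(wa - step * g, step * lam)
        if active.size and (epoch + 1) % screen_every == 0:
            # Screening uses the equivalent unaveraged objective (lam scaled by n).
            keep = gap_safe_keep_mask(Xa, y, wa, n * lam)
            active, Xa, wa = active[keep], Xa[:, keep], wa[keep]
    w = np.zeros(p)
    w[active] = wa
    return w
```

Because a safe rule only removes features that are provably zero at the solution, discarded coordinates never need to be revisited, so subsequent stochastic updates operate on a smaller design matrix; this is the source of the computational savings the abstract refers to.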
| Original language | English |
| --- | --- |
| Pages (from-to) | 275-293 |
| Journal | Journal of Computational and Graphical Statistics |
| Volume | 32 |
| Issue number | 1 |
| Early online date | 13 Jul 2022 |
| DOIs | |
| Publication status | Published - 31 Dec 2023 |
Bibliographical note
Not acknowledging any UK funding.

Keywords
- Finite activity identification
- Nonsmooth regularization
- Screening rules
- Sparsity promoting regularization
- Stochastic gradient descent
ASJC Scopus subject areas
- Statistics and Probability
- Statistics, Probability and Uncertainty
- Discrete Mathematics and Combinatorics