Abstract
For large-scale multi-class classification problems with tens of thousands of target categories, recent works have emphasized the need to store billions of parameters. For instance, the classical ℓ2-norm regularization employed by a state-of-the-art method results in a model size of 17GB for a training set whose size is only 129MB. On the contrary, by using a mixed-norm regularization approach, we show that around 99.5% of the stored parameters are dispensable noise. Using this strategy, we can extract the information relevant for classification, which is contained in the remaining 0.5% of the parameters, and hence demonstrate a drastic reduction in model size. Furthermore, the proposed method improves generalization performance compared to state-of-the-art methods, especially for under-represented categories. Lastly, our method enjoys easy parallelization and scales well to tens of thousands of target categories.
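The sparsity effect the abstract describes can be illustrated with a small sketch. The snippet below is not the authors' actual method; it is a minimal illustration of how a mixed ℓ1/ℓ2 (group-lasso-style) penalty, applied via its proximal operator (block soft-thresholding), zeroes out entire rows of a weight matrix whose norm falls below the threshold, so that only a small fraction of "informative" parameters survives. The matrix sizes, threshold `tau`, and function names are illustrative choices, not taken from the paper.

```python
import numpy as np

def mixed_norm(W):
    # l1/l2 mixed norm: sum of per-row l2 norms (group-lasso penalty)
    return float(np.sum(np.linalg.norm(W, axis=1)))

def block_soft_threshold(W, tau):
    # Proximal operator of tau * (l1/l2 norm): shrinks each row's l2 norm
    # by tau and zeroes rows whose norm is below tau.
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return W * scale

rng = np.random.default_rng(0)
# Toy weight matrix: 5 strong rows carrying signal, 995 near-zero noise rows.
W = 0.01 * rng.standard_normal((1000, 20))
W[:5] += 5.0

W_sparse = block_soft_threshold(W, tau=0.5)
kept = int(np.count_nonzero(np.linalg.norm(W_sparse, axis=1)))
print(f"rows kept: {kept} / {W.shape[0]}")
```

Here the noise rows have ℓ2 norm far below `tau` and are set exactly to zero, while the few strong rows survive, mirroring the paper's observation that most stored parameters can be discarded without losing the information relevant for classification.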
Original language | English |
---|---|
Title of host publication | 16th SIAM International Conference on Data Mining 2016, SDM 2016 |
Editors | Sanjay Chawla Venkatasubramanian, Wagner Meira |
Publisher | Society for Industrial and Applied Mathematics Publications |
Pages | 234-242 |
Number of pages | 9 |
ISBN (Electronic) | 9781611974348 |
DOIs | |
Publication status | E-pub ahead of print - 11 Aug 2016 |
Event | 16th SIAM International Conference on Data Mining 2016, SDM 2016 - Miami, United States Duration: 5 May 2016 → 7 May 2016 |
Publication series
Name | 16th SIAM International Conference on Data Mining 2016, SDM 2016 |
---|
Conference
Conference | 16th SIAM International Conference on Data Mining 2016, SDM 2016 |
---|---|
Country/Territory | United States |
City | Miami |
Period | 5/05/16 → 7/05/16 |
Bibliographical note
Publisher Copyright: © by SIAM.
ASJC Scopus subject areas
- Computer Science Applications
- Software