Abstract
In classification problems with large output spaces (up to millions of labels), the last layer can require an enormous amount of memory. Using sparse connectivity would drastically reduce the memory requirements, but as we show below, applied naïvely it can result in much diminished predictive performance. Fortunately, we found that this can be mitigated by introducing an intermediate layer of moderate size. We further demonstrate that one can constrain the connectivity of the sparse layer to be of constant fan-in, in the sense that each output neuron has exactly the same number of incoming connections, which allows for more efficient implementations, especially on GPU hardware. The CUDA implementation of our approach is provided at https://github.com/xmc-aalto/ecml23-sparse.
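The constant fan-in idea can be illustrated with a minimal NumPy sketch. It is not the authors' CUDA implementation. The class `ConstantFanInLayer`, the helper functions, the uniformly random choice of connections, the ReLU on the intermediate layer, and all layer sizes are illustrative assumptions. Because every output neuron has exactly `fan_in` incoming connections, the connectivity fits in two dense arrays of shape `(num_labels, fan_in)`, one holding input indices and one holding weights. The forward pass then becomes a regular gather followed by a row-wise reduction, with no ragged rows to pad or mask.

```python
import numpy as np


class ConstantFanInLayer:
    """Sparse linear layer where every output neuron has exactly `fan_in` inputs.

    Sketch only: connections are sampled uniformly at random (an assumption)
    and stored as two dense (num_outputs, fan_in) arrays.
    """

    def __init__(self, num_inputs, num_outputs, fan_in, rng):
        if fan_in > num_inputs:
            raise ValueError("fan_in cannot exceed num_inputs")
        self.num_inputs = num_inputs
        # Row o lists the `fan_in` distinct input neurons feeding output neuron o.
        self.indices = np.stack(
            [rng.choice(num_inputs, size=fan_in, replace=False) for _ in range(num_outputs)]
        )
        # One weight per connection, aligned element-wise with `indices`.
        self.weights = rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=(num_outputs, fan_in))
        self.bias = np.zeros(num_outputs)

    def __call__(self, x):
        # x: (batch, num_inputs) -> gathered: (batch, num_outputs, fan_in).
        gathered = x[:, self.indices]
        # Every row has the same length, so the reduction needs no padding.
        return np.einsum("bof,of->bo", gathered, self.weights) + self.bias

    def num_parameters(self):
        # Trainable values only; the integer index array is extra (but fixed) memory.
        return self.weights.size + self.bias.size

    def to_dense(self):
        # Equivalent dense (num_outputs, num_inputs) weight matrix, for checking.
        num_outputs = self.weights.shape[0]
        dense = np.zeros((num_outputs, self.num_inputs))
        rows = np.arange(num_outputs)[:, None]
        dense[rows, self.indices] = self.weights
        return dense


def dense_param_count(num_inputs, num_outputs):
    # Full weight matrix plus bias of an ordinary dense layer.
    return num_inputs * num_outputs + num_outputs


def forward(x, w_hidden, b_hidden, sparse_out):
    # Hypothetical model: dense intermediate layer (ReLU is an assumption),
    # followed by the constant fan-in output layer giving one score per label.
    h = np.maximum(x @ w_hidden + b_hidden, 0.0)
    return sparse_out(h)


rng = np.random.default_rng(0)
num_features, hidden, num_labels, fan_in = 64, 256, 2_000, 32
w_hidden = rng.normal(0.0, 1.0 / np.sqrt(num_features), size=(num_features, hidden))
b_hidden = np.zeros(hidden)
out = ConstantFanInLayer(hidden, num_labels, fan_in, rng)

x = rng.normal(size=(8, num_features))
scores = forward(x, w_hidden, b_hidden, out)
print(scores.shape)  # (8, 2000)

sparse_total = w_hidden.size + b_hidden.size + out.num_parameters()
print(dense_param_count(num_features, num_labels), sparse_total)
```

With these toy sizes a dense output layer needs 130,000 parameters, while the intermediate layer plus the sparse layer need 82,640. The dense cost grows as `num_features * num_labels` and the sparse cost as `num_labels * fan_in`, so the gap widens as the label space approaches the millions mentioned above. The fixed row length is what makes the layout GPU-friendly: each output neuron performs the same amount of work over contiguous index and weight rows.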
Original language | English |
---|---|
Title of host publication | Machine Learning and Knowledge Discovery in Databases |
Subtitle of host publication | Research Track - European Conference, ECML PKDD 2023, Proceedings |
Editors | Danai Koutra, Claudia Plant, Manuel Gomez Rodriguez, Elena Baralis, Francesco Bonchi |
Publisher | Springer Science and Business Media Deutschland GmbH |
Pages | 689-704 |
Number of pages | 16 |
ISBN (Print) | 9783031434174 |
DOIs | |
Publication status | Published - 17 Sept 2023 |
Event | European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2023 - Turin, Italy; Duration: 18 Sept 2023 → 22 Sept 2023 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 14171 LNAI |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2023 |
---|---|
Country/Territory | Italy |
City | Turin |
Period | 18/09/23 → 22/09/23 |
Funding
We acknowledge the support of computational resources provided by the Aalto Science-IT project, and CSC IT Center for Science, Finland. This work is funded in part by the Academy of Finland projects 347707 and 348215.
ASJC Scopus subject areas
- Theoretical Computer Science
- General Computer Science