ELMO: Efficiency via Low-precision and Peak Memory Optimization in Large Output Spaces

Jinbin Zhang, Nasib Ullah, Erik Schultheis, Rohit Babbar

Research output: Contribution to journal / Conference article / peer-review

Abstract

Large output spaces, also referred to as extreme multilabel classification (XMC), arise, e.g., in large-scale tagging and product-to-product recommendation, and are characterized by label counts ranging from hundreds of thousands to millions. As a result, the linear classification head, usually only a tiny fraction of the overall model, becomes the main driver of compute and memory demand. Current state-of-the-art XMC methods predominantly rely on FP16-FP32 mixed-precision training, which we show can be unstable and inefficient in terms of memory usage and computational overhead. Meanwhile, existing low-precision methods typically retain higher precision for the classification layer. In this work, we propose ELMO, a pure low-precision training framework for XMC models using BFloat16 and Float8 data types. By leveraging Kahan summation and stochastic rounding, we demonstrate that XMC models can be effectively trained entirely in Float8, without relying on single-precision master weights or tensor scaling. Low-precision training, combined with our proposed memory optimizations (gradient fusion and chunking), enables significant reductions in GPU memory usage. For example, we train a 3-million-label XMC model with only 6.6 GiB of GPU memory, compared to the 39.7 GiB required by the optimized SOTA method, Renee (Jain et al., 2023), without compromising accuracy. Code is available at https://github.com/xmc-aalto/elmo.
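The abstract credits Kahan summation and stochastic rounding with making pure low-precision training work without single-precision master weights. The sketch below is a toy illustration of why a low-precision weight update needs one of these techniques, not ELMO's implementation: it emulates bfloat16 (8 significant bits) in plain Python and adds a small update 1,000 times to a weight of 1.0. The helper names (`round_bf16`, `stochastic_round_bf16`, `kahan_add`, `simulate`), the update size of 1e-3, and the step count are hypothetical choices for illustration; Float8 formats, subnormals, overflow, and NaN are not modelled.

```python
import math
import random
from typing import Tuple


def round_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (8 significant bits), ties to even.
    Toy emulation: assumes x is finite and normal (no subnormals/overflow/NaN)."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)      # x = m * 2**e with 0.5 <= |m| < 1
    r = round(m * 256)        # keep 8 significant bits; round() is half-to-even
    return math.ldexp(r, e - 8)


def stochastic_round_bf16(x: float, rng: random.Random) -> float:
    """Round x to one of its two bfloat16 neighbours, picking the upper one with
    probability equal to the discarded fraction, so the expected result is x."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)
    scaled = m * 256          # exact: scaling by a power of two
    lo = math.floor(scaled)
    frac = scaled - lo        # discarded fraction, in [0, 1)
    r = lo + (1 if rng.random() < frac else 0)
    return math.ldexp(r, e - 8)


def kahan_add(w: float, c: float, delta: float) -> Tuple[float, float]:
    """One compensated update w += delta with every operation rounded to bf16.
    c holds the excess the previous rounded add applied (Kahan's sign convention)."""
    y = round_bf16(delta - c)               # re-inject bits lost earlier
    t = round_bf16(w + y)                   # the lossy low-precision add
    c = round_bf16(round_bf16(t - w) - y)   # what that add got wrong
    return t, c


def simulate(steps: int = 1000, seed: int = 0):
    """Hypothetical demo: add a small bf16 update to a weight of 1.0, three ways."""
    rng = random.Random(seed)
    d = round_bf16(1e-3)      # below half the bf16 spacing at 1.0 (2**-8)
    w_nearest = 1.0
    w_sr = 1.0
    w_kahan, c = 1.0, 0.0
    for _ in range(steps):
        w_nearest = round_bf16(w_nearest + d)
        w_sr = stochastic_round_bf16(w_sr + d, rng)
        w_kahan, c = kahan_add(w_kahan, c, d)
    target = 1.0 + steps * d  # exact sum of the updates
    return w_nearest, w_sr, w_kahan, target


if __name__ == "__main__":
    print(simulate())
```

With round-to-nearest, every update is smaller than half the bfloat16 spacing at 1.0, so `w_nearest` never leaves 1.0. Stochastic rounding keeps each update correct in expectation, and Kahan summation carries the lost low-order bits in the compensation term `c`, so both `w_sr` and `w_kahan` end near `target`.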

Original language: English
Pages (from-to): 76159-76174
Number of pages: 16
Journal: Proceedings of Machine Learning Research
Volume: 267
Early online date: 1 May 2025
Publication status: Published - 19 Jul 2025
Event: 42nd International Conference on Machine Learning, ICML 2025 - Vancouver, Canada
Duration: 13 Jul 2025 to 19 Jul 2025

Funding

We acknowledge the support of the Academy of Finland (Research Council of Finland) via grants 347707 and 348215, as well as the computational resources provided by the Aalto Science-IT project and CSC IT Center for Science, Finland.

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Statistics and Probability
  • Artificial Intelligence

