Meta-classifier free negative sampling for extreme multilabel classification

Mohammadreza Qaraei, Rohit Babbar

Research output: Contribution to journal › Article › peer-review


Abstract

Negative sampling is a common approach for making the training of deep models tractable in classification problems with very large output spaces, known as extreme multilabel classification (XMC) problems. Negative sampling methods aim to find, for each instance, the highest-scoring negative labels, known as hard negatives, and limit the computation of the negative part of the loss to these labels. Two well-known families of negative sampling methods for XMC models are meta-classifier-based and Maximum Inner Product Search (MIPS)-based adaptive methods. Owing to their good prediction performance, methods which employ a meta-classifier are more common in contemporary XMC research. On the flip side, they need to train and store the meta-classifier (apart from the extreme classifier), which can involve millions of additional parameters. In this paper, we focus on MIPS-based methods for negative sampling. We highlight two issues which may prevent deep models trained with these methods from training stably. First, we argue that relying heavily on hard negatives from the beginning of training leads to unstable gradients. Second, we show that when all the negative labels in a MIPS-based method are restricted to those returned by MIPS, training is sensitive to the length of the intervals at which the weights are pre-processed for MIPS. To mitigate these issues, we propose to limit the labels selected by MIPS to only a few and to sample the remaining required labels from a uniform distribution. We show that our proposed MIPS-based negative sampling can match the performance of LightXML, a transformer-based model trained with a meta-classifier, without the need to train and store any additional classifier. The code for our experiments is available at https://github.com/xmc-aalto/mips-negative-sampling.
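As a rough illustration (not the authors' implementation; see the repository linked above for that), the sketch below shows the hybrid scheme described in the abstract: a small number of hard negatives are taken from a maximum inner product search over the extreme-classifier weights (here an exact search for clarity, whereas in practice an approximate MIPS index rebuilt at fixed intervals would be used), and the rest of the negative budget is filled by uniform sampling. The function and parameter names (sample_negatives, n_hard, n_uniform) are illustrative assumptions.

```python
# Minimal sketch of hybrid negative sampling: a few MIPS-style hard negatives
# plus uniformly sampled negatives. Exact inner-product search stands in for
# an approximate MIPS index.
import numpy as np

def sample_negatives(x, W, positives, n_hard=5, n_uniform=95, rng=None):
    """x: (d,) instance embedding; W: (L, d) label classifier weights;
    positives: set of ground-truth label indices for this instance."""
    rng = np.random.default_rng() if rng is None else rng
    L = W.shape[0]

    # Hard negatives: highest-scoring labels under the current classifier,
    # excluding the positive labels.
    scores = W @ x
    ranked = np.argsort(-scores)
    hard = [l for l in ranked if l not in positives][:n_hard]

    # Uniform negatives: fill the rest of the budget with labels drawn
    # uniformly at random from the remaining labels.
    forbidden = set(hard) | set(positives)
    pool = np.array([l for l in range(L) if l not in forbidden])
    uniform = rng.choice(pool, size=min(n_uniform, pool.size), replace=False)

    return np.concatenate([np.array(hard, dtype=int), uniform])

# Example usage with random data.
rng = np.random.default_rng(0)
W = rng.normal(size=(1000, 64))   # 1000 labels, 64-dimensional embeddings
x = rng.normal(size=64)
negs = sample_negatives(x, W, positives={3, 17}, rng=rng)
print(negs.shape)                 # (100,) sampled negative label indices
```

Keeping the MIPS-selected portion small and topping up with uniform negatives is what, per the abstract, stabilizes gradients early in training and reduces sensitivity to how often the weights are re-indexed for MIPS.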

Original language: English
Pages (from-to): 675-697
Number of pages: 23
Journal: Machine Learning
Volume: 113
Issue number: 2
Early online date: 20 Nov 2023
DOIs
Publication status: Published - 20 Nov 2023

Bibliographical note

Funding Information:
Open Access funding provided by Aalto University. The authors would like to acknowledge research funding from the Academy of Finland projects (347707 and 348215).

Funding


Funders (funder number):
• Academy of Finland (348215, 347707)
• Aalto-Yliopisto
• Kemian tekniikan korkeakoulu, Aalto-yliopisto
• Sähkötekniikan Korkeakoulu, Aalto-yliopisto
• Insinööritieteiden Korkeakoulu, Aalto-yliopisto

Keywords

• Deep neural networks
• Extreme classification
• Hard negative mining
• Maximum inner product search
• Negative sampling

ASJC Scopus subject areas

• Software
• Artificial Intelligence
