On Missing Labels, Long-tails and Propensities in Extreme Multi-label Classification

Erik Schultheis, Marek Wydmuch, Rohit Babbar, Krzysztof Dembczynski

Research output: Chapter or section in a book/report/conference proceeding > Chapter in a published conference proceeding

18 Citations (SciVal)

Abstract

The propensity model introduced by Jain et al. has become a standard approach for dealing with missing and long-tail labels in extreme multi-label classification (XMLC). In this paper, we critically revisit this approach, showing that despite its theoretical soundness, its application in contemporary XMLC works is debatable. We exhaustively discuss the flaws of the propensity-based approach and present several recipes, some of them related to solutions used in search engines and recommender systems, that we believe constitute promising alternatives to be followed in XMLC.

Original language: English
Title of host publication: KDD 2022 - Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 1547-1557
Number of pages: 11
ISBN (Electronic): 9781450393850
Publication status: Published - 14 Aug 2022
Event: 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2022 - Washington, United States
Duration: 14 Aug 2022 to 18 Aug 2022

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Conference

Conference: 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2022
Country/Territory: United States
City: Washington
Period: 14/08/22 to 18/08/22

Keywords

  • extreme classification
  • long-tail labels
  • missing labels
  • multi-label classification
  • propensity model
  • recommendation

ASJC Scopus subject areas

  • Software
  • Information Systems
