Re-ranking approach to classification in large-scale power-law distributed category systems

Rohit Babbar, Ioannis Partalas, Eric Gaussier, Massih Reza Amini

Research output: Chapter or section in a book/report/conference proceeding › Chapter in a published conference proceeding

6 Citations (SciVal)

Abstract

For large-scale category systems, such as Directory Mozilla, which consist of tens of thousands of categories, it has been empirically verified in earlier studies that the distribution of documents among categories can be modeled as a power-law distribution. This implies that a significant fraction of categories, referred to as rare categories, have very few documents assigned to them. This characteristic of the data makes it harder for learning algorithms to learn effective decision boundaries that can correctly detect such categories in the test set. In this work, we exploit the distribution of documents among categories to (i) derive an upper bound on the accuracy of any classifier, and (ii) propose a ranking-based algorithm which aims to maximize this upper bound. The empirical evaluation on publicly available large-scale datasets demonstrates that the proposed method achieves not only higher accuracy but also much higher coverage of rare categories compared to state-of-the-art methods.
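The link between a power-law distribution and the prevalence of rare categories can be illustrated with a small sketch. This is not the authors' method or data: it assumes a hypothetical Zipf-like rank-size relation in which the r-th largest category holds roughly `largest / r**exponent` documents, and the names `power_law_sizes`, `rare_fraction`, and the threshold of 5 documents are illustrative choices only.

```python
def power_law_sizes(num_categories, exponent=1.0, largest=10000):
    """Hypothetical Zipf-like category sizes (an assumption, not the paper's data).

    The r-th largest category gets about largest / r**exponent documents,
    with a floor of one document per category.
    """
    return [
        max(1, round(largest / rank ** exponent))
        for rank in range(1, num_categories + 1)
    ]


def rare_fraction(sizes, threshold=5):
    """Fraction of categories holding at most `threshold` documents."""
    rare = sum(1 for size in sizes if size <= threshold)
    return rare / len(sizes)


# Tens of thousands of categories, as in the setting the abstract describes.
sizes = power_law_sizes(20000)
frac = rare_fraction(sizes)
print(f"{frac:.1%} of categories have at most 5 documents")
```

Under these assumed parameters, the large majority of categories fall below the threshold even though the head categories are very large, which is the situation the abstract identifies as difficult for learning decision boundaries.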

Original language: English
Title of host publication: SIGIR 2014 - Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher: Association for Computing Machinery
Pages: 1059-1062
Number of pages: 4
ISBN (Print): 9781450322591
DOIs
Publication status: Published - 3 Jul 2014
Event: 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2014 - Gold Coast, QLD, Australia
Duration: 6 Jul 2014 to 11 Jul 2014

Publication series

Name: SIGIR 2014 - Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval

Conference

Conference: 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2014
Country/Territory: Australia
City: Gold Coast, QLD
Period: 6/07/14 to 11/07/14

Keywords

  • Large-scale classification
  • Power-law distribution

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Information Systems
