A sampling-based approach for efficient clustering in large datasets

Georgios Exarchakis, Omar Oubari, Gregor Lenz

Research output: Chapter or section in a book/report/conference proceedingChapter in a published conference proceeding

4 Citations (SciVal)

Abstract

We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters. Our algorithm achieves high-performance by evaluating distances of datapoints with a subset of the cluster centres. Our contribution is substantially more efficient than k-means as it does not require an all to all comparison of data points and clusters. We show that the optimal solutions of our approximation are the same as in the exact solution. However, our approach is considerably more efficient at extracting these clusters compared to the state-of-the-art. We compare our approximation with the exact k-means and alternative approximation approaches on a series of standardised clustering tasks. For the evaluation, we consider the algorithmic complexity, including number of operations to convergence, and the stability of the results. An efficient implementation of the algorithm is available online.

Original languageEnglish
Title of host publicationProceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
PublisherIEEE
Pages12393-12402
Number of pages10
ISBN (Electronic)9781665469463
DOIs
Publication statusPublished - 22 Sept 2022
Event2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 - New Orleans, USA United States
Duration: 19 Jun 202224 Jun 2022

Publication series

NameProceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Volume2022-June
ISSN (Print)1063-6919

Conference

Conference2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
Country/TerritoryUSA United States
CityNew Orleans
Period19/06/2224/06/22

Keywords

  • Efficient learning and inferences
  • Machine learning
  • Representation learning
  • Statistical methods

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition

Fingerprint

Dive into the research topics of 'A sampling-based approach for efficient clustering in large datasets'. Together they form a unique fingerprint.

Cite this