Abstract
We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters. Our algorithm achieves high performance by evaluating the distances between each data point and only a subset of the cluster centres. It is substantially more efficient than k-means because it does not require an all-to-all comparison of data points and clusters. We show that the optimal solutions of our approximation are the same as those of the exact solution, while our approach extracts these clusters considerably more efficiently than the state of the art. We compare our approximation with exact k-means and with alternative approximation approaches on a series of standardised clustering tasks. For the evaluation, we consider the algorithmic complexity, including the number of operations to convergence, and the stability of the results. An efficient implementation of the algorithm is available online.
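The idea of comparing each data point with only a subset of the cluster centres can be illustrated with a minimal sketch. This is not the authors' implementation: the rule used here for choosing the subset (the `n_candidates` centres nearest to a point's current centre) is an assumption made for illustration, and the names `subset_kmeans`, `sq_dist`, and `n_candidates` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def subset_kmeans(points, k, n_candidates=3, n_iters=20, seed=0):
    """k-means variant that compares each point with only a few centres.

    Assumption (not taken from the paper): the candidate centres of a point
    are the n_candidates centres closest to the point's current centre.
    """
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]
    # One exact (all-to-all) pass to initialise the assignments.
    assign = [min(range(k), key=lambda c: sq_dist(p, centres[c])) for p in points]

    for _ in range(n_iters):
        # Candidate lists: for each centre, its nearest centres (itself included).
        candidates = [
            sorted(range(k), key=lambda c: sq_dist(centres[j], centres[c]))[:n_candidates]
            for j in range(k)
        ]
        # Partial assignment step: only the candidate centres are evaluated.
        new_assign = [
            min(candidates[a], key=lambda c: sq_dist(p, centres[c]))
            for p, a in zip(points, assign)
        ]
        # Standard k-means update: each centre moves to the mean of its points.
        for c in range(k):
            members = [p for p, a in zip(points, new_assign) if a == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_assign == assign:
            break
        assign = new_assign
    return centres, assign
```

In this sketch the assignment step costs roughly `n_candidates` distance evaluations per point instead of `k`. Rebuilding the candidate lists still compares every centre with every other centre, which is cheap only when the number of centres is much smaller than the number of points.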
Original language | English |
---|---|
Title of host publication | Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 |
Publisher | IEEE |
Pages | 12393-12402 |
Number of pages | 10 |
ISBN (Electronic) | 9781665469463 |
Publication status | Published - 22 Sept 2022 |
Event | 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 - New Orleans, United States. Duration: 19 Jun 2022 → 24 Jun 2022 |
Publication series
Name | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition |
---|---|
Volume | 2022-June |
ISSN (Print) | 1063-6919 |
Conference
Conference | 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 |
---|---|
Country/Territory | United States |
City | New Orleans |
Period | 19/06/22 → 24/06/22 |
Funding
We would like to thank Jörg Lücke for his valuable insight during the preparation of this manuscript. This work was partially supported by French state funds managed within the “Plan Investissements d’Avenir” by the ANR (reference ANR-10-IAHU-02).
Keywords
- Efficient learning and inferences
- Machine learning
- Representation learning
- Statistical methods
ASJC Scopus subject areas
- Software
- Computer Vision and Pattern Recognition