TY - GEN
T1 - A sampling-based approach for efficient clustering in large datasets
AU - Exarchakis, Georgios
AU - Oubari, Omar
AU - Lenz, Gregor
PY - 2022/9/22
Y1 - 2022/9/22
N2 - We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters. Our algorithm achieves high performance by evaluating the distances between data points and only a subset of the cluster centres. Our contribution is substantially more efficient than k-means, as it does not require an all-to-all comparison of data points and clusters. We show that the optimal solutions of our approximation are the same as those of the exact solution, while our approach is considerably more efficient at extracting these clusters than the state of the art. We compare our approximation with exact k-means and alternative approximation approaches on a series of standardised clustering tasks. For the evaluation, we consider the algorithmic complexity, including the number of operations to convergence, and the stability of the results. An efficient implementation of the algorithm is available online.
AB - We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters. Our algorithm achieves high performance by evaluating the distances between data points and only a subset of the cluster centres. Our contribution is substantially more efficient than k-means, as it does not require an all-to-all comparison of data points and clusters. We show that the optimal solutions of our approximation are the same as those of the exact solution, while our approach is considerably more efficient at extracting these clusters than the state of the art. We compare our approximation with exact k-means and alternative approximation approaches on a series of standardised clustering tasks. For the evaluation, we consider the algorithmic complexity, including the number of operations to convergence, and the stability of the results. An efficient implementation of the algorithm is available online.
KW - Efficient learning and inferences
KW - Machine learning
KW - Representation learning
KW - Statistical methods
UR - http://www.scopus.com/inward/record.url?scp=85141769769&partnerID=8YFLogxK
U2 - 10.1109/CVPR52688.2022.01208
DO - 10.1109/CVPR52688.2022.01208
M3 - Chapter in a published conference proceeding
AN - SCOPUS:85141769769
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 12393
EP - 12402
BT - Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
PB - IEEE
T2 - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
Y2 - 19 June 2022 through 24 June 2022
ER -