Efficient Multi-GPU K-means Clustering




O'Ryan, Kian

Journal Title

Journal ISSN

Volume Title



One of the best ways of digesting ever-growing large troves of gathered data is by grouping, also known as clustering, similar data points based on their attributes. A successful clusterization is accomplished by minimizing intra-cluster and maximizing inter-cluster attribute variation. Although theoretically simple, clustering has many real-life uses in fields such as astronomy, data processing, medicine, digital marketing, biology, and computer vision. One of the standard algorithms for accomplishing clusterization is k-means, also known as Lloyd’s algorithm. Lloyd’s algorithm for computing clusters has a simple heuristic implementation, but due to the extreme size of the typical input target data, it can be a very time-consuming algorithm and is, therefore, a suitable parallelization candidate. I report my novel methods for significantly speeding up this algorithm for both GPU and CPU. This was accomplished by minimizing the amount of communication between the device and the host. My implementations utilized CUDA for the GPU code and relied on OpenMP for CPU parallelization. I achieved respectable speedup levels compared to the current state of art implementations by NVIDIA and Meta. My GPU code can execute k-means 3000 times faster than Meta’s parallel CPU and 55 times faster than NVIDIA’s GPU code.



k-means, Loyd's algorithm, Lloyd's clustering, GPU-optimization, GPU, CPU, Openmp, parallel processing, parallel computing, efficient computing, ECL, efficient computing lab, clustering, cluster analysis, unsupervised learning, heuristics


O'Ryan, K. (2023). Efficient multi-GPU K-means clustering (Unpublished thesis). Texas State University, San Marcos, Texas.


Rights Holder

Rights License

Rights URI