Revisiting DP-Means: Fast Scalable Algorithms via Parallelism and Delayed Cluster Creation

Or Dinari, Oren Freifeld

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Scopus citations

Abstract

DP-means, a nonparametric generalization of K-means, extends the latter to the case where the number of clusters is unknown. Unlike K-means, however, DP-means is hard to parallelize, a limitation hindering its usage in large-scale tasks. This work bridges this practicality gap by rendering the DP-means approach a viable, fast, and highly-scalable solution. First, we study the strengths and weaknesses of previous attempts to parallelize the DP-means algorithm. Next, we propose a new parallel algorithm, called PDC-DP-Means (Parallel Delayed Cluster DP-Means), based in part on delayed creation of clusters. Compared with DP-Means, PDC-DP-Means provides not only a major speedup but also performance gains. Finally, we propose two extensions of PDC-DP-Means. The first combines it with an existing method, leading to further speedups. The second extends PDC-DP-Means to a Mini-Batch setting (with an optional support for an online mode), allowing for another major speedup. We verify the utility of the proposed methods on multiple datasets. We also show that the proposed methods outperform other nonparametric methods (e.g., DBSCAN). Our highly-efficient code can be used to reproduce our experiments and is available at https://github.com/BGU-CS-VIL/pdc-dp-means.
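For readers unfamiliar with the baseline, the following is a minimal sketch of the classic serial DP-means procedure that the abstract builds on. It is not the authors' PDC-DP-Means and does not use the code in their repository. The function names (`sq_dist`, `mean`, `dp_means`), the parameter name `lam`, and the toy data are illustrative assumptions. The sketch shows why the baseline resists parallelization: a cluster opened for one point immediately changes the candidate set for every later point in the same pass.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def dp_means(points, lam, max_iter=100):
    """Serial DP-means sketch (illustrative, not the paper's algorithm).

    lam is the penalty threshold: a point whose squared distance to its
    nearest center exceeds lam opens a new cluster at that point.
    """
    centers = [mean(points)]          # start with one global cluster
    labels = [0] * len(points)
    for _ in range(max_iter):
        changed = False
        # Assignment step: inherently sequential, because a cluster
        # created here is visible to all points processed afterwards.
        for i, p in enumerate(points):
            dists = [sq_dist(p, c) for c in centers]
            k = min(range(len(centers)), key=dists.__getitem__)
            if dists[k] > lam:
                centers.append(tuple(p))
                k = len(centers) - 1
            if labels[i] != k:
                labels[i] = k
                changed = True
        # Update step: recompute means and drop clusters left empty.
        members = {}
        for lbl, p in zip(labels, points):
            members.setdefault(lbl, []).append(p)
        kept = sorted(members)
        remap = {old: new for new, old in enumerate(kept)}
        centers = [mean(members[old]) for old in kept]
        labels = [remap[lbl] for lbl in labels]
        if not changed:
            break
    return centers, labels


# Toy data (hypothetical): two well-separated groups of three points.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = dp_means(data, lam=9.0)
```

On this toy input the number of clusters is not supplied; it follows from `lam`. A very large `lam` keeps every point in the single initial cluster, while a small one opens clusters readily.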

Original language: English
Title of host publication: Proceedings of the 38th Conference on Uncertainty in Artificial Intelligence, UAI 2022
Publisher: Association For Uncertainty in Artificial Intelligence (AUAI)
Pages: 579-588
Number of pages: 10
ISBN (Electronic): 9781713863298
State: Published - 1 Jan 2022
Event: 38th Conference on Uncertainty in Artificial Intelligence, UAI 2022 - Eindhoven, Netherlands
Duration: 1 Aug 2022 → 5 Aug 2022

Conference

Conference: 38th Conference on Uncertainty in Artificial Intelligence, UAI 2022
Country/Territory: Netherlands
City: Eindhoven
Period: 1/08/22 → 5/08/22

ASJC Scopus subject areas

  • Artificial Intelligence
