Scalable and flexible clustering of grouped data via parallel and distributed sampling in versatile hierarchical Dirichlet processes

Or Dinari, Oren Freifeld

Research output: Contribution to conference › Paper › peer-review

Abstract

Adaptive clustering of grouped data is often done via the Hierarchical Dirichlet Process Mixture Model (HDPMM). That approach, however, is limited in its flexibility and usually does not scale well. As a remedy, we propose a different but closely related hierarchical Bayesian nonparametric framework. Our main contributions are as follows. 1) A new model, called the Versatile HDPMM (vHDPMM), with two possible settings: full and reduced. While the latter is akin to the HDPMM's setting, the former supports not only global features (as the HDPMM does) but also local ones. 2) An effective mechanism for detecting global features. 3) A new sampler that addresses the challenges posed by the vHDPMM and, in the reduced setting, scales better than HDPMM samplers. 4) An efficient, distributed, and easily modifiable implementation that offers more flexibility (even in the reduced setting) than publicly available HDPMM implementations. Finally, we show the utility of the approach in applications such as image cosegmentation, visual topic modeling, and clustering with missing data.

Original language: English
Pages: 231-240
Number of pages: 10
State: Published - 1 Jan 2020
Event: 36th Conference on Uncertainty in Artificial Intelligence, UAI 2020 - Virtual, Online
Duration: 3 Aug 2020 to 6 Aug 2020

Conference

Conference: 36th Conference on Uncertainty in Artificial Intelligence, UAI 2020
City: Virtual, Online
Period: 3/08/20 to 6/08/20

ASJC Scopus subject areas

  • Artificial Intelligence
