TY - CONF
T1 - Scalable and flexible clustering of grouped data via parallel and distributed sampling in versatile hierarchical Dirichlet processes
AU - Dinari, Or
AU - Freifeld, Oren
N1 - Funding Information:
Acknowledgements. This work was partially funded by the Lynn and William Frankel Center for Computer Science at BGU. Or Dinari was also funded in part by the Jabotinsky Scholarship from Israel’s Ministry of Technology and Science, and by BGU’s Hi-Tech Scholarship.
Publisher Copyright:
© Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence, UAI 2020. All rights reserved.
PY - 2020/1/1
Y1 - 2020/1/1
N2 - Adaptive clustering of grouped data is often done via the Hierarchical Dirichlet Process Mixture Model (HDPMM). That approach, however, is limited in its flexibility and usually does not scale well. As a remedy, we propose another, closely related, hierarchical Bayesian nonparametric framework. Our main contributions are as follows. 1) A new model, called the Versatile HDPMM (vHDPMM), with two possible settings: full and reduced. While the latter is akin to the HDPMM's setting, the former supports not only global features (as HDPMM does) but also local ones. 2) An effective mechanism for detecting global features. 3) A new sampler that addresses the challenges posed by the vHDPMM and, in the reduced setting, scales better than HDPMM samplers. 4) An efficient, distributed, and easily modifiable implementation that offers more flexibility (even in the reduced setting) than publicly available HDPMM implementations. Finally, we show the utility of the approach in applications such as image cosegmentation, visual topic modeling, and clustering with missing data.
UR - http://www.scopus.com/inward/record.url?scp=85101632947&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:85101632947
SP - 231
EP - 240
T2 - 36th Conference on Uncertainty in Artificial Intelligence, UAI 2020
Y2 - 3 August 2020 through 6 August 2020
ER -