Gaussian mixture models with equivalence constraints

Noam Shental, Aharon Bar-Hillel, Tomer Hertz, Daphna Weinshall

Research output: Chapter in Book/Report/Conference proceeding > Chapter > peer-review

6 Scopus citations

Abstract

Gaussian Mixture Models (GMMs) have been widely used to cluster data in an unsupervised manner via the Expectation Maximization (EM) algorithm. In this chapter we suggest a semi-supervised EM algorithm that incorporates equivalence constraints into a GMM. Equivalence constraints provide information about pairs of data points, indicating whether the points arise from the same source (a must-link constraint) or from different sources (a cannot-link constraint). These constraints allow the EM algorithm to converge to solutions that better reflect the class structure of the data. Moreover, in some learning scenarios equivalence constraints can be gathered automatically, while in others they are a natural form of supervision. We present a closed-form EM algorithm for handling must-link constraints, and a generalized EM algorithm using a Markov network for incorporating cannot-link constraints. Using publicly available data sets, we demonstrate that incorporating equivalence constraints leads to a considerable improvement in clustering performance. Our GMM-based clustering algorithm significantly outperforms two other available clustering methods that use equivalence constraints.

Mixture models are a powerful tool for probabilistic modelling of data and have been widely used in research areas such as pattern recognition, machine learning, computer vision, and signal processing [13, 14, 18]. Such models provide a principled probabilistic approach to clustering data in an unsupervised manner [24, 25, 30, 31]. In addition, their ability to represent complex density functions has made them an excellent choice for density estimation problems [20, 23].
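To make the must-link case in the abstract concrete, the following is a minimal sketch, not the chapter's implementation. It assumes one-dimensional data, assumes the must-link constraints have already been merged into groups of points sharing a source (called `chunklets` here, with unconstrained points as singleton groups), and assumes the groups are sampled independently. All function and parameter names are hypothetical.

```python
import math

def log_normal(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def chunklet_posteriors(chunklets, weights, means, variances):
    """E-step: one posterior over components per chunklet.

    All points in a chunklet are tied to the same component, so their
    log-likelihoods are summed before normalising.
    """
    posteriors = []
    for chunk in chunklets:
        logs = [
            math.log(w) + sum(log_normal(x, m, v) for x in chunk)
            for w, m, v in zip(weights, means, variances)
        ]
        top = max(logs)  # subtract the max for numerical stability
        unnorm = [math.exp(value - top) for value in logs]
        total = sum(unnorm)
        posteriors.append([u / total for u in unnorm])
    return posteriors

def m_step(chunklets, posteriors, n_components, min_var=1e-6):
    """M-step: closed-form updates (assumes i.i.d. chunklets).

    No guard against a component losing all responsibility; this is a sketch.
    """
    weights, means, variances = [], [], []
    for k in range(n_components):
        resp = [p[k] for p in posteriors]
        # Mixing weight: average responsibility over chunklets, not points.
        weights.append(sum(resp) / len(chunklets))
        # Every point in a chunklet carries the chunklet's shared responsibility.
        mass = sum(r * len(c) for r, c in zip(resp, chunklets))
        mean = sum(r * sum(c) for r, c in zip(resp, chunklets)) / mass
        var = sum(
            r * sum((x - mean) ** 2 for x in c) for r, c in zip(resp, chunklets)
        ) / mass
        means.append(mean)
        variances.append(max(var, min_var))  # floor avoids collapsed components
    return weights, means, variances

def fit(chunklets, initial_means, n_iter=50):
    """Run a fixed number of EM iterations from the given initial means."""
    k = len(initial_means)
    weights, means, variances = [1.0 / k] * k, list(initial_means), [1.0] * k
    for _ in range(n_iter):
        post = chunklet_posteriors(chunklets, weights, means, variances)
        weights, means, variances = m_step(chunklets, post, k)
    post = chunklet_posteriors(chunklets, weights, means, variances)
    return weights, means, variances, post

# Toy data: two sources near 0 and 5; each inner list is one must-link group.
chunklets = [[0.1, -0.2, 0.3], [0.0], [-0.4, 0.2],
             [5.1, 4.8], [5.3], [4.9, 5.2, 5.0]]
weights, means, variances, post = fit(chunklets, initial_means=[1.0, 4.0])
```

Cannot-link constraints are not covered by this sketch: the abstract notes they require a generalized EM algorithm over a Markov network, because the posterior no longer factorizes over independent groups.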

Original language: English
Title of host publication: Constrained Clustering
Subtitle of host publication: Advances in Algorithms, Theory, and Applications
Publisher: CRC Press
Pages: 33-58
Number of pages: 26
ISBN (Electronic): 9781584889977
ISBN (Print): 9781584889960
State: Published - 1 Jan 2008

ASJC Scopus subject areas

  • General Computer Science
  • General Economics, Econometrics and Finance
  • General Business, Management and Accounting
