Abstract
Many validity criteria have been proposed over the years for validating the clustering of unlabeled data sets. In this research we compared the performance of several known validity criteria with that of several new validity criteria on mixtures of normally distributed data. Most of the new criteria are modifications of the Gath and Geva partition and average density criteria, while one new criterion is based on the generalized Neyman-Pearson (GNP) test for normality. The comparison used simulated Gaussian data sets built from 1 to 5 clusters in 1 to 4 dimensions, with a variety of cluster means and variances. Clustering was performed with the unsupervised optimal fuzzy clustering (UOFC) algorithm, which combines the fuzzy c-means (FCM) algorithm with a fuzzy modification of the maximum likelihood estimation (FMLE) algorithm. We conclude that, in general, no single validity criterion consistently performed much better than the others under all conditions; nevertheless, some of the new validity criteria showed clear advantages in validating most of the simulated Gaussian data sets.
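The general workflow described in the abstract (simulate Gaussian clusters, run a fuzzy clustering algorithm for several candidate cluster counts, and score each partition with a validity criterion) can be illustrated with a minimal sketch. It rests on several assumptions: it uses plain FCM only, not the full UOFC algorithm with its FMLE stage, and it scores partitions with Bezdek's classic partition coefficient as a stand-in, not with the Gath and Geva or GNP criteria studied in the paper. All function names, the farthest-point initialization, and the data parameters are hypothetical choices made for illustration. Because the partition coefficient is trivially 1 for a single cluster, the sweep starts at 2 clusters.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def make_gaussian_mixture(means, sigma, n_per_cluster, seed=0):
    """Draw n_per_cluster points around each mean (isotropic std sigma)."""
    rng = random.Random(seed)
    return [[rng.gauss(mu, sigma) for mu in mean]
            for mean in means for _ in range(n_per_cluster)]


def maximin_init(points, c):
    """Deterministic farthest-point initialization (an illustrative choice)."""
    centers = [points[0]]
    while len(centers) < c:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, v) for v in centers)))
    return [list(v) for v in centers]


def fuzzy_c_means(points, c, m=2.0, iters=200, tol=1e-8):
    """Plain FCM with fuzzifier m; returns (centers, memberships)."""
    n, dim = len(points), len(points[0])
    centers = maximin_init(points, c)
    u = []
    for _ in range(iters):
        # Membership update: u_ik is proportional to d_ik^(-2/(m-1)).
        u = []
        for p in points:
            inv = [max(sq_dist(p, v), 1e-12) ** (-1.0 / (m - 1.0))
                   for v in centers]
            total = sum(inv)
            u.append([x / total for x in inv])
        # Center update: mean of the points weighted by u_ik^m.
        new_centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            wsum = sum(w)
            new_centers.append(
                [sum(w[i] * points[i][j] for i in range(n)) / wsum
                 for j in range(dim)])
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u


def partition_coefficient(u):
    """Bezdek's partition coefficient, in [1/c, 1]; higher means crisper."""
    return sum(x * x for row in u for x in row) / len(u)


def pick_cluster_count(points, candidates=range(2, 6)):
    """Score each candidate c and return (scores, best c)."""
    scores = {}
    for c in candidates:
        _, u = fuzzy_c_means(points, c)
        scores[c] = partition_coefficient(u)
    return scores, max(scores, key=scores.get)


# Three well-separated 2-D clusters (hypothetical test data).
means = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.66)]
data = make_gaussian_mixture(means, sigma=0.5, n_per_cluster=40, seed=0)
scores, best_c = pick_cluster_count(data)
print(scores, best_c)
```

The partition coefficient tends to decrease as the number of clusters grows, which is one reason a single criterion may not be reliable under all conditions; swapping in a different scoring function inside `pick_cluster_count` is the natural place to compare criteria.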
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 511-529 |
Number of pages | 19 |
Journal | Pattern Recognition Letters |
Volume | 21 |
Issue number | 6-7 |
State | Published - 1 Jan 2000 |
Keywords
- Cluster validity
- Entropy maximization
- Generalized Neyman-Pearson (GNP) criterion
- Hypothesis testing
- Mixture of normal distributed data
- Unsupervised clustering
ASJC Scopus subject areas
- Software
- Signal Processing
- Computer Vision and Pattern Recognition
- Artificial Intelligence