Abstract
We investigate the extent to which clustering algorithms are robust to the addition of a small, potentially adversarial, set of points. Our analysis reveals radical differences in the robustness of popular clustering methods. k-means and several related techniques are robust when data is clusterable, and we provide a quantitative analysis capturing the precise relationship between clusterability and robustness. In contrast, common linkage-based algorithms and several standard objective-function-based clustering methods can be highly sensitive to the addition of a small set of points even when the data is highly clusterable. We call such sets of points oligarchies. Lastly, we show that the behavior with respect to oligarchies of the popular Lloyd's method changes radically with the initialization technique.
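
The sensitivity of linkage-based methods described above can be illustrated with a toy sketch. This is not the paper's construction or analysis: the one-dimensional data, the helper `single_linkage`, and the choice of stopping at k = 2 clusters are all assumptions made for illustration. The sketch runs a naive single-linkage procedure on two well-separated groups, then again after adding one distant point.

```python
def single_linkage(points, k):
    """Naive single-linkage on 1-D points, stopped at k clusters.

    Hypothetical helper for illustration only; quadratic-time and
    not taken from the paper.
    """
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None  # (distance, i, j) of the closest pair of clusters
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Two well-separated groups (assumed toy data).
data = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]

before = single_linkage(data, 2)
# One added point, far from both groups.
after = single_linkage(data + [100.0], 2)

print(before)
print(after)
```

In this toy run the single added point takes a cluster for itself, so the two original groups are merged into one. This is the kind of change to the clustering of the original data that the abstract attributes to a small added set, although the paper's formal notion of an oligarchy is not reproduced here.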
Original language | English |
---|---|
Pages (from-to) | 66-74 |
Number of pages | 9 |
Journal | Journal of Machine Learning Research |
Volume | 31 |
State | Published - 1 Jan 2013 |
Externally published | Yes |
Event | 16th International Conference on Artificial Intelligence and Statistics, AISTATS 2013 - Scottsdale, United States. Duration: 29 Apr 2013 → 1 May 2013 |
ASJC Scopus subject areas
- Control and Systems Engineering
- Software
- Statistics and Probability
- Artificial Intelligence