TY - UNPB
T1 - A Constant Approximation Algorithm for Sequential No-Substitution k-Median Clustering under a Random Arrival Order
AU - Hess, Tom
AU - Moshkovitz, Michal
AU - Sabato, Sivan
PY - 2021/6/6
Y1 - 2021/6/6
N2 - We study k-median clustering under the sequential no-substitution
setting. In this setting, a data stream is sequentially observed, and
some of the points are selected by the algorithm as cluster centers.
However, a point can be selected as a center only immediately after it
is observed, before observing the next point. In addition, a selected
center cannot be substituted later. We give a new algorithm for this
setting that obtains a constant approximation factor on the optimal risk
under a random arrival order. This is the first such algorithm that
holds without any assumptions on the input data and selects a
non-trivial number of centers. The number of selected centers is
quasi-linear in k. Our algorithm and analysis are based on a careful
risk estimation that avoids outliers, a new concept of a linear bin
division, and repeated calculations using an offline clustering
algorithm.
KW - Computer Science - Machine Learning
KW - Statistics - Machine Learning
U2 - 10.48550/arXiv.2102.04050
DO - 10.48550/arXiv.2102.04050
M3 - Preprint
BT - A Constant Approximation Algorithm for Sequential No-Substitution k-Median Clustering under a Random Arrival Order
ER -