TY - GEN
T1 - The Power of Uniform Sampling for Coresets
AU - Braverman, Vladimir
AU - Cohen-Addad, Vincent
AU - Jiang, Shaofeng H.-C.
AU - Krauthgamer, Robert
AU - Schwiegelshohn, Chris
AU - Toftrup, Mads Bech
AU - Wu, Xuan
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022/1/1
Y1 - 2022/1/1
N2 - Motivated by practical generalizations of the classic k-median and k-means objectives, such as clustering with size constraints, fair clustering, and Wasserstein barycenter, we introduce a meta-theorem for designing coresets for constrained-clustering problems. The meta-theorem reduces the task of coreset construction to one on a bounded number of ring instances with a much-relaxed additive error. This reduction enables us to construct coresets using uniform sampling, in contrast to the widely used importance sampling, and consequently we can easily handle constrained objectives. Notably and perhaps surprisingly, this simpler sampling scheme can yield coresets whose size is independent of n, the number of input points. Our technique yields smaller coresets, and sometimes the first coresets, for a large number of constrained clustering problems, including capacitated clustering, fair clustering, Euclidean Wasserstein barycenter, clustering in minor-excluded graphs, and polygon clustering under Fréchet and Hausdorff distance. Finally, our technique also yields smaller coresets for 1-median in low-dimensional Euclidean spaces, specifically of size O(ε^(-1.5)) in R^2 and O(ε^(-1.6)) in R^3.
AB - Motivated by practical generalizations of the classic k-median and k-means objectives, such as clustering with size constraints, fair clustering, and Wasserstein barycenter, we introduce a meta-theorem for designing coresets for constrained-clustering problems. The meta-theorem reduces the task of coreset construction to one on a bounded number of ring instances with a much-relaxed additive error. This reduction enables us to construct coresets using uniform sampling, in contrast to the widely used importance sampling, and consequently we can easily handle constrained objectives. Notably and perhaps surprisingly, this simpler sampling scheme can yield coresets whose size is independent of n, the number of input points. Our technique yields smaller coresets, and sometimes the first coresets, for a large number of constrained clustering problems, including capacitated clustering, fair clustering, Euclidean Wasserstein barycenter, clustering in minor-excluded graphs, and polygon clustering under Fréchet and Hausdorff distance. Finally, our technique also yields smaller coresets for 1-median in low-dimensional Euclidean spaces, specifically of size O(ε^(-1.5)) in R^2 and O(ε^(-1.6)) in R^3.
KW - Wasserstein barycenter
KW - capacitated clustering
KW - clustering
KW - coresets
KW - fair clustering
UR - https://www.scopus.com/pages/publications/85143403763
U2 - 10.1109/FOCS54457.2022.00051
DO - 10.1109/FOCS54457.2022.00051
M3 - Conference contribution
AN - SCOPUS:85143403763
T3 - Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS
SP - 462
EP - 473
BT - Proceedings - 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science, FOCS 2022
PB - Institute of Electrical and Electronics Engineers
T2 - 63rd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2022
Y2 - 31 October 2022 through 3 November 2022
ER -