TY - GEN

T1 - On computing centroids according to the p-norms of Hamming distance vectors

AU - Chen, Jiehua

AU - Hermelin, Danny

AU - Sorge, Manuel

N1 - Publisher Copyright:
© Jiehua Chen, Danny Hermelin, and Manuel Sorge.

PY - 2019/9/1

Y1 - 2019/9/1

N2 - In this paper we consider the p-Norm Hamming Centroid problem, which asks to determine whether some given strings have a centroid with a bound on the p-norm of its Hamming distances to the strings. Specifically, given a set S of strings and a real k, we consider the problem of determining whether there exists a string s∗ with (Σ_{s∈S} d^p(s∗, s))^{1/p} ≤ k, where d(·, ·) denotes the Hamming distance metric. This problem has important applications in data clustering and multi-winner committee elections, and is a generalization of the well-known polynomial-time solvable Consensus String (p = 1) problem, as well as the NP-hard Closest String (p = ∞) problem. Our main result shows that the problem is NP-hard for all fixed rational p > 1, closing the gap for all rational values of p between 1 and ∞. Under standard complexity assumptions the reduction also implies that the problem has no 2^{o(n+m)}-time or 2^{o(k^{p/(p+1)})}-time algorithm, where m denotes the number of input strings and n denotes the length of each string, for any fixed p > 1. The first bound matches a straightforward brute-force algorithm. The second bound is tight in the sense that for each fixed ε > 0, we provide a 2^{k^{p/(p+1)+ε}}-time algorithm. In the last part of the paper, we complement our hardness result by presenting a fixed-parameter algorithm and a factor-2 approximation algorithm for the problem.

AB - In this paper we consider the p-Norm Hamming Centroid problem, which asks to determine whether some given strings have a centroid with a bound on the p-norm of its Hamming distances to the strings. Specifically, given a set S of strings and a real k, we consider the problem of determining whether there exists a string s∗ with (Σ_{s∈S} d^p(s∗, s))^{1/p} ≤ k, where d(·, ·) denotes the Hamming distance metric. This problem has important applications in data clustering and multi-winner committee elections, and is a generalization of the well-known polynomial-time solvable Consensus String (p = 1) problem, as well as the NP-hard Closest String (p = ∞) problem. Our main result shows that the problem is NP-hard for all fixed rational p > 1, closing the gap for all rational values of p between 1 and ∞. Under standard complexity assumptions the reduction also implies that the problem has no 2^{o(n+m)}-time or 2^{o(k^{p/(p+1)})}-time algorithm, where m denotes the number of input strings and n denotes the length of each string, for any fixed p > 1. The first bound matches a straightforward brute-force algorithm. The second bound is tight in the sense that for each fixed ε > 0, we provide a 2^{k^{p/(p+1)+ε}}-time algorithm. In the last part of the paper, we complement our hardness result by presenting a fixed-parameter algorithm and a factor-2 approximation algorithm for the problem.

KW - Clustering

KW - Hamming distance

KW - Multiwinner election

KW - Strings

UR - http://www.scopus.com/inward/record.url?scp=85074848491&partnerID=8YFLogxK

U2 - 10.4230/LIPIcs.ESA.2019.28

DO - 10.4230/LIPIcs.ESA.2019.28

M3 - Conference contribution

AN - SCOPUS:85074848491

T3 - Leibniz International Proceedings in Informatics, LIPIcs

BT - 27th Annual European Symposium on Algorithms, ESA 2019

A2 - Bender, Michael A.

A2 - Svensson, Ola

A2 - Herman, Grzegorz

PB - Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing

T2 - 27th Annual European Symposium on Algorithms, ESA 2019

Y2 - 9 September 2019 through 11 September 2019

ER -