TY - GEN
T1 - Approximating large frequency moments with pick-and-drop sampling
AU - Braverman, Vladimir
AU - Ostrovsky, Rafail
PY - 2013/10/15
Y1 - 2013/10/15
N2 - Given a data stream $D = \{p_1, p_2, \ldots, p_m\}$ of size $m$ of numbers from $\{1, \ldots, n\}$, the frequency of $i$ is defined as $f_i = |\{j : p_j = i\}|$. The $k$-th frequency moment of $D$ is defined as $F_k = \sum_{i=1}^{n} f_i^k$. We consider the problem of approximating frequency moments in insertion-only streams for $k \ge 3$. For any constant $c$ we show an $O(n^{1-2/k} \log(n) \log^{(c)}(n))$ upper bound on the space complexity of the problem. Here $\log^{(c)}(n)$ is the iterative log function. Our main technical contribution is a non-uniform sampling method on matrices. We call our method a pick-and-drop sampling; it samples a heavy element (i.e., element $i$ with frequency $\Omega(F_k)$) with probability $\Omega(1/n^{1-2/k})$ and gives approximation $\tilde{f}_i \ge (1 - \epsilon) f_i$. In addition, the estimations never exceed the real values, that is $\tilde{f}_j \le f_j$ for all $j$. For constant $\epsilon$, we reduce the space complexity of finding a heavy element to $O(n^{1-2/k} \log(n))$ bits. We apply our method of recursive sketches and resolve the problem with $O(n^{1-2/k} \log(n) \log^{(c)}(n))$ bits. We reduce the ratio between the upper and lower bounds from $O(\log^2(n))$ to $O(\log(n) \log^{(c)}(n))$. Thus, we provide a (roughly) quadratic improvement of the result of Andoni, Krauthgamer and Onak (FOCS 2011).
AB - Given a data stream $D = \{p_1, p_2, \ldots, p_m\}$ of size $m$ of numbers from $\{1, \ldots, n\}$, the frequency of $i$ is defined as $f_i = |\{j : p_j = i\}|$. The $k$-th frequency moment of $D$ is defined as $F_k = \sum_{i=1}^{n} f_i^k$. We consider the problem of approximating frequency moments in insertion-only streams for $k \ge 3$. For any constant $c$ we show an $O(n^{1-2/k} \log(n) \log^{(c)}(n))$ upper bound on the space complexity of the problem. Here $\log^{(c)}(n)$ is the iterative log function. Our main technical contribution is a non-uniform sampling method on matrices. We call our method a pick-and-drop sampling; it samples a heavy element (i.e., element $i$ with frequency $\Omega(F_k)$) with probability $\Omega(1/n^{1-2/k})$ and gives approximation $\tilde{f}_i \ge (1 - \epsilon) f_i$. In addition, the estimations never exceed the real values, that is $\tilde{f}_j \le f_j$ for all $j$. For constant $\epsilon$, we reduce the space complexity of finding a heavy element to $O(n^{1-2/k} \log(n))$ bits. We apply our method of recursive sketches and resolve the problem with $O(n^{1-2/k} \log(n) \log^{(c)}(n))$ bits. We reduce the ratio between the upper and lower bounds from $O(\log^2(n))$ to $O(\log(n) \log^{(c)}(n))$. Thus, we provide a (roughly) quadratic improvement of the result of Andoni, Krauthgamer and Onak (FOCS 2011).
KW - Data streams
KW - frequency moments
KW - sampling
UR - https://www.scopus.com/pages/publications/84885206425
U2 - 10.1007/978-3-642-40328-6_4
DO - 10.1007/978-3-642-40328-6_4
M3 - Conference contribution
AN - SCOPUS:84885206425
SN - 9783642403279
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 42
EP - 57
BT - Approximation, Randomization, and Combinatorial Optimization
T2 - 16th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems, APPROX 2013 and the 17th International Workshop on Randomization and Computation, RANDOM 2013
Y2 - 21 August 2013 through 23 August 2013
ER -