TY - GEN
T1 - An optimal algorithm for large frequency moments using $O(n^{1-2/\kappa})$ bits
AU - Braverman, Vladimir
AU - Katzman, Jonathan
AU - Seidell, Charles
AU - Vorsanger, Gregory
N1 - Publisher Copyright: © Vladimir Braverman, Jonathan Katzman, Charles Seidell, and Gregory Vorsanger.
PY - 2014/9/1
Y1 - 2014/9/1
N2 - In this paper, we provide the first optimal algorithm for the remaining open question from the seminal paper of Alon, Matias, and Szegedy: approximating large frequency moments. Given a stream $D = \{p_1, p_2, \ldots, p_m\}$ of numbers from $\{1, \ldots, n\}$, a frequency of $i$ is defined as $f_i = |\{j : p_j = i\}|$. The $\kappa$-th frequency moment of $D$ is defined as $F_\kappa = \sum_{i=1}^{n} f_i^{\kappa}$. We give an upper bound on the space required to find a $\kappa$-th frequency moment of $O(n^{1-2/\kappa})$ bits that matches, up to a constant factor, the lower bound of [48] for constant $\epsilon$ and constant $\kappa$. Our algorithm makes a single pass over the stream and works for any constant $\kappa > 3$. It is based upon two major technical accomplishments: first, we provide an optimal algorithm for finding the heavy elements in a stream; and second, we provide a technique using Martingale Sketches which gives an optimal reduction of the large frequency moment problem to the all heavy elements problem. Additionally, this reduction works for any function $g$ of the form $\sum_{i=1}^{n} g(f_i)$ that requires sub-linear polynomial space, and it works in the more general turnstile model. As a result, we also provide a polylogarithmic improvement for frequency moments, frequency based functions, spatial data streams, and measuring independence of data sets.
AB - In this paper, we provide the first optimal algorithm for the remaining open question from the seminal paper of Alon, Matias, and Szegedy: approximating large frequency moments. Given a stream $D = \{p_1, p_2, \ldots, p_m\}$ of numbers from $\{1, \ldots, n\}$, a frequency of $i$ is defined as $f_i = |\{j : p_j = i\}|$. The $\kappa$-th frequency moment of $D$ is defined as $F_\kappa = \sum_{i=1}^{n} f_i^{\kappa}$. We give an upper bound on the space required to find a $\kappa$-th frequency moment of $O(n^{1-2/\kappa})$ bits that matches, up to a constant factor, the lower bound of [48] for constant $\epsilon$ and constant $\kappa$. Our algorithm makes a single pass over the stream and works for any constant $\kappa > 3$. It is based upon two major technical accomplishments: first, we provide an optimal algorithm for finding the heavy elements in a stream; and second, we provide a technique using Martingale Sketches which gives an optimal reduction of the large frequency moment problem to the all heavy elements problem. Additionally, this reduction works for any function $g$ of the form $\sum_{i=1}^{n} g(f_i)$ that requires sub-linear polynomial space, and it works in the more general turnstile model. As a result, we also provide a polylogarithmic improvement for frequency moments, frequency based functions, spatial data streams, and measuring independence of data sets.
KW - Frequency Moments
KW - Heavy Hitters
KW - Randomized Algorithms
KW - Streaming Algorithms
UR - https://www.scopus.com/pages/publications/84920173493
U2 - 10.4230/LIPIcs.APPROX-RANDOM.2014.531
DO - 10.4230/LIPIcs.APPROX-RANDOM.2014.531
M3 - Conference contribution
AN - SCOPUS:84920173493
T3 - Leibniz International Proceedings in Informatics, LIPIcs
SP - 531
EP - 544
BT - Leibniz International Proceedings in Informatics, LIPIcs
A2 - Jansen, Klaus
A2 - Rolim, Jose D. P.
A2 - Devanur, Nikhil R.
A2 - Moore, Cristopher
PB - Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
T2 - 17th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems, APPROX 2014 and the 18th International Workshop on Randomization and Computation, RANDOM 2014
Y2 - 4 September 2014 through 6 September 2014
ER -