TY - GEN
T1 - AMS without 4-wise independence on product domains
AU - Braverman, Vladimir
AU - Chung, Kai Min
AU - Liu, Zhenming
AU - Mitzenmacher, Michael
AU - Ostrovsky, Rafail
PY - 2010/12/1
Y1 - 2010/12/1
N2 - In their seminal work, Alon, Matias, and Szegedy introduced several sketching techniques, including showing that 4-wise independence is sufficient to obtain good approximations of the second frequency moment. In this work, we show that their sketching technique can be extended to product domains [n]^k by using the product of 4-wise independent functions on [n]. Our work extends that of Indyk and McGregor, who showed the result for k = 2. Their primary motivation was the problem of identifying correlations in data streams. In their model, a stream of pairs (i, j) ∈ [n]^2 arrives, giving a joint distribution (X, Y), and they find approximation algorithms for how close the joint distribution is to the product of the marginal distributions under various metrics, which naturally corresponds to how close X and Y are to being independent. By using our technique, we obtain a new result for the problem of approximating the ℓ_2 distance between the joint distribution and the product of the marginal distributions for k-ary vectors, instead of just pairs, in a single pass. Our analysis gives a randomized algorithm that is a (1 ± ε)-approximation (with probability 1 - δ) and requires space logarithmic in n and m and proportional to 3^k.
AB - In their seminal work, Alon, Matias, and Szegedy introduced several sketching techniques, including showing that 4-wise independence is sufficient to obtain good approximations of the second frequency moment. In this work, we show that their sketching technique can be extended to product domains [n]^k by using the product of 4-wise independent functions on [n]. Our work extends that of Indyk and McGregor, who showed the result for k = 2. Their primary motivation was the problem of identifying correlations in data streams. In their model, a stream of pairs (i, j) ∈ [n]^2 arrives, giving a joint distribution (X, Y), and they find approximation algorithms for how close the joint distribution is to the product of the marginal distributions under various metrics, which naturally corresponds to how close X and Y are to being independent. By using our technique, we obtain a new result for the problem of approximating the ℓ_2 distance between the joint distribution and the product of the marginal distributions for k-ary vectors, instead of just pairs, in a single pass. Our analysis gives a randomized algorithm that is a (1 ± ε)-approximation (with probability 1 - δ) and requires space logarithmic in n and m and proportional to 3^k.
KW - Data streams
KW - Independence
KW - Randomized algorithms
KW - Sketches
KW - Streaming algorithms
UR - https://www.scopus.com/pages/publications/84880315222
U2 - 10.4230/LIPIcs.STACS.2010.2449
DO - 10.4230/LIPIcs.STACS.2010.2449
M3 - Conference contribution
AN - SCOPUS:84880315222
SN - 9783939897163
T3 - Leibniz International Proceedings in Informatics, LIPIcs
SP - 119
EP - 130
BT - STACS 2010 - 27th International Symposium on Theoretical Aspects of Computer Science
T2 - 27th International Symposium on Theoretical Aspects of Computer Science, STACS 2010
Y2 - 4 March 2010 through 6 March 2010
ER -