TY - GEN
T1 - The White-Box Adversarial Data Stream Model
AU - Ajtai, Miklós
AU - Braverman, Vladimir
AU - Jayram, T. S.
AU - Silwal, Sandeep
AU - Sun, Alec
AU - Woodruff, David P.
AU - Zhou, Samson
N1 - Publisher Copyright: © 2022 Owner/Author.
PY - 2022/6/13
Y1 - 2022/6/13
AB - There has been a flurry of recent literature studying streaming algorithms for which the input stream is chosen adaptively by a black-box adversary who observes the output of the streaming algorithm at each time step. However, these algorithms fail when the adversary has access to the internal state of the algorithm, rather than just the output of the algorithm. We study streaming algorithms in the white-box adversarial model, where the stream is chosen adaptively by an adversary who observes the entire internal state of the algorithm at each time step. We show that nontrivial algorithms are still possible. We first give a randomized algorithm for the L1-heavy hitters problem that outperforms the optimal deterministic Misra-Gries algorithm on long streams. If the white-box adversary is computationally bounded, we use cryptographic techniques to reduce the memory of our L1-heavy hitters algorithm even further and to design a number of additional algorithms for graph, string, and linear algebra problems. The existence of such algorithms is surprising, as the streaming algorithm does not even have a secret key in this model, i.e., its state is entirely known to the adversary. One algorithm we design is for estimating the number of distinct elements in a stream with insertions and deletions achieving a multiplicative approximation and sublinear space; such an algorithm is impossible for deterministic algorithms. We also give a general technique that translates any two-player deterministic communication lower bound to a lower bound for randomized algorithms robust to a white-box adversary. In particular, our results show that for all p≥0, there exists a constant Cp>1 such that any Cp-approximation algorithm for Fp moment estimation in insertion-only streams with a white-box adversary requires Ω(n) space for a universe of size n. Similarly, there is a constant C>1 such that any C-approximation algorithm in an insertion-only stream for matrix rank requires Ω(n) space with a white-box adversary. These results do not contradict our upper bounds since they assume the adversary has unbounded computational power. Our algorithmic results based on cryptography thus show a separation between computationally bounded and unbounded adversaries. Finally, we prove a lower bound of Ω(log(n)) bits for the fundamental problem of deterministic approximate counting in a stream of 0s and 1s, which holds even if we know how many total stream updates we have seen so far at each point in the stream. Such a lower bound for approximate counting with additional information was previously unknown, and in our context, it shows a separation between multiplayer deterministic maximum communication and the white-box space complexity of a streaming algorithm.
KW - adversarial robustness
KW - counting
KW - cryptography
KW - data streams
UR - https://www.scopus.com/pages/publications/85132972357
U2 - 10.1145/3517804.3526228
DO - 10.1145/3517804.3526228
M3 - Conference contribution
AN - SCOPUS:85132972357
T3 - Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems
SP - 15
EP - 27
BT - PODS 2022 - Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
PB - Association for Computing Machinery
T2 - 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2022
Y2 - 12 June 2022 through 17 June 2022
ER -