TY - GEN
T1 - SALSA
T2 - 37th IEEE International Conference on Data Engineering, ICDE 2021
AU - Basat, Ran Ben
AU - Einziger, Gil
AU - Mitzenmacher, Michael
AU - Vargaftik, Shay
N1 - Funding Information:
Some of our results are deferred to the full version [29], where we explain and evaluate fine-grained merges, in which counters can grow more slowly than by doubling in size. As shown there, the gain is marginal. We also discuss the count distinct functionality of the CM sketch, show how SALSA empirically improves it, and evaluate the accuracy gain. The full version also includes an improved (albeit slower) encoding for SALSA that requires under 0.6 bits per counter, together with a lower bound showing that this is near-optimal. Finally, it provides the missing proofs and additional explanations. All of our code is released as open source [1]. Acknowledgements: This project was supported in part by NSF grants CCF-1563710 and CCF-1535795, by a gift to the Center for Research on Computation and Society at Harvard University, and by the Cyber Security Research Center, the Data Science Research Center, and the Lynne and William Frankel Center for Computing Science at Ben-Gurion University.
Publisher Copyright:
© 2021 IEEE.
PY - 2021/4/1
Y1 - 2021/4/1
AB - Counters are the fundamental building block of many data sketching schemes, which hash items to a small number of counters and account for collisions to provide good approximations for frequencies and other measures. Most existing methods rely on fixed-size counters, which may be wasteful in terms of space, as counters must be large enough to eliminate any risk of overflow. Instead, some solutions use small, fixed-size counters that may overflow into secondary structures. This paper takes a different approach. We propose a simple and general method called SALSA for dynamic resizing of counters, and show its effectiveness. SALSA starts with small counters, and overflowing counters simply merge with their neighbors. SALSA can thereby allow more counters for a given space, expanding them as necessary to represent large numbers. Our evaluation demonstrates that, at the cost of a small overhead for its merging logic, SALSA significantly improves the accuracy of popular schemes (such as Count-Min Sketch and Count Sketch) over a variety of tasks. Our code is released as open source [1].
UR - http://www.scopus.com/inward/record.url?scp=85112866379&partnerID=8YFLogxK
U2 - 10.1109/ICDE51399.2021.00080
DO - 10.1109/ICDE51399.2021.00080
M3 - Conference contribution
AN - SCOPUS:85112866379
T3 - Proceedings - International Conference on Data Engineering
SP - 864
EP - 875
BT - Proceedings - 2021 IEEE 37th International Conference on Data Engineering, ICDE 2021
PB - Institute of Electrical and Electronics Engineers
Y2 - 19 April 2021 through 22 April 2021
ER -
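The abstract's core idea (start with small counters and let an overflowing counter merge with its neighbor) can be illustrated with a minimal Python sketch. This is not the authors' released code [1]. The names `SalsaRow` and `SalsaCountMin`, the 8-bit starting size, the power-of-two aligned merges, the per-slot `level` array used to track merges (instead of the compact merge encoding mentioned in the notes), and the sum-merge rule are all illustrative assumptions.

```python
import hashlib


class SalsaRow:
    """Hypothetical SALSA-style row: small counters that merge with an aligned neighbor on overflow."""

    def __init__(self, width, bits=8):
        assert width & (width - 1) == 0, "width must be a power of two"
        self.width = width
        self.bits = bits              # size of an unmerged counter, in bits
        self.vals = [0] * width       # a block's value lives in its first slot; other slots stay 0
        self.level = [0] * width      # log2 of the size (in slots) of the block covering each slot

    def _block(self, i):
        lvl = self.level[i]
        return i & ~((1 << lvl) - 1), lvl   # aligned start of the block containing slot i

    def read(self, i):
        start, _ = self._block(i)
        return self.vals[start]

    def add(self, i, delta=1):
        assert delta >= 0, "this sketch only supports non-negative increments"
        start, lvl = self._block(i)
        value = self.vals[start] + delta
        # A block of 2**lvl slots has bits * 2**lvl bits; merge while the value does not fit.
        while value >= 1 << (self.bits << lvl):
            size = 1 << lvl
            if size == self.width:
                raise OverflowError("the whole row is already a single counter")
            buddy = start ^ size      # aligned neighboring range of the same size
            # The neighboring range may hold several smaller blocks: sum them all in.
            j = buddy
            while j < buddy + size:
                value += self.vals[j]
                self.vals[j] = 0
                j += 1 << self.level[j]
            self.vals[start] = 0
            start = min(start, buddy)
            lvl += 1
            for k in range(start, start + (1 << lvl)):
                self.level[k] = lvl
        self.vals[start] = value


class SalsaCountMin:
    """Count-Min sketch whose rows are hypothetical SalsaRow counter arrays."""

    def __init__(self, depth, width, bits=8):
        self.rows = [SalsaRow(width, bits) for _ in range(depth)]
        self.width = width

    def _index(self, r, key):
        # One hash per row, derived by prefixing the row number to the key.
        digest = hashlib.blake2b(f"{r}:{key}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, key, delta=1):
        for r, row in enumerate(self.rows):
            row.add(self._index(r, key), delta)

    def query(self, key):
        # Count-Min estimate: the smallest counter the key maps to.
        return min(row.read(self._index(r, key)) for r, row in enumerate(self.rows))
```

For example, in `SalsaRow(4)` adding 256 to slot 0 overflows its 8 bits, so slots 0 and 1 become one 16-bit counter and `read(1)` also returns 256. Summing merged counters keeps Count-Min's one-sided guarantee: every merged counter is still at least the true total of each item that hashes into it.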