TY - GEN
T1 - SALSA
T2 - 37th IEEE International Conference on Data Engineering, ICDE 2021
AU - Ben Basat, Ran
AU - Einziger, Gil
AU - Mitzenmacher, Michael
AU - Vargaftik, Shay
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021/4/1
Y1 - 2021/4/1
N2 - Counters are the fundamental building block of many data sketching schemes, which hash items to a small number of counters and account for collisions to provide good approximations for frequencies and other measures. Most existing methods rely on fixed-size counters, which may be wasteful in terms of space, as counters must be large enough to eliminate any risk of overflow. Instead, some solutions use small, fixed-size counters that may overflow into secondary structures. This paper takes a different approach. We propose a simple and general method called SALSA for dynamic re-sizing of counters, and show its effectiveness. SALSA starts with small counters, and overflowing counters simply merge with their neighbors. SALSA can thereby allow more counters for a given space, expanding them as necessary to represent large numbers. Our evaluation demonstrates that, at the cost of a small overhead for its merging logic, SALSA significantly improves the accuracy of popular schemes (such as Count-Min Sketch and Count Sketch) over a variety of tasks. Our code is released as open source [1].
AB - Counters are the fundamental building block of many data sketching schemes, which hash items to a small number of counters and account for collisions to provide good approximations for frequencies and other measures. Most existing methods rely on fixed-size counters, which may be wasteful in terms of space, as counters must be large enough to eliminate any risk of overflow. Instead, some solutions use small, fixed-size counters that may overflow into secondary structures. This paper takes a different approach. We propose a simple and general method called SALSA for dynamic re-sizing of counters, and show its effectiveness. SALSA starts with small counters, and overflowing counters simply merge with their neighbors. SALSA can thereby allow more counters for a given space, expanding them as necessary to represent large numbers. Our evaluation demonstrates that, at the cost of a small overhead for its merging logic, SALSA significantly improves the accuracy of popular schemes (such as Count-Min Sketch and Count Sketch) over a variety of tasks. Our code is released as open source [1].
UR - http://www.scopus.com/inward/record.url?scp=85112866379&partnerID=8YFLogxK
U2 - 10.1109/ICDE51399.2021.00080
DO - 10.1109/ICDE51399.2021.00080
M3 - Conference contribution
AN - SCOPUS:85112866379
T3 - Proceedings - International Conference on Data Engineering
SP - 864
EP - 875
BT - Proceedings - 2021 IEEE 37th International Conference on Data Engineering, ICDE 2021
PB - Institute of Electrical and Electronics Engineers
Y2 - 19 April 2021 through 22 April 2021
ER -