SALSA: Self-adjusting lean streaming analytics

Ran Ben Basat, Gil Einziger, Michael Mitzenmacher, Shay Vargaftik

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

21 Scopus citations

Abstract

Counters are the fundamental building block of many data sketching schemes, which hash items to a small number of counters and account for collisions to provide good approximations for frequencies and other measures. Most existing methods rely on fixed-size counters, which may be wasteful in terms of space, as counters must be large enough to eliminate any risk of overflow. Instead, some solutions use small, fixed-size counters that may overflow into secondary structures.This paper takes a different approach. We propose a simple and general method called SALSA for dynamic re-sizing of counters, and show its effectiveness. SALSA starts with small counters, and overflowing counters simply merge with their neighbors. SALSA can thereby allow more counters for a given space, expanding them as necessary to represent large numbers. Our evaluation demonstrates that, at the cost of a small overhead for its merging logic, SALSA significantly improves the accuracy of popular schemes (such as Count-Min Sketch and Count Sketch) over a variety of tasks. Our code is released as open source [1].

Original languageEnglish
Title of host publicationProceedings - 2021 IEEE 37th International Conference on Data Engineering, ICDE 2021
PublisherInstitute of Electrical and Electronics Engineers
Pages864-875
Number of pages12
ISBN (Electronic)9781728191843
DOIs
StatePublished - 1 Apr 2021
Event37th IEEE International Conference on Data Engineering, ICDE 2021 - Virtual, Chania, Greece
Duration: 19 Apr 202122 Apr 2021

Publication series

NameProceedings - International Conference on Data Engineering
Volume2021-April
ISSN (Print)1084-4627

Conference

Conference37th IEEE International Conference on Data Engineering, ICDE 2021
Country/TerritoryGreece
CityVirtual, Chania
Period19/04/2122/04/21

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Information Systems

Fingerprint

Dive into the research topics of 'SALSA: Self-adjusting lean streaming analytics'. Together they form a unique fingerprint.

Cite this