Beating CountSketch for heavy hitters in insertion streams

Vladimir Braverman, Nikita Ivkin, Stephen R. Chestnut, David P. Woodruff

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

24 Scopus citations

Abstract

Given a stream $p_1, \ldots, p_m$ of items from a universe $U$, which, without loss of generality, we identify with the set of integers $\{1, 2, \ldots, n\}$, we consider the problem of returning all $\ell_2$-heavy hitters, i.e., those items $j$ for which $f_j \geq \epsilon \sqrt{F_2}$, where $f_j$ is the number of occurrences of item $j$ in the stream and $F_2 = \sum_{i \in [n]} f_i^2$. Such a guarantee is considerably stronger than the $\ell_1$-guarantee, which finds those $j$ for which $f_j \geq \epsilon m$: since $F_2 \leq m^2$, every $\ell_1$-heavy hitter is also an $\ell_2$-heavy hitter. In 2002, Charikar, Chen, and Farach-Colton suggested the CountSketch data structure, which finds all such $j$ using $\Theta(\log^2 n)$ bits of space (for constant $\epsilon > 0$). The only known lower bound is $\Omega(\log n)$ bits of space, which comes from the need to specify the identities of the items found. In this paper we show one can achieve $O(\log n \log \log n)$ bits of space for this problem.

Our techniques, based on Gaussian processes, lead to a number of other new results for data streams, including: (1) the first algorithm for estimating $F_2$ simultaneously at all points in a stream using only $O(\log n \log \log n)$ bits of space, improving a natural union bound; and (2) a way to estimate the $\ell_\infty$ norm of a stream up to additive error $\epsilon \sqrt{F_2}$ with $O(\log n \log \log n)$ bits of space, resolving Open Question 3 from the IITK 2006 list for insertion-only streams.
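The abstract contrasts the $\ell_2$ guarantee with the weaker $\ell_1$ guarantee and names CountSketch as the $\Theta(\log^2 n)$-bit baseline. The Python sketch below is a minimal illustration under stated assumptions: it implements the textbook CountSketch of Charikar, Chen, and Farach-Colton, not the paper's $O(\log n \log \log n)$-bit algorithm based on Gaussian processes. The names `exact_heavy_hitters`, `CountSketch`, and `heavy_hitters`, the BLAKE2b-derived hashes, and the $\epsilon/2$ reporting threshold are hypothetical choices made for illustration. The example stream has one item of frequency 100 and 1,000 items of frequency 1, so $m = 1100$ and $F_2 = 11000$; with $\epsilon = 0.5$, item 1 fails the $\ell_1$ test ($100 < 550$) but passes the $\ell_2$ test ($100 \geq 0.5\sqrt{11000} \approx 52.4$).

```python
import hashlib
import math
from collections import Counter
from statistics import median


def exact_heavy_hitters(stream, eps):
    """Exact l1- and l2-heavy hitters (linear space; reference only)."""
    freq = Counter(stream)
    m = sum(freq.values())                    # stream length
    f2 = sum(c * c for c in freq.values())    # F2 = sum_i f_i^2
    l1 = {j for j, c in freq.items() if c >= eps * m}
    l2 = {j for j, c in freq.items() if c >= eps * math.sqrt(f2)}
    return l1, l2


class CountSketch:
    """Textbook CountSketch: `depth` rows of `width` signed counters.

    Assumption: bucket and sign hashes are derived from BLAKE2b, a stand-in
    for the limited-independence hash families used in the analysis.
    """

    def __init__(self, width, depth, seed=0):
        self.width, self.depth, self.seed = width, depth, seed
        self.table = [[0] * width for _ in range(depth)]

    def _hash(self, row, item):
        key = f"{self.seed}:{row}:{item}".encode()
        x = int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "big")
        bucket = x % self.width               # low bits choose the counter
        sign = 1 if (x >> 32) & 1 else -1     # a separate bit chooses the sign
        return bucket, sign

    def update(self, item, count=1):
        # Each arrival adds +/-count to one counter in every row.
        for r in range(self.depth):
            bucket, sign = self._hash(r, item)
            self.table[r][bucket] += sign * count

    def estimate(self, item):
        # Median over rows of the sign-corrected counter estimates f_item.
        ests = []
        for r in range(self.depth):
            bucket, sign = self._hash(r, item)
            ests.append(sign * self.table[r][bucket])
        return median(ests)

    def estimate_f2(self):
        # Each row's sum of squared counters is an unbiased estimate of F2.
        return median(sum(v * v for v in row) for row in self.table)

    def heavy_hitters(self, universe, eps):
        # Report items whose estimate clears (eps / 2) * sqrt(F2_hat);
        # the eps / 2 slack is an illustrative choice, not from the paper.
        threshold = (eps / 2) * math.sqrt(self.estimate_f2())
        return {j for j in universe if self.estimate(j) >= threshold}


if __name__ == "__main__":
    n = 1001
    stream = [1] * 100 + list(range(2, n + 1))
    print(exact_heavy_hitters(stream, eps=0.5))  # (set(), {1})
    cs = CountSketch(width=64, depth=7)
    for item in stream:
        cs.update(item)
    print(cs.heavy_hitters(range(1, n + 1), eps=0.5))
```

Querying every item of the universe costs $O(n)$ time per report, so this is only meant to show how an $\ell_2$ threshold estimated from the sketch itself can isolate an item that the $\ell_1$ threshold misses. With $O(\log n)$ rows of $O(1/\epsilon^2)$ counters of $O(\log n)$ bits each, this layout uses roughly the $\Theta(\log^2 n)$ bits quoted for CountSketch when $\epsilon$ is constant.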

Original language: English
Title of host publication: STOC 2016 - Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing
Editors: Yishay Mansour, Daniel Wichs
Publisher: Association for Computing Machinery
Pages: 740-753
Number of pages: 14
ISBN (Electronic): 9781450341325
State: Published - 19 Jun 2016
Externally published: Yes
Event: 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2016 - Cambridge, United States
Duration: 19 Jun 2016 to 21 Jun 2016

Publication series

Name: Proceedings of the Annual ACM Symposium on Theory of Computing
Volume: 19-21-June-2016
ISSN (Print): 0737-8017

Conference

Conference: 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2016
Country/Territory: United States
City: Cambridge
Period: 19/06/16 to 21/06/16

Keywords

  • Chaining
  • Data streams
  • Heavy hitters

ASJC Scopus subject areas

  • Software

