Sampling from dense streams without penalty: improved bounds for frequency moments and heavy hitters

Vladimir Braverman, Gregory Vorsanger

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We investigate the ability to sample a relatively small amount of data from a stream and approximately calculate statistics of the original stream. McGregor et al. [29] provide worst-case theoretical bounds showing space costs for sampling that scale inversely with the sampling rate. Indeed, while the lower bound of McGregor et al. cannot be improved in the general case, we show that it is possible to improve the space bound for a stream $D$ over a domain of size $n$ when the average positive frequency $\mu = F_1/F_0$ is sufficiently large. We consider the following range of parameters: $\mu \ge \log(n)$ and sample rate $p \ge C_k \mu^{-1} \log(n)$, where $C_k$ is a constant. On these streams we improve the bound from $\tilde{O}\left(\frac{1}{p} n^{1-2/k}\right)$ to […], thus giving a polynomial improvement in space for sufficiently large $\mu$ and $p^{-1}$.
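The quantities named in the abstract can be made concrete with a small sketch. This is not the authors' algorithm: it only computes exact frequency moments offline, draws a Bernoulli sample at rate p, and checks the stated parameter range. The function names, the value passed for `c_k`, and the use of the natural logarithm are all assumptions made for illustration, since the abstract fixes neither the constant nor the log base.

```python
import math
import random
from collections import Counter


def frequency_moment(stream, k):
    """Exact F_k: sum of (frequency ** k) over distinct items; F_0 counts distinct items."""
    counts = Counter(stream)
    if k == 0:
        return len(counts)
    return sum(c ** k for c in counts.values())


def average_positive_frequency(stream):
    """mu = F_1 / F_0, the mean frequency over items that occur at least once."""
    f0 = frequency_moment(stream, 0)
    if f0 == 0:
        raise ValueError("mu is undefined for an empty stream")
    return frequency_moment(stream, 1) / f0


def bernoulli_sample(stream, p, rng):
    """Keep each stream element independently with probability p."""
    return [x for x in stream if rng.random() < p]


def in_dense_regime(stream, n, p, c_k):
    """Check mu >= log(n) and p >= c_k * log(n) / mu.

    c_k stands in for the constant C_k from the abstract; its true value
    is not given there, so any number passed here is hypothetical.
    The natural logarithm is likewise an assumption.
    """
    mu = average_positive_frequency(stream)
    log_n = math.log(n)
    return mu >= log_n and p >= c_k * log_n / mu
```

For example, `bernoulli_sample(stream, 0.1, random.Random(0))` gives a reproducible sample, and dividing its length by 0.1 gives an unbiased estimate of F_1. Estimating higher moments from the sample is the hard part that the paper addresses and is deliberately left out here.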

Original language: English
Title of host publication: Computing and Combinatorics - 20th International Conference, COCOON 2014, Proceedings
Publisher: Springer Verlag
Pages: 13-24
Number of pages: 12
ISBN (Print): 9783319087825
State: Published - 1 Jan 2014
Externally published: Yes
Event: 20th International Computing and Combinatorics Conference, COCOON 2014 - Atlanta, GA, United States
Duration: 4 Aug 2014 to 6 Aug 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8591 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 20th International Computing and Combinatorics Conference, COCOON 2014
Country/Territory: United States
City: Atlanta, GA
Period: 4/08/14 to 6/08/14

Keywords

  • Frequency Moments
  • Heavy Hitters
  • Sampling
  • Streaming Algorithms

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
