Regular programming for quantitative properties of data streams

Rajeev Alur, Dana Fisman, Mukund Raghothaman

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

39 Scopus citations

Abstract

We propose quantitative regular expressions (QREs) as a high-level programming abstraction for specifying complex numerical queries over data streams in a modular way. Our language allows the arbitrary nesting of orthogonal sets of combinators: (a) generalized versions of choice, concatenation, and Kleene-iteration from regular expressions, (b) streaming (serial) composition, and (c) numerical operators such as min, max, sum, difference, and averaging. Instead of requiring the programmer to figure out the low-level details of what state needs to be maintained and how to update it while processing each data item, the regular constructs facilitate a global view of the entire data stream, splitting it into different cases and multiple chunks. The key technical challenge in defining our language is the design of typing rules that can be enforced efficiently and that strike a balance between expressiveness and theoretical guarantees for well-typed programs. We describe how to compile each QRE into an efficient streaming algorithm. The time and space complexity depends on the complexity of the data structure for representing terms over the basic numerical operators. In particular, we show that when the set of numerical operations is sum, difference, minimum, maximum, and average, the compiled algorithm uses constant space and processes each symbol in the data stream in constant time, outputting the cost of the stream processed so far. Finally, we prove that the expressiveness of QREs coincides with the streaming composition of regular functions, that is, MSO-definable string-to-term transformations, leading to a potentially robust foundation for understanding their expressiveness and the complexity of analysis problems.
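To make the constant-space, constant-time-per-symbol claim concrete, the following is a minimal hand-written sketch, not the paper's QRE compiler or syntax. It assumes a hypothetical stream format in which numbers are data items and the string "#" closes a chunk, and evaluates one example query of the kind QREs express: split the stream into chunks (Kleene iteration), take the max of each chunk, and average those maxima. The function name avg_of_chunk_maxima and the rule that an empty chunk is skipped are illustrative assumptions.

```python
from typing import Iterable, Iterator, Optional, Union

# Hypothetical symbol type (assumption): a number is a data item,
# the string "#" marks the end of a chunk.
Symbol = Union[float, str]
DELIMITER = "#"


def avg_of_chunk_maxima(stream: Iterable[Symbol]) -> Iterator[Optional[float]]:
    """Hand-written streaming evaluator for one example query:
    split the stream into '#'-terminated chunks, take the max of
    each chunk, and average those maxima.

    Keeps O(1) state and does O(1) work per symbol, yielding the
    query's value on the prefix read so far (None while undefined).
    """
    total = 0.0                      # sum of maxima of completed chunks
    count = 0                        # number of completed chunks
    cur_max: Optional[float] = None  # max of the currently open chunk

    for sym in stream:
        if sym == DELIMITER:
            # Close the current chunk; an empty chunk is skipped (assumption)
            if cur_max is not None:
                total += cur_max
                count += 1
            cur_max = None
        else:
            x = float(sym)
            cur_max = x if cur_max is None else max(cur_max, x)
        # Emit the cost of the stream processed so far
        yield total / count if count else None
```

For example, `list(avg_of_chunk_maxima([3, 7, "#", 2, 5, "#", 4]))` yields `[None, None, 7.0, 7.0, 7.0, 6.0, 6.0]`: the chunk maxima are 7 and 5, and the trailing 4 belongs to a chunk that is still open. The state is three scalars regardless of stream length, which is the kind of bound the abstract states for sum, difference, min, max, and average; the paper obtains such evaluators by compilation rather than by hand.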

Original language: English
Title of host publication: Programming Languages and Systems - 25th European Symposium on Programming, ESOP 2016, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2016, Proceedings
Editors: Peter Thiemann
Publisher: Springer Verlag
Pages: 15-40
Number of pages: 26
ISBN (Print): 9783662494974
State: Published - 1 Jan 2016
Externally published: Yes
Event: 25th European Symposium on Programming, ESOP 2016, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2016 - Eindhoven, Netherlands
Duration: 2 Apr 2016 to 8 Apr 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9632
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 25th European Symposium on Programming, ESOP 2016, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2016
Country/Territory: Netherlands
City: Eindhoven
Period: 2/04/16 to 8/04/16

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
