TY - GEN
T1 - Provenance for aggregate queries
AU - Amsterdamer, Yael
AU - Deutch, Daniel
AU - Tannen, Val
PY - 2011/7/15
Y1 - 2011/7/15
N2 - In this paper, we study provenance information for queries with aggregation. Provenance information has been studied in the context of various query languages that do not allow for aggregation, and recent work has suggested capturing provenance by annotating database tuples with elements of a commutative semiring and propagating the annotations through query evaluation. We show that aggregate queries pose novel challenges that render this approach inapplicable. Consequently, we propose a new approach, where we annotate with provenance information not just tuples but also the individual values within tuples, using provenance to describe the values' computation. We realize this approach in a concrete construction, first for "simple" queries where the aggregation operator is the last one applied, and then for arbitrary (positive) relational algebra queries with aggregation; the latter queries are shown to be more challenging in this context. Finally, we use aggregation to encode queries with difference, and study the semantics obtained for such queries on provenance-annotated databases.
AB - In this paper, we study provenance information for queries with aggregation. Provenance information has been studied in the context of various query languages that do not allow for aggregation, and recent work has suggested capturing provenance by annotating database tuples with elements of a commutative semiring and propagating the annotations through query evaluation. We show that aggregate queries pose novel challenges that render this approach inapplicable. Consequently, we propose a new approach, where we annotate with provenance information not just tuples but also the individual values within tuples, using provenance to describe the values' computation. We realize this approach in a concrete construction, first for "simple" queries where the aggregation operator is the last one applied, and then for arbitrary (positive) relational algebra queries with aggregation; the latter queries are shown to be more challenging in this context. Finally, we use aggregation to encode queries with difference, and study the semantics obtained for such queries on provenance-annotated databases.
UR - http://www.scopus.com/inward/record.url?scp=79960168520&partnerID=8YFLogxK
U2 - 10.1145/1989284.1989302
DO - 10.1145/1989284.1989302
M3 - Conference contribution
AN - SCOPUS:79960168520
SN - 9781450306607
T3 - Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems
SP - 153
EP - 164
BT - PODS'11 - Proceedings of the 30th Symposium on Principles of Database Systems
T2 - 30th Symposium on Principles of Database Systems, PODS'11
Y2 - 13 June 2011 through 15 June 2011
ER -