TY - GEN
T1 - On provenance minimization
AU - Amsterdamer, Yael
AU - Deutch, Daniel
AU - Milo, Tova
AU - Tannen, Val
PY - 2011/7/15
Y1 - 2011/7/15
N2 - Provenance information has proved very effective in capturing the computational process performed by queries, and has been used extensively as the input to many advanced data management tools (e.g., view maintenance, trust assessment, or query answering in probabilistic databases). We study here the core of provenance information, namely the part of provenance that appears in the computation of every query equivalent to the given one. This provenance core is informative as it describes the part of the computational process that is inherent to the query. It is also useful as a compact input to the above-mentioned data management tools. We study algorithms that, given a query, compute an equivalent query that realizes the core provenance for all tuples in its result. We study these algorithms for queries of varying expressive power. Finally, we observe that, in general, one would not want to require database systems to evaluate a specific query that realizes the core provenance, but instead to be able to find, possibly off-line, the core provenance of a given tuple in the output (computed by an arbitrary equivalent query), without rewriting the query. We provide algorithms for such direct computation of the core provenance.
AB - Provenance information has proved very effective in capturing the computational process performed by queries, and has been used extensively as the input to many advanced data management tools (e.g., view maintenance, trust assessment, or query answering in probabilistic databases). We study here the core of provenance information, namely the part of provenance that appears in the computation of every query equivalent to the given one. This provenance core is informative as it describes the part of the computational process that is inherent to the query. It is also useful as a compact input to the above-mentioned data management tools. We study algorithms that, given a query, compute an equivalent query that realizes the core provenance for all tuples in its result. We study these algorithms for queries of varying expressive power. Finally, we observe that, in general, one would not want to require database systems to evaluate a specific query that realizes the core provenance, but instead to be able to find, possibly off-line, the core provenance of a given tuple in the output (computed by an arbitrary equivalent query), without rewriting the query. We provide algorithms for such direct computation of the core provenance.
UR - http://www.scopus.com/inward/record.url?scp=79960159844&partnerID=8YFLogxK
U2 - 10.1145/1989284.1989303
DO - 10.1145/1989284.1989303
M3 - Conference contribution
AN - SCOPUS:79960159844
SN - 9781450306607
T3 - Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems
SP - 141
EP - 152
BT - PODS'11 - Proceedings of the 30th Symposium on Principles of Database Systems
T2 - 30th Symposium on Principles of Database Systems, PODS'11
Y2 - 13 June 2011 through 15 June 2011
ER -