TY - GEN
T1 - Hypothetical reasoning via provenance abstraction
AU - Deutch, Daniel
AU - Moskovitch, Yuval
AU - Rinetzky, Noam
N1 - Publisher Copyright: Copyright © 2019 held by the owner/author(s). Publication rights licensed to ACM.
PY - 2019/6/25
Y1 - 2019/6/25
AB - Data analytics often involves hypothetical reasoning: repeatedly modifying the data and observing the induced effect on the computation result of a data-centric application. Previous work has shown that fine-grained data provenance can make such analysis more efficient: instead of a costly re-execution of the underlying application, hypothetical scenarios are applied to a pre-computed provenance expression. However, storing provenance for complex queries and large-scale data incurs significant overhead, which is often a barrier to the adoption of provenance-based solutions. To this end, we present a framework for reducing provenance size. Our approach reduces the provenance granularity using user-defined abstraction trees over the provenance variables; the granularity is chosen based on the anticipated hypothetical scenarios. We formalize the tradeoff between provenance size and the supported granularity of hypothetical reasoning, study the complexity of the resulting optimization problem, and provide efficient algorithms for tractable cases and heuristics for others. We experimentally study the performance of our solution for various queries and abstraction trees. Our study shows that the algorithms generally lead to a substantial speedup of hypothetical reasoning, with a reasonable loss of accuracy.
KW - Hypothetical reasoning
KW - Provenance compression
UR - http://www.scopus.com/inward/record.url?scp=85067934092&partnerID=8YFLogxK
U2 - 10.1145/3299869.3300084
DO - 10.1145/3299869.3300084
M3 - Conference contribution
AN - SCOPUS:85067934092
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 537
EP - 554
BT - SIGMOD 2019 - Proceedings of the 2019 International Conference on Management of Data
PB - Association for Computing Machinery
T2 - 2019 International Conference on Management of Data, SIGMOD 2019
Y2 - 30 June 2019 through 5 July 2019
ER -