TY - CHAP
T1 - Selective provenance for datalog programs using top-k queries
AU - Deutch, Daniel
AU - Gilad, Amir
AU - Moskovitch, Yuval
PY - 2015/1/1
Y1 - 2015/1/1
N2 - Highly expressive declarative languages, such as datalog, are now commonly used to model the operational logic of dataintensive applications. The typical complexity of such datalog programs, and the large volume of data that they process, call for result explanation. Results may be explained through the tracking and presentation of data provenance, and here we focus on a detailed form of provenance (howprovenance), defining it as the set of derivation trees of a given fact. While informative, the size of such full provenance information is typically too large and complex (even when compactly represented) to allow displaying it to the user. To this end, we propose a novel top-k query language for querying datalog provenance, supporting selection criteria based on tree patterns and ranking based on the rules and database facts used in derivation. We propose an effcient novel algorithm based on (1) instrumenting the datalog program so that, upon evaluation, it generates only relevant provenance, and (2) effcient top-k (relevant) provenance generation, combined with bottom-up datalog evaluation. The algorithm computes in polynomial data complexity a compact representation of the top-k trees which may be explicitly constructed in linear time with respect to their size. We further experimentally study the algorithm performance, showing its scalability even for complex datalog programs where full provenance tracking is infeasible.
AB - Highly expressive declarative languages, such as datalog, are now commonly used to model the operational logic of dataintensive applications. The typical complexity of such datalog programs, and the large volume of data that they process, call for result explanation. Results may be explained through the tracking and presentation of data provenance, and here we focus on a detailed form of provenance (howprovenance), defining it as the set of derivation trees of a given fact. While informative, the size of such full provenance information is typically too large and complex (even when compactly represented) to allow displaying it to the user. To this end, we propose a novel top-k query language for querying datalog provenance, supporting selection criteria based on tree patterns and ranking based on the rules and database facts used in derivation. We propose an effcient novel algorithm based on (1) instrumenting the datalog program so that, upon evaluation, it generates only relevant provenance, and (2) effcient top-k (relevant) provenance generation, combined with bottom-up datalog evaluation. The algorithm computes in polynomial data complexity a compact representation of the top-k trees which may be explicitly constructed in linear time with respect to their size. We further experimentally study the algorithm performance, showing its scalability even for complex datalog programs where full provenance tracking is infeasible.
UR - http://www.scopus.com/inward/record.url?scp=84953855982&partnerID=8YFLogxK
U2 - 10.14778/2824032.2824039
DO - 10.14778/2824032.2824039
M3 - Chapter
AN - SCOPUS:84953855982
T3 - Proceedings of the VLDB Endowment
SP - 1394
EP - 1405
BT - Proceedings of the VLDB Endowment
PB - Association for Computing Machinery
T2 - 3rd Workshop on Spatio-Temporal Database Management, STDBM 2006, Co-located with the 32nd International Conference on Very Large Data Bases, VLDB 2006
Y2 - 11 September 2006 through 11 September 2006
ER -