ProvCite: Provenance-based data citation

Yinjun Wu, Abdussalam Alawini, Daniel Deutch, Tova Milo, Susan Davidson

Research output: Contribution to journal, Conference article, peer-review

6 Scopus citations

Abstract

As research products expand to include structured datasets, the challenge arises of how to automatically generate citations to the results of arbitrary queries against such datasets. Previous work explored this problem in the context of conjunctive queries and views using a Rewriting-Based Model (RBM). However, an increasing number of scientific queries are aggregate, e.g. statistical summaries of the underlying data, for which the RBM cannot be easily extended. In this paper, we show how a Provenance-Based Model (PBM) can be leveraged to 1) generate citations to conjunctive as well as aggregate queries and views; 2) associate citations with individual result tuples to enable arbitrary subsets of the result set to be cited (fine-grained citations); and 3) be optimized to return citations in acceptable time. Our implementation of PBM in ProvCite shows that it not only handles a larger class of queries and views than RBM, but can outperform it when restricted to conjunctive views in some cases.
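To make the idea of fine-grained, provenance-based citation concrete, here is a toy Python sketch. It is not ProvCite's implementation. The table, the `CitationView` class, and the functions `count_per_family`, `tuple_citation` and `cite_subset` are hypothetical names chosen for illustration. Provenance is simplified to lineage (the set of base-tuple ids that contribute to each output tuple), and citations from different base tuples are combined by plain set union rather than by whatever citation-combination policy the paper defines.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Row:
    rid: int           # identifier of the base tuple (hypothetical)
    family: str        # hypothetical grouping attribute
    contributor: str   # hypothetical attribute used to build citations


# Hypothetical base table.
BASE = [
    Row(1, "GPCR", "Alice"),
    Row(2, "GPCR", "Bob"),
    Row(3, "Kinase", "Alice"),
]


@dataclass
class CitationView:
    name: str
    covers: Callable[[Row], bool]  # which base tuples the view exposes
    cite: Callable[[Row], str]     # citation attached to each covered tuple


# Hypothetical citation view: every tuple is cited by family and contributor.
VIEWS = [
    CitationView(
        "family_view",
        lambda r: True,
        lambda r: f"{r.family} family, curated by {r.contributor}",
    ),
]


def count_per_family(rows):
    """Aggregate query COUNT(*) GROUP BY family, recording lineage per output tuple."""
    counts = defaultdict(int)
    lineage = defaultdict(set)
    for r in rows:
        counts[r.family] += 1
        lineage[r.family].add(r.rid)
    return {fam: (counts[fam], frozenset(lineage[fam])) for fam in counts}


def tuple_citation(lin, rows_by_id, views):
    """Citation of one result tuple: citations of every view tuple in its lineage."""
    cites = set()
    for rid in lin:
        row = rows_by_id[rid]
        for v in views:
            if v.covers(row):
                cites.add(v.cite(row))
    return cites


def cite_subset(result, keys, rows_by_id, views):
    """Fine-grained citation of an arbitrary subset of result tuples (union of tuple citations)."""
    out = set()
    for k in keys:
        _, lin = result[k]
        out |= tuple_citation(lin, rows_by_id, views)
    return out
```

For example, with `result = count_per_family(BASE)` and `rows_by_id = {r.rid: r for r in BASE}`, the aggregate tuple for `"GPCR"` has count 2 and lineage `{1, 2}`, so citing it yields the two GPCR citations, while citing only the `"Kinase"` tuple yields just `"Kinase family, curated by Alice"`. Because the citation is attached per result tuple, any subset of the aggregate result can be cited without re-running the query.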

Original language: English
Pages (from-to): 738-751
Number of pages: 14
Journal: Proceedings of the VLDB Endowment
Volume: 12
Issue number: 7
State: Published - 1 Jan 2018
Externally published: Yes
Event: 45th International Conference on Very Large Data Bases, VLDB 2019 - Los Angeles, United States
Duration: 26 Aug 2019 - 30 Aug 2019

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computer Science (all)
