TY - GEN
T1 - On Multiple Semantics for Declarative Database Repairs
AU - Gilad, Amir
AU - Deutch, Daniel
AU - Roy, Sudeepa
N1 - Funding Information:
Example 1.1. Consider the database in Figure 1, based on an academic database [35]. It contains the tables Grant (grant foundations), Author (paper authors), AuthGrant (a relationship between authors and grants given by a foundation), Pub (publications), Writes (a relationship table between Author and Pub), and Cite (pairs of citing and cited publications). Each tuple also has an identifier in the leftmost column of its table (e.g., ag1 is the identifier of AuthGrant(2, 1)). Consider the following four constraints specifying how to repair the tables (other rules could capture different repair scenarios): (1) If a Grant tuple is deleted and there is an author who won a grant from this foundation, denoted by an AuthGrant tuple, then delete the winning author. (2) If an Author tuple is deleted and the corresponding Writes and Pub tuples exist in the database, delete the corresponding Writes tuple (as in cascade-delete semantics for foreign keys). (3) Under the same condition as above, delete the corresponding Pub tuple (this goes beyond standard foreign keys, suggesting that every author is essential for a publication to exist). (4) If a publication p in the Pub table is deleted and is cited by another publication c, while some authors of these papers still exist in the database, then delete the Cite tuple. Suppose we are analyzing a subset of this database containing only authors affiliated with U.S. schools and only papers written solely by U.S. authors. ERC grants are given only to European institutions, and the ERC Grant tuple g2 was incorrectly added to the U.S. database, so it needs to be deleted. However, this deletion causes violations of the constraints above.
To repair the database based on these constraints, we could proceed in various ways. Under the semantics of triggers and causal rules, we delete tuples a2, w1, p1, a3, w2, p2, and c, regaining the integrity of the database at the cost of deleting seven tuples. A different approach is to delete a2 and either w1 or p1, and to delete a3 and either w2 or p2, which deletes only four tuples. However, under the semantics of DCs, we may delete any tuple from the set of tuples that violates a constraint. So we can simply delete the two tuples ag2 and ag3. This satisfies the first constraint, and since no Author tuple is then deleted, the second, third, and fourth constraints are satisfied as well.
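The three repairs contrasted above can be reproduced with a small Python sketch. It is a hypothetical toy model, not the paper's algorithm or its formal semantics: each ground rule instance is written as (tuples that must already be deleted, tuples that must still be present, tuple to delete), and these instances are reconstructed from the prose because Figure 1 is not reproduced here. The names RULES, violated, stage_repair, and min_repair are illustrative. Rule (4) is simplified by assuming that c records a citation of p1 and by dropping its condition on surviving authors, so in this toy model deleting p1 also forces deleting c.

```python
from itertools import combinations

# Hypothetical ground rule instances reconstructed from Example 1.1 (Figure 1
# is not reproduced, so these links are assumptions). Each instance reads:
# "if every tuple in the first set is deleted and every tuple in the second
# set is still present, then the head tuple must be deleted".
RULES = [
    ({"g2"}, {"ag2"}, "a2"),        # rule (1): grant g2 deleted -> delete author a2
    ({"g2"}, {"ag3"}, "a3"),        # rule (1): grant g2 deleted -> delete author a3
    ({"a2"}, {"w1", "p1"}, "w1"),   # rule (2): cascade to the Writes tuple
    ({"a3"}, {"w2", "p2"}, "w2"),
    ({"a2"}, {"w1", "p1"}, "p1"),   # rule (3): delete the Pub tuple
    ({"a3"}, {"w2", "p2"}, "p2"),
    ({"p1"}, {"c"}, "c"),           # rule (4), simplified: c assumed to cite p1
]


def violated(rule, deleted):
    """True if the rule's condition holds under `deleted` but its head survives."""
    must_be_deleted, must_be_present, head = rule
    return (must_be_deleted <= deleted
            and not (must_be_present & deleted)
            and head not in deleted)


def stage_repair(initial):
    """Trigger-like cascade: fire all applicable rules together, stage by stage."""
    deleted = set(initial)
    while True:
        new = {rule[2] for rule in RULES if violated(rule, deleted)}
        if not new:
            return deleted
        deleted |= new


def min_repair(initial, candidates):
    """Smallest superset of `initial`, with extra deletions drawn only from
    `candidates`, that violates no rule (brute force, exponential)."""
    base = set(initial)
    pool = sorted(set(candidates) - base)
    for k in range(len(pool) + 1):
        for extra in combinations(pool, k):
            deleted = base | set(extra)
            if not any(violated(rule, deleted) for rule in RULES):
                return deleted
    return None


ALL_TUPLES = {"g2", "ag2", "ag3", "a2", "a3", "w1", "w2", "p1", "p2", "c"}
HEADS = {rule[2] for rule in RULES}

trigger_style = stage_repair({"g2"})
heads_only = min_repair({"g2"}, HEADS)
dc_style = min_repair({"g2"}, ALL_TUPLES)
print(sorted(trigger_style - {"g2"}))  # ['a2', 'a3', 'c', 'p1', 'p2', 'w1', 'w2']
print(sorted(heads_only - {"g2"}))     # ['a2', 'a3', 'p2', 'w1']
print(sorted(dc_style - {"g2"}))       # ['ag2', 'ag3']
```

In this sketch, stage_repair mimics the cascading, trigger-like reading and deletes seven tuples besides g2. min_repair restricted to rule heads finds a four-tuple repair. With every tuple allowed as a candidate, it finds the DC-style repair that deletes only ag2 and ag3. The exhaustive search is only meant for toy inputs of this size.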
Publisher Copyright:
© 2020 Association for Computing Machinery.
PY - 2020/6/14
Y1 - 2020/6/14
AB - We study the problem of database repairs through a rule-based framework that we refer to as Delta Rules. Delta rules are highly expressive: they allow specifying complex, cross-relation repair logic associated with Denial Constraints and Causal Rules, and they can capture Database Triggers of interest. We show that there is no one-size-fits-all semantics for repairs in this inclusive setting, and we consequently introduce multiple alternative semantics, presenting the case for using each of them. We then study the relationships between the semantics in terms of their output and the complexity of computation. Our results formally establish the tradeoff between the permissiveness of a semantics and its computational complexity. We demonstrate the usefulness of the framework in capturing multiple data repair scenarios for an academic search database and the TPC-H databases, showing how using different semantics affects the repair in terms of size and runtime, and examining the relationships between the repairs. We also compare our approach with SQL triggers and a state-of-the-art data repair system.
KW - database constraints
KW - provenance
KW - repairs
KW - triggers
UR - http://www.scopus.com/inward/record.url?scp=85086280002&partnerID=8YFLogxK
U2 - 10.1145/3318464.3389721
DO - 10.1145/3318464.3389721
M3 - Conference contribution
AN - SCOPUS:85086280002
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 817
EP - 831
BT - SIGMOD 2020 - Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
PB - Association for Computing Machinery
T2 - 2020 ACM SIGMOD International Conference on Management of Data, SIGMOD 2020
Y2 - 14 June 2020 through 19 June 2020
ER -