TY - GEN
T1 - NEXUS
T2 - 2023 ACM/SIGMOD International Conference on Management of Data, SIGMOD 2023
AU - Youngmann, Brit
AU - Cafarella, Michael
AU - Moskovitch, Yuval
AU - Salimi, Babak
N1 - Publisher Copyright: © 2023 ACM.
PY - 2023/6/4
Y1 - 2023/6/4
N2 - When analyzing large datasets, analysts are often interested in explanations for unexpected results produced by their queries. In this work, we focus on aggregate SQL queries that expose correlations in the data. A major challenge that hinders the interpretation of such queries is confounding bias, which can lead to an unexpected association between variables. For example, a SQL query that computes the average Covid-19 death rate in each country may expose a puzzling correlation between the country and the death rate. We demonstrate NEXUS, a system that generates explanations in terms of a set of potential confounding variables that explain the unexpected correlation observed in a query. NEXUS mines candidate confounding variables from external sources since, in many real-life scenarios, the explanations are not solely contained in the input data. For instance, NEXUS might extract data about factors explaining the association between countries and the Covid-19 death rate, such as information about countries' economies and health outcomes. We will demonstrate the utility of NEXUS for investigating unexpected query results by interacting with the SIGMOD'23 participants, who will act as data analysts.
AB - When analyzing large datasets, analysts are often interested in explanations for unexpected results produced by their queries. In this work, we focus on aggregate SQL queries that expose correlations in the data. A major challenge that hinders the interpretation of such queries is confounding bias, which can lead to an unexpected association between variables. For example, a SQL query that computes the average Covid-19 death rate in each country may expose a puzzling correlation between the country and the death rate. We demonstrate NEXUS, a system that generates explanations in terms of a set of potential confounding variables that explain the unexpected correlation observed in a query. NEXUS mines candidate confounding variables from external sources since, in many real-life scenarios, the explanations are not solely contained in the input data. For instance, NEXUS might extract data about factors explaining the association between countries and the Covid-19 death rate, such as information about countries' economies and health outcomes. We will demonstrate the utility of NEXUS for investigating unexpected query results by interacting with the SIGMOD'23 participants, who will act as data analysts.
KW - aggregated SQL queries
KW - confounding bias
KW - knowledge graphs
UR - http://www.scopus.com/inward/record.url?scp=85162932245&partnerID=8YFLogxK
U2 - 10.1145/3555041.3589728
DO - 10.1145/3555041.3589728
M3 - Conference contribution
AN - SCOPUS:85162932245
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 171
EP - 174
BT - SIGMOD 2023 - Companion of the 2023 ACM/SIGMOD International Conference on Management of Data
PB - Association for Computing Machinery
Y2 - 18 June 2023 through 23 June 2023
ER -