TY - GEN
T1 - A continuous Markov-Chain model of data quality transition
T2 - 10th International Conference on Design Science Research in Information Systems and Technology, DESRIST 2015
AU - Zak, Yuval
AU - Even, Adir
N1 - Publisher Copyright:
© Springer International Publishing Switzerland 2015
PY - 2015/1/1
Y1 - 2015/1/1
N2 - Data quality (DQ) might degrade over time, due to changes in real-world entities or behaviors that are not reflected correctly in the datasets that describe them. This study presents a continuous-time Markov-Chain model that reflects DQ as a dynamic process. The model may help in assessing and predicting accuracy degradation over time. Taking into account cost-benefit tradeoffs, it can also be used to recommend an economically optimal point in time at which data values should be evaluated and possibly reacquired. The model addresses data-acquisition scenarios that reflect real-world processes with a finite number of states, each described by certain data-attribute values. It takes into account state-transition probabilities, the distribution of time spent in each state, the damage associated with incorrect data that fails to reflect the real-world state, and the cost of data reacquisition. Given the current state and the time passed since the last transition, the model estimates the expected damage of a data record and recommends whether or not to correct it, by comparing the potential benefit of correction (elimination of potential damage) with the reacquisition cost. Following common design science research guidelines, the applicability and potential contribution of the model are demonstrated with a real-world dataset that reflects a process of handling insurance claims. Insurants' status must be kept up to date to avoid potential monetary damages; however, contacting an insurant for a status update is costly and time-consuming. Currently, the contact decision is guided by heuristics based on employees' experience. The evaluation shows that applying the model has major cost-saving potential compared to current practice.
AB - Data quality (DQ) might degrade over time, due to changes in real-world entities or behaviors that are not reflected correctly in the datasets that describe them. This study presents a continuous-time Markov-Chain model that reflects DQ as a dynamic process. The model may help in assessing and predicting accuracy degradation over time. Taking into account cost-benefit tradeoffs, it can also be used to recommend an economically optimal point in time at which data values should be evaluated and possibly reacquired. The model addresses data-acquisition scenarios that reflect real-world processes with a finite number of states, each described by certain data-attribute values. It takes into account state-transition probabilities, the distribution of time spent in each state, the damage associated with incorrect data that fails to reflect the real-world state, and the cost of data reacquisition. Given the current state and the time passed since the last transition, the model estimates the expected damage of a data record and recommends whether or not to correct it, by comparing the potential benefit of correction (elimination of potential damage) with the reacquisition cost. Following common design science research guidelines, the applicability and potential contribution of the model are demonstrated with a real-world dataset that reflects a process of handling insurance claims. Insurants' status must be kept up to date to avoid potential monetary damages; however, contacting an insurant for a status update is costly and time-consuming. Currently, the contact decision is guided by heuristics based on employees' experience. The evaluation shows that applying the model has major cost-saving potential compared to current practice.
KW - Accuracy
KW - Continuous-Time Markov Chain
KW - Data Quality
KW - Design Science Research
UR - http://www.scopus.com/inward/record.url?scp=84937501504&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-18714-3_13
DO - 10.1007/978-3-319-18714-3_13
M3 - Conference contribution
AN - SCOPUS:84937501504
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 199
EP - 214
BT - New Horizons in Design Science
A2 - Donnellan, Brian
A2 - VanderMeer, Debra
A2 - Kenneally, Jim
A2 - Winter, Robert
A2 - Rothenberger, Marcus
A2 - Helfert, Markus
PB - Springer Verlag
Y2 - 20 May 2015 through 22 May 2015
ER -