TY - GEN
T1 - Probabilistic approaches to overcome content heterogeneity in data integration
T2 - 30th Medical Informatics Europe Conference, MIE 2020
AU - on behalf of the MASTER plans Consortium
AU - Sampri, Alexia
AU - Geifman, Nophar
AU - Le Sueur, Helen
AU - Doherty, Patrick
AU - Couch, Philip
AU - Bruce, Ian
AU - Peek, Niels
N1 - Publisher Copyright: © 2020 European Federation for Medical Informatics (EFMI) and IOS Press.
PY - 2020/6/16
Y1 - 2020/6/16
N2 - Integrating data from different sources into a homogeneous dataset increases the opportunities to study human health. However, disparate data collections are often heterogeneous, which complicates their integration. In this paper, we focus on the issue of content heterogeneity in data integration. Traditional approaches for resolving content heterogeneity map all source datasets to a common data model that includes only shared data items, and thus omit all items that vary between datasets. Based on an example of three datasets in Systemic Lupus Erythematosus, we describe and experimentally evaluate a probabilistic data integration approach which propagates the uncertainty resulting from content heterogeneity into statistical inference, avoiding the need to map to a common data model.
AB - Integrating data from different sources into a homogeneous dataset increases the opportunities to study human health. However, disparate data collections are often heterogeneous, which complicates their integration. In this paper, we focus on the issue of content heterogeneity in data integration. Traditional approaches for resolving content heterogeneity map all source datasets to a common data model that includes only shared data items, and thus omit all items that vary between datasets. Based on an example of three datasets in Systemic Lupus Erythematosus, we describe and experimentally evaluate a probabilistic data integration approach which propagates the uncertainty resulting from content heterogeneity into statistical inference, avoiding the need to map to a common data model.
KW - Biomedical data harmonisation
KW - Content heterogeneity
KW - Missing data
KW - Probabilistic data integration
UR - http://www.scopus.com/inward/record.url?scp=85086929153&partnerID=8YFLogxK
U2 - 10.3233/SHTI200188
DO - 10.3233/SHTI200188
M3 - Conference contribution
C2 - 32570412
AN - SCOPUS:85086929153
T3 - Studies in Health Technology and Informatics
SP - 387
EP - 391
BT - Digital Personalized Health and Medicine - Proceedings of MIE 2020
A2 - Pape-Haugaard, Louise B.
A2 - Lovis, Christian
A2 - Madsen, Inge Cort
A2 - Weber, Patrick
A2 - Nielsen, Per Hostrup
A2 - Scott, Philip
PB - IOS Press
Y2 - 28 April 2020 through 1 May 2020
ER -