Outcomes prediction in longitudinal data: Study designs evaluation, use case in ICU acquired sepsis

Maya Schvetz, Lior Fuchs, Victor Novack, Robert Moskovitch

Research output: Contribution to journal › Article › peer-review

8 Scopus citations


Outcome prediction from Electronic Health Records (EHR), and specifically in critical care, is attracting growing research interest. In this study, we used clinical data from the Intensive Care Unit (ICU), focusing on ICU-acquired sepsis. The current literature reports several evaluation approaches inspired by epidemiological designs, some of which do not reflect real-life application conditions. This problem appears relevant to outcome prediction in longitudinal EHR data, and in longitudinal data generally, although this study focused on ICU data. Unlike most previous studies, which investigated all sepsis admissions, we focused specifically on ICU-acquired sepsis. Because the longitudinal data are sparse, we employed Temporal Abstraction and Time Interval-Related Patterns (TIRPs) discovery; the discovered patterns are then used as classification features. Two experiments were designed using three different outcome prediction study designs from the literature, which implement various levels of real-life conditions for evaluating the prediction models. The first experiment focused on predicting whether, and when during the admission, a patient would suffer from ICU-acquired sepsis, given a sliding observation time window, and on comparing the behavior of the three study designs. The second experiment focused only on predicting whether the patient would suffer from ICU-acquired sepsis, based on data taken relative to the admission start time. Our results show that Temporal Discretization for Classification (TD4C) led to better performance than Equal-Width Discretization, Knowledge-Based abstraction, or SAX. A two-state abstraction performed better than three or four states. The default Binary TIRP representation performed better than Mean Duration, Horizontal Support, and Normalized Horizontal Support. XGBoost performed better as a classifier than Logistic Regression, Neural Net, or Random Forest. Additionally, we demonstrate why the case-crossover-control design is the most appropriate for evaluation under real-life application conditions, unlike other, incomplete designs that may even yield seemingly “better performance”.
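To make the temporal abstraction step concrete, the following is a minimal sketch of Equal-Width Discretization followed by merging consecutive same-state samples into symbolic time intervals. It is an illustration only, not the authors' implementation: the function names, the `max_gap` parameter, and the sample heart-rate values are assumptions introduced here, and TD4C, SAX, and TIRP discovery itself are not shown.

```python
from typing import List, Tuple

Interval = Tuple[float, float, int]  # (start_time, end_time, state)


def equal_width_states(values: List[float], k: int) -> List[int]:
    """Map each value to one of k equal-width bins over [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: everything falls into a single state.
        return [0] * len(values)
    width = (hi - lo) / k
    # The maximum value would land in bin k, so clamp it to k - 1.
    return [min(int((v - lo) / width), k - 1) for v in values]


def to_intervals(times: List[float], states: List[int],
                 max_gap: float) -> List[Interval]:
    """Merge consecutive samples with the same state into intervals.

    A new interval starts when the state changes or when the gap between
    samples exceeds max_gap (hypothetical parameter for sparse data).
    """
    intervals: List[Interval] = []
    for t, s in zip(times, states):
        if intervals and intervals[-1][2] == s and t - intervals[-1][1] <= max_gap:
            start, _, _ = intervals[-1]
            intervals[-1] = (start, t, s)  # extend the current interval
        else:
            intervals.append((t, t, s))    # open a new interval
    return intervals


# Hypothetical, sparsely sampled heart-rate measurements (time in hours).
times = [0, 1, 2, 3, 10, 11]
values = [60, 62, 95, 100, 61, 63]

states = equal_width_states(values, k=2)        # two-state abstraction
intervals = to_intervals(times, states, max_gap=2)
print(intervals)
```

Each resulting tuple is a symbolic time interval for one variable; intervals of this kind, collected across variables, are the input from which time-interval patterns would be mined and then turned into classification features.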

Original language: English
Article number: 103734
Journal: Journal of Biomedical Informatics
State: Published - 1 May 2021


Keywords

  • Classification
  • ICU-acquired sepsis
  • Study design
  • Temporal data mining
  • Temporal patterns discovery
  • Time intervals mining

ASJC Scopus subject areas

  • Computer Science Applications
  • Health Informatics

