Subpopulation-specific synthetic electronic health records can increase mortality prediction performance

Oriel Perets, Nadav Rappoport

Research output: Contribution to journal › Article › peer-review

Abstract

Objective: Biased representation of subpopulations (SPs) in electronic health records (EHRs) causes predictive models to underperform for underrepresented groups. To address this, we propose a framework that improves the equity of predictive performance across SPs.

Materials and Methods: We developed a framework that uses generative adversarial networks (GANs) to create SP-specific synthetic data, which augment the original training datasets. We then employed an ensemble approach, training a distinct prediction model tailored to each SP.

Results: The proposed framework was evaluated on two datasets derived from the MIMIC database and improved the area under the receiver operating characteristic curve (ROC AUC) by 8% to 31% for underrepresented SPs.

Discussion: The results indicate that targeted synthetic data augmentation and SP-specific model training significantly mitigate the performance disparities observed in conventional predictive models trained on imbalanced EHR data.

Conclusion: Our novel GAN-based framework, combined with an ensemble prediction approach, effectively improves predictive equity across SPs. The code and ensemble models developed in this study are publicly available to support further research and the practical adoption of equitable predictive analytics in healthcare.
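To make the pipeline described in the abstract more concrete, here is a minimal, stdlib-only sketch of its shape: group training records by SP, generate SP-specific synthetic records, train one model per SP, route each record to its own SP's model, and score each SP with ROC AUC. This is not the authors' released code, and every name here is hypothetical. Several pieces are loudly assumed stand-ins: `fit_gaussian_sampler` replaces the GAN with independent per-feature normal distributions, `train_logreg` (SGD logistic regression) replaces whatever prediction model the study used, and both the rule "top each SP up to the size of the largest SP" and the choice to train each SP model on the full original data plus that SP's synthetic records are guesses at one plausible reading of "augments the original training datasets".

```python
import math
import random
from collections import defaultdict


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_gaussian_sampler(rows):
    # HYPOTHETICAL stand-in for a trained GAN generator: samples each feature
    # independently from a normal distribution fitted to `rows`. Not a GAN.
    stats = []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        mu = sum(col) / len(col)
        var = sum((v - mu) ** 2 for v in col) / max(len(col) - 1, 1)
        stats.append((mu, math.sqrt(var) or 1e-3))  # avoid a zero std
    return lambda rng: [rng.gauss(mu, sd) for mu, sd in stats]


def train_logreg(X, y, epochs=100, lr=0.05):
    # Assumed per-SP prediction model: plain SGD logistic regression.
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b


def build_sp_ensemble(records, seed=0):
    """records: list of (sp, features, label). Returns {sp: (weights, bias)}."""
    rng = random.Random(seed)
    by_sp = defaultdict(list)
    for sp, x, y in records:
        by_sp[sp].append((x, y))
    largest = max(len(rows) for rows in by_sp.values())
    X_all = [x for _, x, _ in records]
    y_all = [y for _, _, y in records]
    models = {}
    for sp, rows in by_sp.items():
        # ASSUMPTION: each SP model sees all original records plus synthetic
        # records for its own SP, enough to match the largest SP's size.
        X, y = list(X_all), list(y_all)
        n_missing = largest - len(rows)
        for label in (0, 1):
            cls = [x for x, lab in rows if lab == label]
            k = round(n_missing * len(cls) / len(rows))  # keep the SP's label ratio
            if cls and k > 0:
                sample = fit_gaussian_sampler(cls)
                X += [sample(rng) for _ in range(k)]
                y += [label] * k
        models[sp] = train_logreg(X, y)
    return models


def predict(models, sp, x):
    # Ensemble routing: score each record with the model trained for its SP.
    w, b = models[sp]
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def roc_auc(y_true, scores):
    # ROC AUC as the probability that a positive outranks a negative (ties = 0.5).
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A usage example on toy random data (not MIMIC, and with no claim about the resulting scores): build `toy` as 40 records for SP "A" and 6 for SP "B", call `models = build_sp_ensemble(toy)`, then `predict(models, "B", x)` for each SP-B test record and pass those scores to `roc_auc` to compare SPs. Computing ROC AUC separately per SP, rather than once over the pooled test set, is what exposes the kind of disparity the abstract targets, since a strong pooled score can hide weak performance on a small SP.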

Original language: English
Article number: ooaf091
Journal: JAMIA Open
Volume: 8
Issue number: 4
State: Published, 1 Aug 2025

Keywords

  • electronic health records
  • generative adversarial networks
  • mortality prediction
  • subpopulation health
  • synthetic data

ASJC Scopus subject areas

  • Health Informatics
