Objective: The objective of this study was to evaluate the incremental predictive power of electronic medical record (EMR) data, relative to the information available in more easily accessible and standardized insurance claims data.

Data and Methods: Using both EMR and claims data, we predicted outcomes for 118,510 patients with 144,966 hospitalizations in 8 hospitals with widely used prediction models. We used cross-validation to limit overfitting and tested predictive performance on separate data that were not used for model training.

Main Outcomes: We predicted 4 binary outcomes: length of stay (≥7 d), death during the index admission, 30-day readmission, and 1-year mortality.

Results: We achieved nearly the same prediction accuracy using both EMR and claims data relative to using claims data alone in predicting 30-day readmissions [area under the receiver operating characteristic curve (AUC): 0.698 vs. 0.711; positive predictive value (PPV) at top 10% of predicted risk: 37.2% vs. 35.7%] and 1-year mortality (AUC: 0.902 vs. 0.912; PPV: 64.6% vs. 57.6%). EMR data, especially from the first 2 days of the index admission, substantially improved prediction of length of stay (AUC: 0.786 vs. 0.837; PPV: 58.9% vs. 55.5%) and inpatient mortality (AUC: 0.897 vs. 0.950; PPV: 24.3% vs. 14.0%). Results were similar for sensitivity, specificity, and negative predictive value across alternative cutoffs, and with alternative types of predictive models.

Conclusion: EMR data are useful in predicting short-term outcomes. However, their incremental value for predicting longer-term outcomes is smaller. Therefore, for interventions that are based on long-term predictions, using more broadly available claims data is equally effective.
- electronic medical records
- predictive modeling
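
The two headline metrics in the abstract, AUC and PPV among the top 10% of patients by predicted risk, can be illustrated with a minimal sketch. The abstract does not say what software the study used, so the function names `auc` and `ppv_at_top` are hypothetical, the cutoff rule (rounding the top-fraction count down, with a minimum of one patient) is an assumption, and the toy labels and scores are invented for illustration and are not study data.

```python
from typing import Sequence


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as half). Quadratic time; fine for small examples."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ppv_at_top(y_true: Sequence[int], y_score: Sequence[float],
               fraction: float = 0.10) -> float:
    """Share of true positives among the highest-risk `fraction` of cases.
    Assumption: the count is rounded down, with at least one case kept."""
    k = max(1, int(len(y_score) * fraction))
    ranked = sorted(zip(y_score, y_true), key=lambda t: t[0], reverse=True)
    return sum(y for _, y in ranked[:k]) / k


# Invented toy data: 10 patients, 3 of whom experienced the outcome.
y_true = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

print(f"AUC: {auc(y_true, y_score):.3f}")
print(f"PPV at top 10%: {ppv_at_top(y_true, y_score, 0.10):.1%}")
```

AUC summarizes ranking quality over all patients, whereas PPV at the top 10% describes only the group an intervention would most plausibly target, so one metric can improve without the other doing so.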