Machine learning for improving high-dimensional proxy confounder adjustment in healthcare database studies: An overview of the current literature

Richard Wyss, Chen Yanover, Tal El-Hay, Dimitri Bennett, Robert W. Platt, Andrew R. Zullo, Grammati Sari, Xuerong Wen, Yizhou Ye, Hongbo Yuan, Mugdha Gokhale, Elisabetta Patorno, Kueiyu Joshua Lin

Research output: Contribution to journal › Review article › peer-review

13 Scopus citations

Abstract

Purpose: Supplementing investigator-specified variables with large numbers of empirically identified features that collectively serve as ‘proxies’ for unspecified or unmeasured factors can often improve confounding control in studies utilizing administrative healthcare databases. Consequently, there has been a recent focus on the development of data-driven methods for high-dimensional proxy confounder adjustment in pharmacoepidemiologic research. In this paper, we survey current approaches and recent advancements for high-dimensional proxy confounder adjustment in healthcare database studies.

Methods: We discuss considerations underpinning three areas for high-dimensional proxy confounder adjustment: (1) feature generation: transforming raw data into covariates (or features) to be used for proxy adjustment; (2) covariate prioritization, selection, and adjustment; and (3) diagnostic assessment. We discuss challenges and avenues of future development within each area.

Results: There is a large literature on methods for high-dimensional confounder prioritization/selection, but relatively little has been written on best practices for feature generation and diagnostic assessment. Consequently, these areas have particular limitations and challenges.

Conclusions: There is a growing body of evidence showing that machine-learning algorithms for high-dimensional proxy-confounder adjustment can supplement investigator-specified variables to improve confounding control compared to adjustment based on investigator-specified variables alone. However, more research is needed on best practices for feature generation and diagnostic assessment when applying methods for high-dimensional proxy confounder adjustment in pharmacoepidemiologic studies.

Original language: English
Pages (from-to): 932-943
Number of pages: 12
Journal: Pharmacoepidemiology and Drug Safety
Volume: 31
Issue number: 9
State: Published - 1 Sep 2022
Externally published: Yes

Keywords

  • causal inference
  • confounding
  • machine learning

ASJC Scopus subject areas

  • Epidemiology
  • Pharmacology (medical)
