TY - GEN
T1 - Domain adaptation from clinical trials data to the tertiary care clinic - Application to ALS
AU - Hadad, Ben
AU - Lerner, Boaz
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/12/1
Y1 - 2020/12/1
N2 - Amyotrophic lateral sclerosis (ALS) is a devastating and incurable disease affecting motor neurons, leading to progressive paralysis and death on average within three to five years from onset. The disease is characterized by highly variable patterns and rates of progression, which pose challenges to developing reliable and accurate ALS disease state prediction models to be used on a daily basis in clinics with little data. To meet these challenges, we suggest domain adaptation from a large, but unfortunately biased, clinical trials database to that of a tertiary care ALS clinic. To evaluate the reliability and accuracy of the suggested paradigm, we compare a naïve approach, in which training is based only on the clinical trials data, with a domain adaptation approach, in which initial training on the same data is followed by fine-tuning on the clinic data. We also summarize the longitudinal clinical data to evaluate non-temporal models, e.g., random forest (RF), XGBoost (XGB), and multilayer perceptron (MLP), which only partially exploit the dynamic information hidden in patient clinical records, in comparison to the long short-term memory (LSTM) recurrent neural network, which fully exploits the temporal information in the data. First, we find that XGB outperforms RF and MLP in terms of ALS disease state prediction error and, surprisingly, also outperforms the LSTM regardless of prediction time (up to 24 months ahead). We attribute the inferiority of the highly parametrized neural network to the curse of dimensionality. Second, we show that this error does not significantly increase when the model is trained using only the clinical trials data, especially for the LSTM at long prediction times. Finally, we demonstrate that fine-tuning the clinical trials-based pre-trained model using the clinic data improves LSTM and MLP performance compared to using solely the clinical trials or clinic data.
AB - Amyotrophic lateral sclerosis (ALS) is a devastating and incurable disease affecting motor neurons, leading to progressive paralysis and death on average within three to five years from onset. The disease is characterized by highly variable patterns and rates of progression, which pose challenges to developing reliable and accurate ALS disease state prediction models to be used on a daily basis in clinics with little data. To meet these challenges, we suggest domain adaptation from a large, but unfortunately biased, clinical trials database to that of a tertiary care ALS clinic. To evaluate the reliability and accuracy of the suggested paradigm, we compare a naïve approach, in which training is based only on the clinical trials data, with a domain adaptation approach, in which initial training on the same data is followed by fine-tuning on the clinic data. We also summarize the longitudinal clinical data to evaluate non-temporal models, e.g., random forest (RF), XGBoost (XGB), and multilayer perceptron (MLP), which only partially exploit the dynamic information hidden in patient clinical records, in comparison to the long short-term memory (LSTM) recurrent neural network, which fully exploits the temporal information in the data. First, we find that XGB outperforms RF and MLP in terms of ALS disease state prediction error and, surprisingly, also outperforms the LSTM regardless of prediction time (up to 24 months ahead). We attribute the inferiority of the highly parametrized neural network to the curse of dimensionality. Second, we show that this error does not significantly increase when the model is trained using only the clinical trials data, especially for the LSTM at long prediction times. Finally, we demonstrate that fine-tuning the clinical trials-based pre-trained model using the clinic data improves LSTM and MLP performance compared to using solely the clinical trials or clinic data.
KW - Amyotrophic lateral sclerosis (ALS)
KW - clinical trials data
KW - disease-state prediction
KW - domain adaptation
KW - LSTM
UR - http://www.scopus.com/inward/record.url?scp=85102504760&partnerID=8YFLogxK
U2 - 10.1109/ICMLA51294.2020.00090
DO - 10.1109/ICMLA51294.2020.00090
M3 - Conference contribution
AN - SCOPUS:85102504760
T3 - Proceedings - 19th IEEE International Conference on Machine Learning and Applications, ICMLA 2020
SP - 539
EP - 544
BT - Proceedings - 19th IEEE International Conference on Machine Learning and Applications, ICMLA 2020
A2 - Wani, M. Arif
A2 - Luo, Feng
A2 - Li, Xiaolin
A2 - Dou, Dejing
A2 - Bonchi, Francesco
PB - Institute of Electrical and Electronics Engineers
T2 - 19th IEEE International Conference on Machine Learning and Applications, ICMLA 2020
Y2 - 14 December 2020 through 17 December 2020
ER -