TY - JOUR
T1 - Can we identify individuals at risk to develop multiple myeloma? A machine learning-based predictive model
AU - Mittelman, Moshe
AU - Israel, Ariel
AU - Oster, Howard S.
AU - Leshchinsky, Michael
AU - Ben-Shlomo, Yatir
AU - Kepten, Eldad
AU - Dolberg, Osnat Jarchowsky
AU - Balicer, Ran
AU - Shaham, Galit
N1 - Publisher Copyright: © 2025 The Author(s). British Journal of Haematology published by British Society for Haematology and John Wiley & Sons Ltd.
PY - 2025/1/1
Y1 - 2025/1/1
AB - Multiple myeloma evolves unnoticed over years, and when diagnosed, organ damage is common. Electronic health records (EHR) can help in developing predictive models identifying ‘healthy’ people at risk. MM patients from Clalit Health Services (2002–2019) were matched with healthy controls. Stage I: EHR from 5 years prior to MM diagnosis were reviewed and >200 parameters were compared (patients vs. controls). Stage II: Establishing an XGBoost model predicting 5-year risk for MM, with validation. Stage III: A simplified logistic regression model for community, requiring 20 variables (Age; Hb; RBC; MCV; RDW; WBC; neutrophils; lymphocytes; monocytes; basophils; glucose; creatinine; total protein; albumin; calcium; uric acid; bilirubin; HDL-C; LDL-C; triglycerides). EHR from the pre-MM period of 4256 patients were compared to controls. Future MM patients had higher ESR, lower Hb, ANC, neutrophil/lymphocyte ratio, higher globulins and ferritin, more immune deficiencies, MDS and FMF. They took fewer tranquilizers, anti-diabetics and statins. Using labs from future MM (n = 19 129) and controls (n = 382 580, 20:1), a predictive model was developed (ROC AUC = 0.836). The simple LR model provided individual risk prediction for MM within 5 years (AUC = 0.72). Two models with machine learning predict the risk of myeloma in ‘healthy’ individuals within 5 years. The models can be used in practice.
KW - computer modelling
KW - disease prediction
KW - gradient boosted
KW - logistic regression
KW - multiple myeloma
UR - http://www.scopus.com/inward/record.url?scp=105008439352&partnerID=8YFLogxK
U2 - 10.1111/bjh.20136
DO - 10.1111/bjh.20136
M3 - Article
C2 - 40524461
AN - SCOPUS:105008439352
SN - 0007-1048
JO - British Journal of Haematology
JF - British Journal of Haematology
ER -