TY - JOUR
T1 - An evaluation of the capabilities of language models and nurses in providing neonatal clinical decision support
AU - Levin, Chedva
AU - Kagan, Tehilla
AU - Rosen, Shani
AU - Saban, Mor
N1 - Publisher Copyright:
© 2024 Elsevier Ltd
PY - 2024/7/1
Y1 - 2024/7/1
N2 - Aim: To assess the clinical reasoning capabilities of two large language models, ChatGPT-4 and Claude-2.0, compared to those of neonatal nurses during neonatal care scenarios. Design: A cross-sectional study with a comparative evaluation using a survey instrument that included six neonatal intensive care unit clinical scenarios. Participants: 32 neonatal intensive care nurses with 5–10 years of experience working in the neonatal intensive care units of three medical centers. Methods: Participants responded to 6 written clinical scenarios. Simultaneously, we asked ChatGPT-4 and Claude-2.0 to provide initial assessments and treatment recommendations for the same scenarios. The responses from ChatGPT-4 and Claude-2.0 were then scored by certified neonatal nurse practitioners for accuracy, completeness, and response time. Results: Both models demonstrated capabilities in clinical reasoning for neonatal care, with Claude-2.0 significantly outperforming ChatGPT-4 in clinical accuracy and speed. However, limitations were identified across the cases in diagnostic precision, treatment specificity, and response lag. Conclusions: While showing promise, current limitations reinforce the need for deep refinement before ChatGPT-4 and Claude-2.0 can be considered for integration into clinical practice. Additional validation of these tools is important to safely leverage this Artificial Intelligence technology for enhancing clinical decision-making. Impact: The study provides an understanding of the reasoning accuracy of new Artificial Intelligence models in neonatal clinical care. The current accuracy gaps of ChatGPT-4 and Claude-2.0 need to be addressed prior to clinical usage.
KW - Artificial Intelligence
KW - ChatGPT
KW - Claude
KW - Clinical reasoning
KW - Neonatal care
UR - http://www.scopus.com/inward/record.url?scp=85191301315&partnerID=8YFLogxK
U2 - 10.1016/j.ijnurstu.2024.104771
DO - 10.1016/j.ijnurstu.2024.104771
M3 - Article
C2 - 38688103
AN - SCOPUS:85191301315
SN - 0020-7489
VL - 155
JO - International Journal of Nursing Studies
JF - International Journal of Nursing Studies
M1 - 104771
ER -