TY - GEN
T1 - Predicting Data Scientist Stuckness During the Development of Machine Learning Classifiers
AU - Mash, Moshe
AU - Oryol, Shoshana
AU - Simmons, Reid
AU - Rosenthal, Stephanie
N1 - Publisher Copyright: © 2022 IEEE Computer Society. All rights reserved.
PY - 2022/1/1
Y1 - 2022/1/1
AB - The success of data scientists in developing machine learning models is contingent on an iterative development process for detecting patterns in data, finding and extracting useful features, and maximizing their model's performance. However, it is often the case that they struggle during model development and become stuck and unable to make significant progress. We collected qualitative and quantitative data from the workflow of data scientists that allow us to learn from and examine such moments of stuckness. We used this data to develop a model for predicting stuckness based on real-time indicators, such as code artifacts, and then used the model to develop an innovative algorithm that determines precisely when a potential stuckness intervention should occur: as close as possible to the beginning of actual stuckness. Our algorithm's performance indicates the potential efficacy of predicting data scientist stuckness algorithmically under real-world circumstances and for real-world needs.
UR - http://www.scopus.com/inward/record.url?scp=85148713000&partnerID=8YFLogxK
U2 - 10.1109/VL/HCC53370.2022.9833124
DO - 10.1109/VL/HCC53370.2022.9833124
M3 - Conference contribution
AN - SCOPUS:85148713000
T3 - Proceedings of IEEE Symposium on Visual Languages and Human-Centric Computing, VL/HCC
BT - Proceedings - 2022 IEEE Symposium on Visual Languages and Human-Centric Computing, VL/HCC 2022
A2 - Bottoni, Paolo
A2 - Costagliola, Gennaro
A2 - Brachman, Michelle
A2 - Minas, Mark
PB - Institute of Electrical and Electronics Engineers
T2 - 2022 IEEE Symposium on Visual Languages and Human-Centric Computing, VL/HCC 2022
Y2 - 12 September 2022 through 16 September 2022
ER -