Predicting Data Scientist Stuckness During the Development of Machine Learning Classifiers

Moshe Mash, Shoshana Oryol, Reid Simmons, Stephanie Rosenthal

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The success of data scientists in developing machine learning models is contingent on an iterative development process for detecting patterns in data, finding and extracting useful features, and maximizing their model's performance. However, they often struggle during model development and become stuck, unable to make significant progress. We collected qualitative and quantitative data from the workflow of data scientists that allow us to learn from and examine such moments of stuckness. We used these data to develop a model for predicting stuckness based on real-time indicators, such as code artifacts, and then used the model to develop an innovative algorithm that determines precisely when a potential stuckness intervention should occur: as close as possible to the beginning of actual stuckness. Our algorithm's performance indicates the potential efficacy of predicting data scientist stuckness algorithmically under real-world circumstances and for real-world needs.

Original language: English
Title of host publication: Proceedings - 2022 IEEE Symposium on Visual Languages and Human-Centric Computing, VL/HCC 2022
Editors: Paolo Bottoni, Gennaro Costagliola, Michelle Brachman, Mark Minas
Publisher: Institute of Electrical and Electronics Engineers
ISBN (Electronic): 9781665442145
State: Published - 1 Jan 2022
Externally published: Yes
Event: 2022 IEEE Symposium on Visual Languages and Human-Centric Computing, VL/HCC 2022 - Rome, Italy
Duration: 12 Sep 2022 to 16 Sep 2022

Publication series

Name: Proceedings of IEEE Symposium on Visual Languages and Human-Centric Computing, VL/HCC
Volume: 2022-September
ISSN (Print): 1943-6092
ISSN (Electronic): 1943-6106

Conference

Conference: 2022 IEEE Symposium on Visual Languages and Human-Centric Computing, VL/HCC 2022
Country/Territory: Italy
City: Rome
Period: 12/09/22 to 16/09/22

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Human-Computer Interaction
  • Software
