DSWorkFlow: A Framework for Capturing Data Scientists' Workflows

Moshe Mash, Stephanie Rosenthal, Reid Simmons

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

While machine learning algorithms continue to improve, their success often relies upon the data scientists' ability to detect patterns, determine useful features and visualizations, select good models, and evaluate and iterate upon results. Data scientists often spend a long time making very little progress as they struggle to determine how to proceed. In this respect, the understanding of data scientists' workflows and challenges has recently attracted a great deal of scholarly interest. However, the literature is mostly based on interviews and qualitative research methodologies. With this in mind, we developed DSWorkFlow, a data collection framework that provides researchers with the ability to observe and analyze data scientists' cognitive workflows as they develop predictive models. Using DSWorkFlow, researchers can collect data from a Jupyter Notebook, to reconstruct the code execution order and extract relevant information about data scientist workflow alongside the concomitant collection of qualitative data. We tested the framework experimentally with seven data scientists as they each created three machine learning models to inform our extraction algorithms.

Original language: English
Title of host publication: Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, CHI EA 2021
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450380959
State: Published - 8 May 2021
Externally published: Yes
Event: 2021 CHI Conference on Human Factors in Computing Systems: Making Waves, Combining Strengths, CHI EA 2021 - Virtual, Online, Japan
Duration: 8 May 2021 → 13 May 2021

Publication series

Name: Conference on Human Factors in Computing Systems - Proceedings

Conference

Conference: 2021 CHI Conference on Human Factors in Computing Systems: Making Waves, Combining Strengths, CHI EA 2021
Country/Territory: Japan
City: Virtual, Online
Period: 8/05/21 → 13/05/21

Keywords

  • data science process
  • workflow analysis
  • workflow extraction

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Computer Graphics and Computer-Aided Design
  • Software
