Stream Iterative Distributed Coded Computing for Learning Applications in Heterogeneous Systems

Homa Esfahanizadeh, Alejandro Cohen, Muriel Medard

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

6 Scopus citations

Abstract

To improve the utility of learning applications and render machine learning solutions feasible for complex applications, a substantial amount of heavy computation is needed. Thus, it is essential to delegate the computations among several workers, which brings up the major challenge of coping with delays and failures caused by the system's heterogeneity and uncertainties. In particular, minimizing the end-to-end job in-order execution delay, from arrival to delivery, is of great importance for real-world delay-sensitive applications. In this paper, for computation of each job iteration in a stochastic heterogeneous distributed system where the workers vary in their computing and communicating powers, we present a novel joint scheduling-coding framework that optimally splits the coded computational load among the workers. This closes the gap between the workers' response times, and is critical to maximizing the resource utilization. To further reduce the in-order execution delay, we also incorporate redundant computations in each iteration of a distributed computational job. Our simulation results demonstrate that the delay obtained using the proposed solution is dramatically lower than that of the uniform split, which is oblivious to the system's heterogeneity, and, in fact, is very close to an ideal lower bound just by introducing a small percentage of redundant computations.
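
The contrast the abstract draws between a heterogeneity-aware split and a uniform split can be illustrated with a toy model. The sketch below is not the paper's framework: it assumes a deterministic worker model in which worker i needs `load_i / rates[i] + delays[i]` seconds, ignores coding and redundancy entirely, and allows fractional loads. The function names, the rates, and the delays are all hypothetical values chosen for illustration.

```python
def balanced_split(total_load, rates, delays):
    """Split total_load so that every worker finishes at the same time.

    Assumed (hypothetical) model: worker i takes load_i / rates[i] + delays[i].
    Setting all finish times equal to T and summing the loads gives
    T = (total_load + sum(r_i * d_i)) / sum(r_i).
    """
    finish_time = (total_load + sum(r * d for r, d in zip(rates, delays))) / sum(rates)
    loads = [r * (finish_time - d) for r, d in zip(rates, delays)]
    if min(loads) < 0:
        # A worker whose fixed delay exceeds T cannot help in this simple model.
        raise ValueError("a worker's delay exceeds the balanced finish time")
    return loads, finish_time


def uniform_split(total_load, rates, delays):
    """Give every worker the same share; the slowest worker sets the delay."""
    share = total_load / len(rates)
    finish_time = max(share / r + d for r, d in zip(rates, delays))
    return [share] * len(rates), finish_time


# Illustrative numbers only: three workers with very different speeds.
rates = [10.0, 5.0, 1.0]   # tasks per second (assumed)
delays = [0.1, 0.2, 0.5]   # fixed communication delay in seconds (assumed)
total_load = 100.0         # number of tasks in one job iteration (assumed)

balanced_loads, balanced_time = balanced_split(total_load, rates, delays)
uniform_loads, uniform_time = uniform_split(total_load, rates, delays)

print("balanced loads:", balanced_loads, "finish time:", balanced_time)
print("uniform loads: ", uniform_loads, "finish time:", uniform_time)
```

In this toy setting the uniform split is held back by the slowest worker, while the balanced split equalizes the finish times. The benefit of redundant computations described in the abstract only shows up when response times are random, which this deterministic sketch does not model.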

Original language: English
Title of host publication: INFOCOM 2022 - IEEE Conference on Computer Communications
Publisher: Institute of Electrical and Electronics Engineers
Pages: 230-239
Number of pages: 10
ISBN (Electronic): 9781665458221
State: Published - 1 Jan 2022
Externally published: Yes
Event: 41st IEEE Conference on Computer Communications, INFOCOM 2022 - Virtual, Online, United Kingdom
Duration: 2 May 2022 to 5 May 2022

Publication series

Name: Proceedings - IEEE INFOCOM
Volume: 2022-May
ISSN (Print): 0743-166X

Conference

Conference: 41st IEEE Conference on Computer Communications, INFOCOM 2022
Country/Territory: United Kingdom
City: Virtual, Online
Period: 2/05/22 to 5/05/22

Keywords

  • coded computation
  • distributed systems
  • heterogeneous
  • scheduling
  • straggler

ASJC Scopus subject areas

  • General Computer Science
  • Electrical and Electronic Engineering
