TY - GEN
T1 - Stream Iterative Distributed Coded Computing for Learning Applications in Heterogeneous Systems
AU - Esfahanizadeh, Homa
AU - Cohen, Alejandro
AU - Medard, Muriel
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022/1/1
Y1 - 2022/1/1
N2 - To improve the utility of learning applications and render machine learning solutions feasible for complex applications, a substantial amount of heavy computation is needed. Thus, it is essential to delegate the computations among several workers, which brings up the major challenge of coping with delays and failures caused by the system's heterogeneity and uncertainties. In particular, minimizing the end-to-end job in-order execution delay, from arrival to delivery, is of great importance for real-world delay-sensitive applications. In this paper, for the computation of each job iteration in a stochastic heterogeneous distributed system where the workers vary in their computing and communication capabilities, we present a novel joint scheduling-coding framework that optimally splits the coded computational load among the workers. This closes the gap between the workers' response times and is critical to maximizing resource utilization. To further reduce the in-order execution delay, we also incorporate redundant computations in each iteration of a distributed computational job. Our simulation results demonstrate that the delay obtained using the proposed solution is dramatically lower than that of the uniform split, which is oblivious to the system's heterogeneity, and, in fact, is very close to an ideal lower bound with only a small percentage of redundant computations.
AB - To improve the utility of learning applications and render machine learning solutions feasible for complex applications, a substantial amount of heavy computation is needed. Thus, it is essential to delegate the computations among several workers, which brings up the major challenge of coping with delays and failures caused by the system's heterogeneity and uncertainties. In particular, minimizing the end-to-end job in-order execution delay, from arrival to delivery, is of great importance for real-world delay-sensitive applications. In this paper, for the computation of each job iteration in a stochastic heterogeneous distributed system where the workers vary in their computing and communication capabilities, we present a novel joint scheduling-coding framework that optimally splits the coded computational load among the workers. This closes the gap between the workers' response times and is critical to maximizing resource utilization. To further reduce the in-order execution delay, we also incorporate redundant computations in each iteration of a distributed computational job. Our simulation results demonstrate that the delay obtained using the proposed solution is dramatically lower than that of the uniform split, which is oblivious to the system's heterogeneity, and, in fact, is very close to an ideal lower bound with only a small percentage of redundant computations.
KW - coded computation
KW - distributed systems
KW - heterogeneous
KW - scheduling
KW - straggler
UR - http://www.scopus.com/inward/record.url?scp=85133256607&partnerID=8YFLogxK
U2 - 10.1109/INFOCOM48880.2022.9796977
DO - 10.1109/INFOCOM48880.2022.9796977
M3 - Conference contribution
AN - SCOPUS:85133256607
T3 - Proceedings - IEEE INFOCOM
SP - 230
EP - 239
BT - INFOCOM 2022 - IEEE Conference on Computer Communications
PB - Institute of Electrical and Electronics Engineers
T2 - 41st IEEE Conference on Computer Communications, INFOCOM 2022
Y2 - 2 May 2022 through 5 May 2022
ER -