TY - JOUR
T1 - On Analog Gradient Descent Learning over Multiple Access Fading Channels
AU - Sery, Tomer
AU - Cohen, Kobi
N1 - Funding Information:
Manuscript received August 19, 2019; revised January 6, 2020 and April 10, 2020; accepted April 10, 2020. Date of publication April 22, 2020; date of current version May 18, 2020. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Soummya Kar. This work was supported by the U.S.-Israel Binational Science Foundation (BSF) under Grant 2017723. This paper was presented at the 57th Annual Allerton Conference on Communication, Control, and Computing, 2019. In this journal version we include: (i) a deep theoretical analysis of the algorithm with detailed proofs; (ii) a detailed discussion of the results, including error characterization, energy scaling laws of the system, and a theoretical comparison with centralized gradient-descent type algorithms; and (iii) much more extensive simulation results. (Corresponding author: Kobi Cohen.) The authors are with the School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer Sheva 8410501, Israel (e-mail: seryt@post.bgu.ac.il; yakovsec@bgu.ac.il). Digital Object Identifier 10.1109/TSP.2020.2989580
Publisher Copyright:
© 2020 IEEE.
PY - 2020/1/1
Y1 - 2020/1/1
N2 - We consider a distributed learning problem over a multiple access channel (MAC) in a large wireless network. The computation is performed at the network edge and is based on data received from a large number of distributed nodes that transmit over a noisy fading MAC. The objective function is a sum of the nodes' local loss functions. This problem has attracted growing interest in distributed sensing systems and, more recently, in federated learning. We develop a novel Gradient-Based Multiple Access (GBMA) algorithm to solve the distributed learning problem over the MAC. Specifically, the nodes transmit an analog function of the local gradient using common shaping waveforms, and the network edge receives a superposition of the analog transmitted signals, which it uses to update the estimate. GBMA does not require power control or beamforming to cancel the fading effect, as other algorithms do, and operates directly with noisy, distorted gradients. We analyze the performance of GBMA theoretically and prove that it can approach the convergence rate of the centralized gradient descent (GD) algorithm in large networks. Specifically, we establish a finite-sample error bound for both convex and strongly convex loss functions with Lipschitz gradient. Furthermore, we provide energy scaling laws for approaching the centralized convergence rate as the number of nodes increases. Finally, experimental results support the theoretical findings and demonstrate strong performance of GBMA on synthetic and real data.
AB - We consider a distributed learning problem over a multiple access channel (MAC) in a large wireless network. The computation is performed at the network edge and is based on data received from a large number of distributed nodes that transmit over a noisy fading MAC. The objective function is a sum of the nodes' local loss functions. This problem has attracted growing interest in distributed sensing systems and, more recently, in federated learning. We develop a novel Gradient-Based Multiple Access (GBMA) algorithm to solve the distributed learning problem over the MAC. Specifically, the nodes transmit an analog function of the local gradient using common shaping waveforms, and the network edge receives a superposition of the analog transmitted signals, which it uses to update the estimate. GBMA does not require power control or beamforming to cancel the fading effect, as other algorithms do, and operates directly with noisy, distorted gradients. We analyze the performance of GBMA theoretically and prove that it can approach the convergence rate of the centralized gradient descent (GD) algorithm in large networks. Specifically, we establish a finite-sample error bound for both convex and strongly convex loss functions with Lipschitz gradient. Furthermore, we provide energy scaling laws for approaching the centralized convergence rate as the number of nodes increases. Finally, experimental results support the theoretical findings and demonstrate strong performance of GBMA on synthetic and real data.
KW - Distributed learning
KW - federated learning
KW - gradient descent
KW - gradient methods
KW - multiple access channel (MAC)
KW - optimization
UR - http://www.scopus.com/inward/record.url?scp=85085607842&partnerID=8YFLogxK
U2 - 10.1109/TSP.2020.2989580
DO - 10.1109/TSP.2020.2989580
M3 - Article
AN - SCOPUS:85085607842
VL - 68
SP - 2897
EP - 2911
JO - IEEE Transactions on Signal Processing
JF - IEEE Transactions on Signal Processing
SN - 1053-587X
M1 - 9076343
ER -