TY - JOUR
T1 - Over-the-Air Federated Learning from Heterogeneous Data
AU - Sery, Tomer
AU - Shlezinger, Nir
AU - Cohen, Kobi
AU - Eldar, Yonina C.
N1 - Funding Information:
Manuscript received September 25, 2020; revised February 28, 2021 and May 23, 2021; accepted June 9, 2021. Date of publication June 17, 2021; date of current version July 23, 2021. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Byonghyo Shim. This work was supported in part by the Benoziyo Endowment Fund for the Advancement of Science, the Estate of Olga Klein-Astrachan, in part by the European Union's Horizon 2020 Research and Innovation Programme under Grant 646804-ERC-COG-BNYQ, in part by the Israel Science Foundation under Grant 0100101, in part by the Israel Science Foundation under Grant 2640/20, and in part by the U.S.-Israel Binational Science Foundation (BSF) under Grant 2017723. A short version of this paper, which introduces the algorithm for i.i.d. data and preliminary simulation results, was accepted for presentation at the 2020 IEEE Global Communications Conference (GLOBECOM) [1]. (Corresponding author: Tomer Sery.) Tomer Sery, Nir Shlezinger, and Kobi Cohen are with the School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva 4486200, Israel (e-mail: seryt@post.bgu.ac.il; nirshlezinger1@gmail.com; kobi.cohen10@gmail.com).
Publisher Copyright:
© 2021 IEEE.
PY - 2021/1/1
Y1 - 2021/1/1
AB - We focus on over-the-air (OTA) federated learning (FL), which has recently been suggested as a means of reducing the communication overhead of FL caused by the repeated transmission of model updates by a large number of users over the wireless channel. In OTA FL, all users simultaneously transmit their updates as analog signals over a multiple access channel, and the server receives a superposition of the transmitted signals. However, this approach causes the channel noise to directly affect the optimization procedure, which may degrade the accuracy of the trained model. We develop a Convergent OTA FL (COTAF) algorithm that enhances the common local stochastic gradient descent (SGD) FL algorithm by introducing precoding at the users and scaling at the server, which gradually mitigate the effect of the noise. We analyze the convergence of COTAF to the loss-minimizing model and quantify the effect of a statistically heterogeneous setup, i.e., one in which the training data of each user obeys a different distribution. Our analysis reveals that COTAF achieves a convergence rate similar to that achievable over error-free channels. Our simulations demonstrate the improved convergence of COTAF over vanilla OTA local SGD when training on non-synthetic datasets. Furthermore, we show numerically that the precoding induced by COTAF notably improves both the convergence rate and the accuracy of models trained via OTA FL.
KW - Machine learning
KW - gradient methods
KW - optimization
KW - wireless communication
UR - http://www.scopus.com/inward/record.url?scp=85111734633&partnerID=8YFLogxK
U2 - 10.1109/TSP.2021.3090323
DO - 10.1109/TSP.2021.3090323
M3 - Article
AN - SCOPUS:85111734633
VL - 69
SP - 3796
EP - 3811
JO - IEEE Transactions on Signal Processing
JF - IEEE Transactions on Signal Processing
SN - 1053-587X
M1 - 9459539
ER -