Federated learning (FL) is a framework for the distributed training of centralized models. In FL, a set of edge devices train a model using their local data while repeatedly exchanging their trained models with a central server, allowing the server to tune a global model without the users sharing their possibly private data. A major challenge in FL is reducing the bandwidth and energy consumed by the repeated transmission of large volumes of data by many users over the wireless channel. Recently, over-the-air (OTA) FL has been proposed to address this challenge. In this setting, all users transmit their signals simultaneously over a multiple access channel (MAC), so that the aggregation is computed by the wireless channel itself. In this paper, we develop a novel convergent OTA FL (COTAF) algorithm, which applies precoding and scaling to the transmissions to gradually mitigate the effect of the noisy channel, thus facilitating FL convergence. We theoretically analyze the convergence of COTAF to the loss-minimizing model, showing that it achieves a convergence rate similar to that attainable over error-free channels. Our simulations demonstrate the improved convergence of COTAF when training on non-synthetic datasets.
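The OTA aggregation idea described above can be illustrated with a minimal NumPy sketch. This is not the COTAF algorithm itself; it only shows the generic mechanism of power-constrained precoding at the users, simultaneous (summed) transmission over a noisy MAC, and rescaling at the server. The number of users, model dimension, power budget, noise level, and the simple max-norm precoder are all illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 10       # number of users (assumed)
d = 5        # model-update dimension (assumed)
P = 1.0      # per-user transmit power budget (assumed)
sigma = 0.1  # channel noise standard deviation (assumed)

# Hypothetical local model updates held by each user.
updates = rng.normal(size=(N, d))

# Precoding: scale each update so transmissions respect the power budget.
# Here a single factor based on the largest update norm is used, a crude
# stand-in for COTAF's time-varying precoding.
alpha = np.sqrt(P) / np.max(np.linalg.norm(updates, axis=1))
tx = alpha * updates

# Over-the-air aggregation: the MAC sums all signals, plus additive noise.
rx = tx.sum(axis=0) + sigma * rng.normal(size=d)

# Server-side scaling: undo the precoder and average over users,
# recovering a noisy estimate of the mean update.
est_avg = rx / (N * alpha)
true_avg = updates.mean(axis=0)
err = np.linalg.norm(est_avg - true_avg)
```

Note that the residual error `err` shrinks as the server-side divisor `N * alpha` grows, which is the intuition behind gradually mitigating the channel noise through scaling.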