TY - GEN
T1 - Automatic and Simultaneous Adjustment of Learning Rate and Momentum for Stochastic Gradient-based Optimization Methods
AU - Lancewicki, Tomer
AU - Kopru, Selcuk
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/5/1
Y1 - 2020/5/1
N2 - Stochastic gradient-based methods are prominent for training machine learning and deep learning models. The performance of these techniques depends on their hyperparameter tuning over time and varies across models and problems. Manual adjustment of hyperparameters is costly and time-consuming, and even if done correctly, it lacks theoretical justification, which inevitably leads to "rule of thumb" settings. In this paper, we propose a generic approach that utilizes the statistics of an unbiased gradient estimator to automatically and simultaneously adjust two paramount hyperparameters: the learning rate and momentum. We apply the proposed general technique to various stochastic gradient-based methods to train Convolutional Neural Networks (CNNs). The results match the performance of the best settings obtained through an exhaustive search and therefore remove the need for tedious manual tuning.
AB - Stochastic gradient-based methods are prominent for training machine learning and deep learning models. The performance of these techniques depends on their hyperparameter tuning over time and varies across models and problems. Manual adjustment of hyperparameters is costly and time-consuming, and even if done correctly, it lacks theoretical justification, which inevitably leads to "rule of thumb" settings. In this paper, we propose a generic approach that utilizes the statistics of an unbiased gradient estimator to automatically and simultaneously adjust two paramount hyperparameters: the learning rate and momentum. We apply the proposed general technique to various stochastic gradient-based methods to train Convolutional Neural Networks (CNNs). The results match the performance of the best settings obtained through an exhaustive search and therefore remove the need for tedious manual tuning.
KW - estimation theory
KW - stochastic gradient descent
KW - stochastic optimization
KW - time varying optimization
KW - variance reduction
UR - https://www.scopus.com/pages/publications/85089226933
U2 - 10.1109/ICASSP40776.2020.9053316
DO - 10.1109/ICASSP40776.2020.9053316
M3 - Conference contribution
AN - SCOPUS:85089226933
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 3127
EP - 3131
BT - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
PB - Institute of Electrical and Electronics Engineers
T2 - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020
Y2 - 4 May 2020 through 8 May 2020
ER -