TY - GEN
T1 - Spurious local minima are common in two-layer ReLU neural networks
AU - Safran, Itay
AU - Shamir, Ohad
N1 - Publisher Copyright:
© 35th International Conference on Machine Learning, ICML 2018. All Rights Reserved.
PY - 2018/1/1
Y1 - 2018/1/1
N2 - We consider the optimization problem associated with training simple ReLU neural networks of the form x ↦ Σ_{i=1}^{k} max{0, wᵢᵀx} with respect to the squared loss. We provide a computer-assisted proof that even if the input distribution is standard Gaussian, even if the dimension is arbitrarily large, and even if the target values are generated by such a network, with orthonormal parameter vectors, the problem can still have spurious local minima once 6 ≤ k ≤ 20. By a concentration of measure argument, this implies that in high input dimensions, nearly all target networks of the relevant sizes lead to spurious local minima. Moreover, we conduct experiments which show that the probability of hitting such local minima is quite high, and increases with the network size. On the positive side, mild over-parameterization appears to drastically reduce such local minima, indicating that an over-parameterization assumption is necessary to get a positive result in this setting.
AB - We consider the optimization problem associated with training simple ReLU neural networks of the form x ↦ Σ_{i=1}^{k} max{0, wᵢᵀx} with respect to the squared loss. We provide a computer-assisted proof that even if the input distribution is standard Gaussian, even if the dimension is arbitrarily large, and even if the target values are generated by such a network, with orthonormal parameter vectors, the problem can still have spurious local minima once 6 ≤ k ≤ 20. By a concentration of measure argument, this implies that in high input dimensions, nearly all target networks of the relevant sizes lead to spurious local minima. Moreover, we conduct experiments which show that the probability of hitting such local minima is quite high, and increases with the network size. On the positive side, mild over-parameterization appears to drastically reduce such local minima, indicating that an over-parameterization assumption is necessary to get a positive result in this setting.
UR - http://www.scopus.com/inward/record.url?scp=85057334108&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85057334108
T3 - 35th International Conference on Machine Learning, ICML 2018
SP - 7031
EP - 7052
BT - 35th International Conference on Machine Learning, ICML 2018
A2 - Krause, Andreas
A2 - Dy, Jennifer
PB - International Machine Learning Society (IMLS)
T2 - 35th International Conference on Machine Learning, ICML 2018
Y2 - 10 July 2018 through 15 July 2018
ER -