TY - GEN

T1 - Spurious local minima are common in two-layer ReLU neural networks

AU - Safran, Itay

AU - Shamir, Ohad

N1 - Publisher Copyright:
© 35th International Conference on Machine Learning, ICML 2018. All Rights Reserved.

PY - 2018/1/1

Y1 - 2018/1/1

N2 - We consider the optimization problem associated with training simple ReLU neural networks of the form x ↦ Σᵢ₌₁ᵏ max{0, wᵢᵀx} with respect to the squared loss. We provide a computer-assisted proof that even if the input distribution is standard Gaussian, even if the dimension is arbitrarily large, and even if the target values are generated by such a network with orthonormal parameter vectors, the problem can still have spurious local minima once 6 ≤ k ≤ 20. By a concentration-of-measure argument, this implies that in high input dimensions, nearly all target networks of the relevant sizes lead to spurious local minima. Moreover, we conduct experiments which show that the probability of hitting such local minima is quite high and increases with the network size. On the positive side, mild over-parameterization appears to drastically reduce such local minima, indicating that an over-parameterization assumption is necessary to get a positive result in this setting.

AB - We consider the optimization problem associated with training simple ReLU neural networks of the form x ↦ Σᵢ₌₁ᵏ max{0, wᵢᵀx} with respect to the squared loss. We provide a computer-assisted proof that even if the input distribution is standard Gaussian, even if the dimension is arbitrarily large, and even if the target values are generated by such a network with orthonormal parameter vectors, the problem can still have spurious local minima once 6 ≤ k ≤ 20. By a concentration-of-measure argument, this implies that in high input dimensions, nearly all target networks of the relevant sizes lead to spurious local minima. Moreover, we conduct experiments which show that the probability of hitting such local minima is quite high and increases with the network size. On the positive side, mild over-parameterization appears to drastically reduce such local minima, indicating that an over-parameterization assumption is necessary to get a positive result in this setting.

UR - http://www.scopus.com/inward/record.url?scp=85057334108&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85057334108

T3 - 35th International Conference on Machine Learning, ICML 2018

SP - 7031

EP - 7052

BT - 35th International Conference on Machine Learning, ICML 2018

A2 - Krause, Andreas

A2 - Dy, Jennifer

PB - International Machine Learning Society (IMLS)

T2 - 35th International Conference on Machine Learning, ICML 2018

Y2 - 10 July 2018 through 15 July 2018

ER -
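
Below is a minimal sketch (not the authors' code; all names are hypothetical) of the objective described in the abstract: a two-layer ReLU network x ↦ Σᵢ₌₁ᵏ max{0, wᵢᵀx}, with the squared loss against an orthonormal target network estimated by Monte Carlo over standard Gaussian inputs.

    import numpy as np

    def relu_net(W, X):
        # Two-layer ReLU network x -> sum_i max{0, w_i^T x};
        # W has shape (k, d), X has shape (n, d).
        return np.maximum(0.0, X @ W.T).sum(axis=1)

    def squared_loss(W, V, n=100_000, seed=0):
        # Monte Carlo estimate of E_x[(N_W(x) - N_V(x))^2] over
        # x ~ N(0, I_d), with target values generated by the network V.
        X = np.random.default_rng(seed).standard_normal((n, W.shape[1]))
        return np.mean((relu_net(W, X) - relu_net(V, X)) ** 2)

    # Example with k = d = 6: orthonormal target rows, random initialization.
    k = d = 6
    V = np.eye(k, d)                # orthonormal parameter vectors
    W = np.random.default_rng(1).standard_normal((k, d))
    print(squared_loss(W, V))       # loss at a random point; 0 at W = V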