TY - GEN
T1 - Hierarchical Generalization Bounds for Deep Neural Networks
AU - He, Haiyun
AU - Yu, Christina Lee
AU - Goldfeld, Ziv
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024/1/1
Y1 - 2024/1/1
N2 - Deep neural networks (DNNs) exhibit an exceptional generalization capability in practice. This work aims to capture the effect of depth and its potential benefit for learning within the paradigm of information-theoretic generalization bounds. We derive two novel hierarchical bounds on the generalization error that explicitly depend on the internal representations within each layer. The first result is a layer-dependent generalization bound in terms of the Kullback-Leibler (KL) divergence, which shrinks as the layer index increases. The second bound, which is based on the Wasserstein distance, implies the existence of a layer that serves as a generalization funnel, which minimizes the generalization bound. We then specialize our bounds to the case of binary Gaussian classification and present analytic expressions that depend on the rank or certain norms of the weight matrices for the KL divergence and the Wasserstein bounds, respectively. Our results may provide a new perspective for understanding generalization in deep models.
AB - Deep neural networks (DNNs) exhibit an exceptional generalization capability in practice. This work aims to capture the effect of depth and its potential benefit for learning within the paradigm of information-theoretic generalization bounds. We derive two novel hierarchical bounds on the generalization error that explicitly depend on the internal representations within each layer. The first result is a layer-dependent generalization bound in terms of the Kullback-Leibler (KL) divergence, which shrinks as the layer index increases. The second bound, which is based on the Wasserstein distance, implies the existence of a layer that serves as a generalization funnel, which minimizes the generalization bound. We then specialize our bounds to the case of binary Gaussian classification and present analytic expressions that depend on the rank or certain norms of the weight matrices for the KL divergence and the Wasserstein bounds, respectively. Our results may provide a new perspective for understanding generalization in deep models.
UR - https://www.scopus.com/pages/publications/85202901908
U2 - 10.1109/ISIT57864.2024.10619279
DO - 10.1109/ISIT57864.2024.10619279
M3 - Conference contribution
AN - SCOPUS:85202901908
T3 - IEEE International Symposium on Information Theory - Proceedings
SP - 2688
EP - 2693
BT - 2024 IEEE International Symposium on Information Theory, ISIT 2024 - Proceedings
PB - Institute of Electrical and Electronics Engineers
T2 - 2024 IEEE International Symposium on Information Theory, ISIT 2024
Y2 - 7 July 2024 through 12 July 2024
ER -