TY - GEN
T1 - Estimating information flow in deep neural networks
AU - Goldfeld, Ziv
AU - van den Berg, Ewout
AU - Greenewald, Kristjan
AU - Melnyk, Igor
AU - Nguyen, Nam
AU - Kingsbury, Brian
AU - Polyanskiy, Yury
N1 - Publisher Copyright:
Copyright © 2019 International Machine Learning Society (IMLS)
PY - 2019/1/1
Y1 - 2019/1/1
AB - We study the estimation of the mutual information I(X; Tℓ) between the input X to a deep neural network (DNN) and the output vector Tℓ of its ℓth hidden layer (an "internal representation"). Focusing on feedforward networks with fixed weights and noisy internal representations, we develop a rigorous framework for accurate estimation of I(X; Tℓ). By relating I(X; Tℓ) to information transmission over additive white Gaussian noise channels, we reveal that compression, i.e., reduction in I(X; Tℓ) over the course of training, is driven by progressive geometric clustering of the representations of samples from the same class. Experimental results verify this connection. Finally, we shift focus to purely deterministic DNNs, where I(X; Tℓ) is provably vacuous, and show that these models nevertheless also cluster inputs belonging to the same class. The binning-based approximation of I(X; Tℓ) employed in past works to measure compression is identified as a measure of clustering, thus clarifying that these experiments were in fact tracking the same clustering phenomenon. Leveraging the clustering perspective, we provide new evidence that compression and generalization may not be causally related and discuss potential future research directions.
UR - https://www.scopus.com/pages/publications/85078281542
M3 - Conference contribution
AN - SCOPUS:85078281542
T3 - 36th International Conference on Machine Learning, ICML 2019
SP - 4153
EP - 4162
BT - 36th International Conference on Machine Learning, ICML 2019
PB - International Machine Learning Society (IMLS)
T2 - 36th International Conference on Machine Learning, ICML 2019
Y2 - 9 June 2019 through 15 June 2019
ER -