TY - JOUR
T1 - Learning and generalization with the information bottleneck
AU - Shamir, Ohad
AU - Sabato, Sivan
AU - Tishby, Naftali
N1 - Funding Information:
The authors wish to thank the anonymous reviewers for their detailed comments. This work was partially supported by The Hebrew University Institute for Advanced Studies and the NATO SFP-982480 project.
PY - 2010/6/17
Y1 - 2010/6/17
AB - The Information Bottleneck is an information-theoretic framework that finds concise representations of an 'input' random variable that are as relevant as possible to an 'output' random variable. The framework has been applied successfully in various supervised and unsupervised settings. However, its learning-theoretic properties and justification have remained unclear, as it differs from standard learning models in several crucial aspects, primarily its explicit reliance on the joint input-output distribution. In practice, an empirical plug-in estimate of this distribution has been used, so far without any finite-sample performance guarantees. In this paper we present several formal results that address these difficulties. We prove finite-sample bounds showing that the information bottleneck can provide concise representations with good generalization from sample sizes smaller than those needed to estimate the underlying distribution. The bounds are non-uniform and adapt to the complexity of the specific model chosen. Based on these results, we also present a preliminary analysis of whether the information bottleneck method can be studied as a learning algorithm within the familiar performance-complexity tradeoff framework. In addition, we formally describe the connection between the information bottleneck and minimal sufficient statistics.
KW - Information bottleneck
KW - Information theory
KW - Statistical learning theory
KW - Sufficient statistics
UR - http://www.scopus.com/inward/record.url?scp=77953291435&partnerID=8YFLogxK
U2 - 10.1016/j.tcs.2010.04.006
DO - 10.1016/j.tcs.2010.04.006
M3 - Article
AN - SCOPUS:77953291435
SN - 0304-3975
VL - 411
SP - 2696
EP - 2711
JO - Theoretical Computer Science
JF - Theoretical Computer Science
IS - 29-30
ER -