TY - GEN

T1 - Multiclass Learnability and the ERM Principle

AU - Daniely, Amit

AU - Sabato, Sivan

AU - Ben-David, Shai

AU - Shalev-Shwartz, Shai

PY - 2011

Y1 - 2011

N2 - Multiclass learning is an area of growing practical relevance, for which the currently available theory is still far from providing satisfactory understanding. We study the learnability of multiclass prediction, and derive upper and lower bounds on the sample complexity of multiclass hypothesis classes in different learning models: batch/online, realizable/unrealizable, full information/bandit feedback. Our analysis reveals a surprising phenomenon: In the multiclass setting, in sharp contrast to binary classification, not all Empirical Risk Minimization (ERM) algorithms are equally successful. We show that there exist hypothesis classes for which some ERM learners have lower sample complexity than others. Furthermore, there are classes that are learnable by some ERM learners, while other ERM learners fail to learn them. We propose a principle for designing good ERM learners, and use this principle to prove tight bounds on the sample complexity of learning symmetric multiclass hypothesis classes (that is, classes that are invariant under any permutation of label names). We demonstrate the relevance of the theory by analyzing the sample complexity of two widely used hypothesis classes: generalized linear multiclass models and reduction trees. We also obtain some practically relevant conclusions.

AB - Multiclass learning is an area of growing practical relevance, for which the currently available theory is still far from providing satisfactory understanding. We study the learnability of multiclass prediction, and derive upper and lower bounds on the sample complexity of multiclass hypothesis classes in different learning models: batch/online, realizable/unrealizable, full information/bandit feedback. Our analysis reveals a surprising phenomenon: In the multiclass setting, in sharp contrast to binary classification, not all Empirical Risk Minimization (ERM) algorithms are equally successful. We show that there exist hypothesis classes for which some ERM learners have lower sample complexity than others. Furthermore, there are classes that are learnable by some ERM learners, while other ERM learners fail to learn them. We propose a principle for designing good ERM learners, and use this principle to prove tight bounds on the sample complexity of learning symmetric multiclass hypothesis classes (that is, classes that are invariant under any permutation of label names). We demonstrate the relevance of the theory by analyzing the sample complexity of two widely used hypothesis classes: generalized linear multiclass models and reduction trees. We also obtain some practically relevant conclusions.

UR - http://www.scopus.com/inward/record.url?scp=84897543093&partnerID=8YFLogxK

M3 - Conference contribution

T3 - Proceedings of Machine Learning Research

SP - 207

EP - 232

BT - Proceedings of the 24th Annual Conference on Learning Theory (COLT), JMLR Workshop and Conference Proceedings

A2 - Kakade, Sham M.

A2 - von Luxburg, Ulrike

PB - PMLR

T2 - 24th International Conference on Learning Theory, COLT 2011

Y2 - 9 July 2011 through 11 July 2011

ER -