TY - GEN
T1 - A stochastic gradient descent algorithm for structural risk minimisation
AU - Ratsaby, Joel
N1 - Publisher Copyright: © Springer-Verlag Berlin Heidelberg 2003.
PY - 2003/1/1
Y1 - 2003/1/1
AB - Structural risk minimisation (SRM) is a general complexity regularisation method which automatically selects the model complexity that approximately minimises the misclassification error probability of the empirical risk minimiser. It does so by adding a complexity penalty term $\epsilon(m, k)$ to the empirical risk of the candidate hypotheses and then, for any fixed sample size $m$, minimising the sum with respect to the model complexity variable $k$. When learning multicategory classification there are $M$ subsamples of sizes $m_i$, corresponding to the $M$ pattern classes with a priori probabilities $p_i$, $1 \le i \le M$. Using the usual representation of a multicategory classifier as $M$ individual boolean classifiers, the penalty becomes $\sum_{i=1}^{M} p_i \epsilon(m_i, k_i)$. If the $m_i$ are given, then the standard SRM applies trivially by minimising the penalised empirical risk with respect to the $k_i$, $i = 1, \ldots, M$. However, in situations where the total sample size $\sum_{i=1}^{M} m_i$ needs to be minimal, one also needs to minimise the penalised empirical risk with respect to the variables $m_i$, $i = 1, \ldots, M$. The obvious problem is that the empirical risk can only be defined after the subsamples (and hence their sizes) are known. Utilising an on-line stochastic gradient descent approach, this paper overcomes this difficulty and introduces a sample-querying algorithm which extends the standard SRM principle. It minimises the penalised empirical risk not only with respect to the $k_i$, as the standard SRM does, but also with respect to the $m_i$, $i = 1, \ldots, M$. The challenge here is in defining a stochastic empirical criterion which, when minimised, yields a sequence of subsample-size vectors which asymptotically achieve the Bayes' optimal error convergence rate.
UR - http://www.scopus.com/inward/record.url?scp=0242372956&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-39624-6_17
DO - 10.1007/978-3-540-39624-6_17
M3 - Conference contribution
AN - SCOPUS:0242372956
SN - 3540202919
SN - 9783540202912
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 205
EP - 220
BT - Algorithmic Learning Theory - 14th International Conference, ALT 2003, Proceedings
A2 - Gavalda, Ricard
A2 - Jantke, Klaus P.
A2 - Takimoto, Eiji
PB - Springer Verlag
T2 - 14th International Conference on Algorithmic Learning Theory, ALT 2003
Y2 - 17 October 2003 through 19 October 2003
ER -