TY - GEN
T1 - Predicting and optimizing classifier utility with the power law
AU - Last, Mark
PY - 2007/12/1
Y1 - 2007/12/1
AB - When data collection is costly or time-consuming, an early prediction of classifier performance is extremely important for the design of the data mining process. The power law has been shown in the past to be a good predictor of decision-tree error rates as a function of the sample size. In this paper, we show that the optimal training set size for a given dataset can be computed from a learning curve characterized by a power law. Such a curve can be approximated using a small subset of the potentially available data and then used to estimate the expected trade-off between the error rate and the number of additional observations. The proposed approach to projected optimization of classifier utility is demonstrated and evaluated on several benchmark datasets.
UR - http://www.scopus.com/inward/record.url?scp=49549105515&partnerID=8YFLogxK
U2 - 10.1109/ICDMW.2007.31
DO - 10.1109/ICDMW.2007.31
M3 - Conference contribution
AN - SCOPUS:49549105515
SN - 0769530192
SN - 9780769530192
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 219
EP - 224
BT - ICDM Workshops 2007 - Proceedings of the 7th IEEE International Conference on Data Mining Workshops
T2 - 7th IEEE International Conference on Data Mining Workshops, ICDM Workshops 2007
Y2 - 28 October 2007 through 31 October 2007
ER -