The uncertainty principle of cross-validation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

9 Scopus citations

Abstract

Data miners often have to work with data sets of limited size because of economic, timing, and other constraints. Their task is usually two-fold: to induce the most accurate model from a given dataset and to estimate the model's accuracy on future (unseen) examples. Cross-validation is the most common approach to estimating the true accuracy of a given model; it is based on splitting the available sample into a training set and a validation set. Practical experience shows that any cross-validation method suffers from either an optimistic or a pessimistic bias in some domains. In this paper, we present a series of large-scale experiments on artificial and real-world datasets, in which we study the relationship between a model's true accuracy and its cross-validation estimator. Two stable classification algorithms (ID3 and info-fuzzy network) are used for inducing each model. The results of our experiments bear a striking resemblance to the well-known Heisenberg Uncertainty Principle: the more accurate a model induced from a small amount of real-world data is, the less reliable are the values of simultaneously measured cross-validation estimates. We suggest calling this phenomenon "the uncertainty principle of cross-validation".
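The comparison the abstract describes, a cross-validation estimate set against accuracy on unseen examples, can be sketched as follows. This is an illustrative sketch only, not the experimental setup of the paper: the artificial data generator, the one-dimensional threshold classifier (a stand-in for ID3 or an info-fuzzy network), and all function names and parameter values are assumptions made for the example.

```python
import random


def make_data(n, rng, noise=0.2):
    """Hypothetical artificial task: label is 1 when x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data


def fit_stump(train):
    """Stand-in learner: choose the threshold t (among training x values) with the fewest training errors."""
    best_t, best_err = 0.5, len(train) + 1
    for t, _ in train:
        err = sum(1 for x, y in train if int(x > t) != y)
        if err < best_err:
            best_t, best_err = t, err
    return best_t


def accuracy(t, data):
    """Fraction of examples for which the rule 'predict 1 when x > t' is correct."""
    return sum(1 for x, y in data if int(x > t) == y) / len(data)


def cross_val_accuracy(data, k, rng):
    """k-fold cross-validation: every example is validated once by a model trained on the other folds."""
    idx = list(range(len(data)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in idx if i not in held_out]
        t = fit_stump(train)
        correct += sum(1 for i in fold if int(data[i][0] > t) == data[i][1])
    return correct / len(data)


def compare(n_small=30, n_large=20000, k=10, seed=0):
    """Return (cross-validation estimate, accuracy on a large fresh sample) for one small dataset."""
    rng = random.Random(seed)
    sample = make_data(n_small, rng)
    fresh = make_data(n_large, rng)  # large sample used as a proxy for the true accuracy
    model = fit_stump(sample)
    return cross_val_accuracy(sample, k, rng), accuracy(model, fresh)


if __name__ == "__main__":
    for seed in range(5):
        cv, true_acc = compare(seed=seed)
        print(f"seed={seed}  cv estimate={cv:.3f}  accuracy on fresh data={true_acc:.3f}")
```

Running `compare` over many seeds and sample sizes gives pairs of values whose gap can be inspected; whether that gap behaves as the paper reports depends on the learner and data, which this toy setup does not attempt to reproduce.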

Original language: English
Title of host publication: 2006 IEEE International Conference on Granular Computing
Pages: 275-280
Number of pages: 6
State: Published - 22 Nov 2006
Event: 2006 IEEE International Conference on Granular Computing - Atlanta, GA, United States
Duration: 10 May 2006 to 12 May 2006

Publication series

Name: 2006 IEEE International Conference on Granular Computing

Conference

Conference: 2006 IEEE International Conference on Granular Computing
Country/Territory: United States
City: Atlanta, GA
Period: 10/05/06 to 12/05/06

Keywords

  • Accuracy estimation
  • Classification
  • Cross-validation
  • Info-fuzzy networks
  • Model selection

ASJC Scopus subject areas

  • Engineering (all)
