Avoiding the look-ahead pathology of decision tree learning

Mark Last, Michael Roizman

Research output: Contribution to journalArticlepeer-review

2 Scopus citations

Abstract

Most decision-tree induction algorithms are using a local greedy strategy, where a leaf is always split on the best attribute according to a given attribute-selection criterion. A more accurate model could possibly be found by looking ahead for alternative subtrees. However, some researchers argue that the look-ahead should not be used due to a negative effect (called "decision-tree pathology") on the decision-tree accuracy. This paper presents a new look-ahead heuristics for decision-tree induction. The proposed method is called look-ahead J48 (LA-J48) as it is based on J48, the Weka implementation of the popular C4.5 algorithm. At each tree node, the LA-J48 algorithm applies the look-ahead procedure of bounded depth only to attributes that are not statistically distinguishable from the best attribute chosen by the greedy approach of C4.5. A bootstrap process is used for estimating the standard deviation of splitting criteria with unknown probability distribution. Based on a separate validation set, the attribute producing the most accurate subtree is chosen for the next step of the algorithm. In experiments on 20 benchmark data sets, the proposed look-ahead method outperforms the greedy J48 algorithm with the gain ratio and the gini index splitting criteria, thus avoiding the look-ahead pathology of decision-tree induction.

Original languageEnglish
Pages (from-to)974-987
Number of pages14
JournalInternational Journal of Intelligent Systems
Volume28
Issue number10
DOIs
StatePublished - 1 Oct 2013

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Human-Computer Interaction
  • Artificial Intelligence

Fingerprint

Dive into the research topics of 'Avoiding the look-ahead pathology of decision tree learning'. Together they form a unique fingerprint.

Cite this