A Compact and Accurate Model for Classification

Mark Last, Oded Maimon

Research output: Contribution to journal › Article › peer-review

52 Scopus citations

Abstract

We describe and evaluate an information-theoretic algorithm for data-driven induction of classification models based on a minimal subset of available features. The relationship between input (predictive) features and the target (classification) attribute is modeled by a tree-like structure termed an information network (IN). Unlike other decision-tree models, the information network uses the same input attribute across the nodes of a given layer (level). The input attributes are selected incrementally by the algorithm to maximize a global decrease in the conditional entropy of the target attribute. We use a prepruning approach: when no attribute causes a statistically significant decrease in the entropy, network construction stops. The algorithm is shown empirically to produce much more compact models than other methods of decision-tree learning while preserving nearly the same level of classification accuracy.
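The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it greedily adds the attribute that most reduces the conditional entropy of the target given the attributes already chosen, so every layer shares one attribute. The function names, the toy data, and the `min_gain` threshold are all assumptions made for illustration. In particular, `min_gain` is only a stand-in for the statistical significance test mentioned in the abstract, which the abstract does not specify.

```python
from collections import Counter
from math import log2


def conditional_entropy(rows, target, given):
    """Estimate H(target | given) in bits from frequency counts."""
    n = len(rows)
    joint = Counter((tuple(r[a] for a in given), r[target]) for r in rows)
    marginal = Counter(tuple(r[a] for a in given) for r in rows)
    return -sum((c / n) * log2(c / marginal[key]) for (key, _), c in joint.items())


def select_attributes(rows, candidates, target, min_gain=0.05):
    """Greedy layer-by-layer selection (hypothetical helper).

    min_gain is an assumed fixed threshold in bits; it replaces the
    significance test used for prepruning in the original algorithm.
    """
    selected = []
    remaining = list(candidates)
    current = conditional_entropy(rows, target, selected)
    while remaining:
        # Decrease in conditional entropy if each candidate became the next layer.
        gains = {
            a: current - conditional_entropy(rows, target, selected + [a])
            for a in remaining
        }
        best = max(remaining, key=lambda a: gains[a])
        if gains[best] < min_gain:
            break  # prepruning: no candidate reduces the entropy enough
        selected.append(best)
        remaining.remove(best)
        current -= gains[best]
    return selected, current


# Toy data (invented): y copies attribute a, while b is irrelevant.
rows = [
    {"a": 0, "b": 0, "y": 0},
    {"a": 0, "b": 1, "y": 0},
    {"a": 1, "b": 0, "y": 1},
    {"a": 1, "b": 1, "y": 1},
]
selected, residual_entropy = select_attributes(rows, ["a", "b"], "y")
print(selected, residual_entropy)
```

On this toy data only `a` is selected, because `b` gives no further decrease in entropy once `a` is known. Compactness in this sketch comes from the stopping rule: irrelevant attributes never enter the model.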

Original language: English
Pages (from-to): 203-215
Number of pages: 13
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 16
Issue number: 2
State: Published - 1 Feb 2004

Keywords

  • Classification
  • Data mining
  • Decision trees
  • Dimensionality reduction
  • Feature selection
  • Information theoretic network
  • Information theory
  • Knowledge discovery in databases

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics
