Classification of web documents using graph matching

Adam Schenker, Mark Last, Horst Bunke, Abraham Kandel

Research output: Contribution to journal (Article, peer-reviewed)

69 Scopus citations

Abstract

In this paper we describe a classification method that allows the use of graph-based representations of data instead of traditional vector-based representations. We compare the vector approach combined with the k-Nearest Neighbor (k-NN) algorithm to the graph-matching approach when classifying three different web document collections, using the leave-one-out approach for measuring classification accuracy. We also compare the performance of different graph distance measures as well as various document representations that utilize graphs. The results show the graph-based approach can outperform traditional vector-based methods in terms of accuracy, dimensionality and execution time.
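The abstract's setup (k-NN over graphs, scored with leave-one-out) can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: documents become graphs whose nodes are unique terms and whose directed edges link adjacent terms; the graph distance is one common choice based on the maximum common subgraph, 1 - |mcs(G1, G2)| / max(|G1|, |G2|), with graph size taken as nodes plus edges; and the helper names (`make_graph`, `mcs_distance`, `knn_predict`, `leave_one_out_accuracy`) and the toy documents are hypothetical. Because node labels are unique in this representation, the common subgraph reduces to a set intersection.

```python
from collections import Counter

def make_graph(words):
    """Toy representation (an assumption): unique terms are nodes,
    and a directed edge links each term to the term that follows it."""
    nodes = set(words)
    edges = {(a, b) for a, b in zip(words, words[1:]) if a != b}
    return nodes, edges

def size(graph):
    """Graph size, taken here as number of nodes plus number of edges."""
    nodes, edges = graph
    return len(nodes) + len(edges)

def mcs_distance(g1, g2):
    """Distance from the maximum common subgraph. With unique node labels
    the common subgraph is the intersection of the node and edge sets."""
    common = (g1[0] & g2[0], g1[1] & g2[1])
    denom = max(size(g1), size(g2))
    return 1.0 - size(common) / denom if denom else 0.0

def knn_predict(query, training, k):
    """Majority vote among the k training graphs closest to the query."""
    nearest = sorted(training, key=lambda item: mcs_distance(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def leave_one_out_accuracy(dataset, k=1):
    """Classify each graph using all the others as the training set."""
    correct = 0
    for i, (graph, label) in enumerate(dataset):
        rest = dataset[:i] + dataset[i + 1:]
        if knn_predict(graph, rest, k) == label:
            correct += 1
    return correct / len(dataset)

# Hypothetical toy collection, not data from the paper.
docs = [
    ("graph matching distance graph", "cs"),
    ("graph distance measure", "cs"),
    ("stock market price", "finance"),
    ("market price index", "finance"),
]
dataset = [(make_graph(text.split()), label) for text, label in docs]
accuracy = leave_one_out_accuracy(dataset, k=1)
```

A vector-based baseline would swap `mcs_distance` for a distance over term-frequency vectors while keeping the same k-NN and leave-one-out loop, which is what makes the two approaches directly comparable.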

Original language: English
Pages (from-to): 475-496
Number of pages: 22
Journal: International Journal of Pattern Recognition and Artificial Intelligence
Volume: 18
Issue number: 3
State: Published - 1 May 2004

Keywords

  • Document classification
  • Graph matching
  • Graph representation
  • k-nearest neighbors algorithm

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence
