Clustering of web documents using graph representations

Adam Schenker, Horst Bunke, Mark Last, Abraham Kandel

Research output: Chapter in Book/Report/Conference proceeding > Chapter > peer-review

8 Scopus citations

Abstract

In this paper we describe a clustering method that allows the use of graph-based representations of data instead of traditional vector-based representations. Using this new method we conduct content-based clustering of two web document collections. Clustering of web documents is performed to organize the documents with little or no human intervention. Benefits of clustering include easier browsing and improved retrieval speed. In order to measure the performance of our graph-matching approach, we compare it to the popular vector-based k-means method. We perform experiments using different graph distance measures as well as various document representations that utilize graphs. The results with the k-means clustering algorithm show that the graph-based approach can outperform traditional vector-based methods.
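The abstract names the ingredients (graph representations of documents, a graph distance measure, and k-means) but gives no details, so the following is only a minimal sketch under stated assumptions rather than the authors' implementation. It assumes each document becomes a graph whose nodes are its unique terms and whose directed edges link adjacent terms, that distance is one minus the size of the maximum common subgraph divided by the size of the larger graph (size counted here as nodes plus edges), and that each cluster centre is the member graph with the smallest total distance to the other members. The names `build_graph`, `mcs_distance`, and `graph_k_means` are hypothetical.

```python
import random


def build_graph(text):
    """Assumed representation: unique terms as nodes, adjacent-term pairs as directed edges."""
    words = text.lower().split()
    nodes = set(words)
    edges = {(a, b) for a, b in zip(words, words[1:]) if a != b}
    return nodes, edges


def mcs_distance(g1, g2):
    """Assumed distance: 1 - |mcs(g1, g2)| / max(|g1|, |g2|).

    Because node labels are unique, the maximum common subgraph is simply
    the intersection of the node sets and of the edge sets.
    """
    common = len(g1[0] & g2[0]) + len(g1[1] & g2[1])
    largest = max(len(g1[0]) + len(g1[1]), len(g2[0]) + len(g2[1]))
    return 0.0 if largest == 0 else 1.0 - common / largest


def graph_k_means(graphs, k, iterations=10, seed=0):
    """k-means over graphs, using a member graph (set median) as each centre."""
    rng = random.Random(seed)
    centers = rng.sample(range(len(graphs)), k)  # indices of the initial centres
    labels = []
    for _ in range(iterations):
        # Assignment step: each graph joins the cluster of its nearest centre.
        labels = [
            min(range(k), key=lambda c: mcs_distance(g, graphs[centers[c]]))
            for g in graphs
        ]
        # Update step: the new centre minimises total distance within the cluster.
        new_centers = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                new_centers.append(centers[c])  # keep the old centre if the cluster is empty
                continue
            new_centers.append(
                min(members, key=lambda i: sum(mcs_distance(graphs[i], graphs[j]) for j in members))
            )
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


docs = [
    "graph matching distance graph",
    "graph distance matching",
    "soccer match score goal",
    "soccer goal score",
]
graphs = [build_graph(d) for d in docs]
labels, centers = graph_k_means(graphs, k=2)
print(labels)
```

Using a member graph as the centre sidesteps the question of how to average graphs; a vector-based k-means would take the mean of the term vectors at that step instead.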

Original language: English
Title of host publication: Applied Graph Theory in Computer Vision and Pattern Recognition
Editors: Abraham Kandel, Horst Bunke, Mark Last
Pages: 247-265
Number of pages: 19
State: Published - 19 Apr 2007

Publication series

Name: Studies in Computational Intelligence
Volume: 52
ISSN (Print): 1860-949X

Keywords

  • Graph distance
  • Graph representations
  • k-Means
