Comparison of algorithms for web document clustering using graph representations of data

Adam Schenker, Mark Last, Horst Bunke, Abraham Kandel

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

3 Scopus citations

Abstract

In this paper we compare the performance of several popular clustering algorithms, including k-means, fuzzy c-means, hierarchical agglomerative, and graph partitioning. The novelty of this work is that the objects to be clustered are represented by graphs rather than the usual case of numeric feature vectors. We apply these techniques to web documents, which are represented by graphs instead of vectors, in order to perform web document clustering. Web documents are structured information sources and thus appropriate for modeling by graphs. We will examine the performance of each clustering algorithm when the web documents are represented as both graphs and vectors. This will allow us to investigate the applicability of each algorithm to the problem of web document clustering.

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Editors: Ana Fred, Terry Caelli, Robert P.W. Duin, Dick de Ridder, Aurelio Campilho
Publisher: Springer Verlag
Pages: 190-197
Number of pages: 8
ISBN (Print): 9783540225706
State: Published - 1 Jan 2004

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3138
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
