Graph representations for web document clustering

Adam Schenker, Mark Last, Horst Bunke, Abraham Kandel

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

18 Scopus citations

Abstract

In this paper we describe the clustering of web documents represented by graphs rather than vectors. We present a novel method for clustering graph-based data with the standard k-means algorithm and compare its performance to the conventional vector-model approach based on cosine similarity. The proposed method is evaluated with five different graph representations under two clustering performance indices. The experiments are performed on two separate web document collections.
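
The abstract does not say how k-means is adapted to graphs, so the following is only a minimal sketch under our own assumptions, not necessarily the authors' method. Each document becomes a graph with one node per distinct word and a directed edge between adjacent words. The distance between graphs is taken to be the maximum-common-subgraph distance d(G1, G2) = 1 - |mcs(G1, G2)| / max(|G1|, |G2|), where |G| counts nodes plus edges. The cluster "centroid" is the member graph with the smallest summed distance to the rest of its cluster (a set median). Because every node label is unique, the common subgraph reduces to intersecting node sets and edge sets. The names build_graph, mcs_distance, median_graph and graph_kmeans are hypothetical.

```python
import random
from typing import FrozenSet, List, Optional, Sequence, Tuple

# Hypothetical document graph: (set of term nodes, set of directed term->term edges).
Graph = Tuple[FrozenSet[str], FrozenSet[Tuple[str, str]]]


def build_graph(text: str) -> Graph:
    """Assumed representation: one node per distinct lowercased word and a
    directed edge from each word to the word that immediately follows it."""
    words = text.lower().split()
    return frozenset(words), frozenset(zip(words, words[1:]))


def graph_size(g: Graph) -> int:
    """|G| = number of nodes + number of edges."""
    return len(g[0]) + len(g[1])


def mcs_distance(g1: Graph, g2: Graph) -> float:
    """d = 1 - |mcs| / max(|G1|, |G2|). With unique node labels the maximum
    common subgraph is the intersection of the node sets and edge sets."""
    denom = max(graph_size(g1), graph_size(g2))
    if denom == 0:
        return 0.0  # two empty graphs are treated as identical
    common = len(g1[0] & g2[0]) + len(g1[1] & g2[1])
    return 1.0 - common / denom


def median_graph(cluster: Sequence[Graph]) -> Graph:
    """Cluster representative: the member with the smallest summed distance
    to all members, used here in place of a mean vector."""
    return min(cluster, key=lambda g: sum(mcs_distance(g, h) for h in cluster))


def graph_kmeans(graphs: Sequence[Graph], k: int, max_iter: int = 50,
                 init: Optional[Sequence[int]] = None, seed: int = 0) -> List[int]:
    """Standard k-means loop with the vector mean and vector distance
    replaced by median_graph and mcs_distance."""
    idx = list(init) if init is not None else random.Random(seed).sample(range(len(graphs)), k)
    centroids = [graphs[i] for i in idx]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each graph joins its nearest representative.
        new_labels = [min(range(k), key=lambda c: mcs_distance(g, centroids[c]))
                      for g in graphs]
        if new_labels == labels:
            break  # assignments are stable, so the loop has converged
        labels = new_labels
        # Update step: recompute each representative; keep the old one if a cluster empties.
        for c in range(k):
            members = [g for g, lab in zip(graphs, labels) if lab == c]
            if members:
                centroids[c] = median_graph(members)
    return labels


# Toy example with made-up snippets (not the paper's data collections).
docs = [
    "web document clustering with graphs",
    "clustering web document graphs",
    "graph clustering of web document collections",
    "football match final score",
    "the football match ended",
    "final score of the match",
]
graphs = [build_graph(d) for d in docs]
print(graph_kmeans(graphs, k=2, init=[0, 3]))  # [0, 0, 0, 1, 1, 1]
```

The only change from vector k-means is in the two steps that need a vector space: the distance and the centroid. A set median always returns an existing graph, which avoids having to define an "average graph".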

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Editors: Francisco Jose Perales, Aurelio J. C. Campilho, Nicolas Perez Perez
Publisher: Springer Verlag
Pages: 935-942
Number of pages: 8
ISBN (Print): 3540402179, 9783540402176
State: Published - 1 Jan 2003

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2652
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349
