Comparison of distance measures for graph-based clustering of documents

Adam Schenker, Mark Last, Horst Bunke, Abraham Kandel

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

20 Scopus citations

Abstract

In this paper we describe work on clustering document collections. We compare the conventional vector-model approach, using cosine similarity and Euclidean distance, with a novel method we have developed for clustering graph-based data with the standard k-means algorithm. The proposed method is evaluated with five different graph distance measures under three clustering performance indices, on two separate document collections. The results show that the graph-based approach performs as well as the vector-based methods, or even better when normalized graph distance measures are used.
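The abstract does not state how documents are turned into graphs, which five distance measures were used, or how a cluster centre is computed for graphs. The sketch below is only an illustration under assumptions that are common in the graph-matching literature, not the authors' exact formulation: each document becomes a graph with one node per unique term and a directed edge between adjacent terms; graph size is nodes plus edges; the maximum common subgraph (MCS) of two such graphs, whose node labels are unique, reduces to shared nodes plus shared edges; and k-means uses the set median (the member with the smallest summed distance to the rest of its cluster) as the centre. The function names (`doc_to_graph`, `d_mcs`, `d_union`, `graph_kmeans`) are hypothetical. It contrasts a normalized distance (`d_mcs`, in [0, 1]) with an unnormalized one (`d_union`), next to the cosine and Euclidean baselines on term-frequency vectors.

```python
import math
from collections import Counter

def doc_to_graph(text):
    """Assumed representation: unique terms as nodes, adjacent term pairs as edges."""
    terms = text.lower().split()
    return frozenset(terms), frozenset(zip(terms, terms[1:]))

def size(g):
    nodes, edges = g
    return len(nodes) + len(edges)  # assumption: |G| = |V| + |E|

def mcs_size(g1, g2):
    # With unique node labels, the MCS is just the shared nodes and shared edges.
    return len(g1[0] & g2[0]) + len(g1[1] & g2[1])

def d_mcs(g1, g2):
    """Normalized MCS-based distance, in [0, 1]."""
    return 1.0 - mcs_size(g1, g2) / max(size(g1), size(g2))

def d_union(g1, g2):
    """Unnormalized graph-union distance: |G1| + |G2| - 2|mcs(G1, G2)|."""
    return size(g1) + size(g2) - 2 * mcs_size(g1, g2)

def tf_vector(text):
    return Counter(text.lower().split())

def cosine_distance(u, v):
    dot = sum(u[t] * v[t] for t in u)  # Counter returns 0 for missing terms
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return 1.0 - dot / (norm_u * norm_v)

def euclidean_distance(u, v):
    return math.sqrt(sum((u[t] - v[t]) ** 2 for t in set(u) | set(v)))

def set_median(graphs, dist):
    """Member of `graphs` with the smallest summed distance to all members."""
    return min(graphs, key=lambda g: sum(dist(g, h) for h in graphs))

def graph_kmeans(graphs, k, dist, iters=10):
    centers = list(graphs[:k])  # naive deterministic initialization (assumption)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for g in graphs:
            clusters[min(range(k), key=lambda i: dist(g, centers[i]))].append(g)
        new_centers = [set_median(c, dist) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    # Final assignment of each graph to its nearest centre.
    return [min(range(k), key=lambda i: dist(g, centers[i])) for g in graphs]

docs = ["the cat sat on the mat", "the cat sat on a mat",
        "stock prices fell sharply today", "stock prices rose sharply today"]
graphs = [doc_to_graph(d) for d in docs]
labels = graph_kmeans(graphs, 2, d_mcs)
```

Because any distance function can be passed as `dist`, the same loop can compare normalized and unnormalized graph measures; only the set median, rather than a mean vector, is needed to act as a cluster centre for graphs.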

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Editors: Edwin Hancock, Mario Vento
Publisher: Springer Verlag
Pages: 202-213
Number of pages: 12
ISBN (Print): 354040452X, 9783540404521
State: Published - 1 Jan 2003

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2726
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
