
A graph-based framework for web document mining

  • Adam Schenker
  • Horst Bunke
  • Mark Last
  • Abraham Kandel

    Research output: Chapter in Book/Report/Conference proceeding > Chapter > peer-review

    9 Scopus citations

    Abstract

    In this paper we describe methods of performing data mining on web documents, where the web document content is represented by graphs. We show how traditional clustering and classification methods, which usually operate on vector representations of data, can be extended to work with graph-based data. Specifically, we give graph-theoretic extensions of the k-Nearest Neighbors classification algorithm and the k-means clustering algorithm that process graphs, and show how the retention of structural information can lead to improved performance over the vector model approach. We introduce several different types of web document representations that utilize graphs and compare their performance for clustering and classification.
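
    The abstract does not spell out how the graphs are built or how two graphs are compared, so the following is only a minimal sketch of what a graph-based k-Nearest Neighbors classifier could look like, under stated assumptions. It assumes each document becomes a graph whose nodes are its unique terms and whose directed edges link adjacent terms, and it assumes a distance based on the size of the common subgraph relative to the larger graph. The names `doc_to_graph`, `graph_distance`, and `knn_classify`, and the toy training data, are hypothetical and not taken from the chapter.

```python
from collections import Counter


def doc_to_graph(text):
    """Assumed representation: nodes are unique terms, directed edges link adjacent terms."""
    terms = text.lower().split()
    nodes = frozenset(terms)
    edges = frozenset(zip(terms, terms[1:]))
    return nodes, edges


def graph_size(graph):
    """Size of a graph, taken here as node count plus edge count (an assumption)."""
    nodes, edges = graph
    return len(nodes) + len(edges)


def graph_distance(g1, g2):
    """Assumed distance: 1 - |common subgraph| / max(|g1|, |g2|).

    Because every node label is unique within a graph, the common subgraph
    is simply the intersection of the node sets and of the edge sets.
    """
    common = (g1[0] & g2[0], g1[1] & g2[1])
    largest = max(graph_size(g1), graph_size(g2))
    if largest == 0:
        return 0.0  # two empty graphs are treated as identical
    return 1.0 - graph_size(common) / largest


def knn_classify(query, training, k=3):
    """Majority vote among the k training graphs closest to the query graph."""
    neighbors = sorted(training, key=lambda item: graph_distance(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy, made-up training set for illustration only.
training = [
    (doc_to_graph("graph matching for web documents"), "mining"),
    (doc_to_graph("graph clustering of web pages"), "mining"),
    (doc_to_graph("fresh bread and pastry recipes"), "cooking"),
    (doc_to_graph("easy bread recipes at home"), "cooking"),
]
query = doc_to_graph("clustering web documents with graph matching")
predicted = knn_classify(query, training, k=3)
```

    The only change from a vector-based k-NN is the distance function: the classifier never needs coordinates, only pairwise distances between graphs. The same distance could in principle drive a k-means variant, provided a graph-valued notion of cluster center is also defined, which this sketch does not attempt.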

    Original language: English
    Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Editors: Simone Marinai, Andreas Dengel
    Publisher: Springer Verlag
    Pages: 401-412
    Number of pages: 12
    ISBN (Print): 3540230602
    State: Published - 1 Jan 2004

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 3163
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    ASJC Scopus subject areas

    • Theoretical Computer Science
    • General Computer Science
