A simple, structure-sensitive approach for Web document classification

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    8 Scopus citations

    Abstract

    In this paper we describe a new approach to the classification of web documents. Most web classification methods are based on the vector space document representation used in information retrieval. Recently, the graph-based web document representation model was shown to outperform the traditional vector representation when used with the k-Nearest Neighbor (k-NN) classification algorithm. Here we suggest a new hybrid approach to web document classification built upon both graph and vector representations. The k-NN algorithm and three benchmark document collections were used to compare this method with the graph-based and vector-based methods separately. The results demonstrate that in most cases we succeed in outperforming the graph and vector approaches in terms of classification accuracy, along with a significant reduction in classification time.
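The abstract does not spell out the hybrid method, so the following is only a minimal sketch of the vector-space k-NN baseline it refers to, not the authors' approach. The whitespace tokenizer, raw term-frequency weights, cosine similarity, and all function names (`term_vector`, `cosine_similarity`, `knn_classify`) are assumptions made for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a bag-of-words term-frequency vector (assumed weighting)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(query, training, k=3):
    """Label `query` by majority vote among its k most similar documents.

    `training` is a list of (text, label) pairs.
    """
    q = term_vector(query)
    scored = sorted(
        ((cosine_similarity(q, term_vector(text)), label)
         for text, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]
```

A graph-based or hybrid variant would replace `term_vector` and `cosine_similarity` with a structure-aware representation and distance; the voting step in `knn_classify` would stay the same.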

    Original language: English
    Title of host publication: Advances in Web Intelligence - Third International Atlantic Web Intelligence Conference, AWIC 2005, Proceedings
    Publisher: Springer Verlag
    Pages: 293-298
    Number of pages: 6
    ISBN (Print): 3540262199, 9783540262190
    State: Published - 1 Jan 2005
    Event: Third International Atlantic Web Intelligence Conference on Advances in Web Intelligence, AWIC 2005 - Lodz, Poland
    Duration: 6 Jun 2005 to 9 Jun 2005

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 3528 LNAI
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: Third International Atlantic Web Intelligence Conference on Advances in Web Intelligence, AWIC 2005
    Country/Territory: Poland
    City: Lodz
    Period: 6/06/05 to 9/06/05

    ASJC Scopus subject areas

    • Theoretical Computer Science
    • General Computer Science
