
Classification of web documents using concept extraction from ontologies

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    5 Scopus citations

    Abstract

    In this paper, we address the problem of analyzing and classifying web documents in a given domain by information filtering agents. We present an ontology-based web content mining methodology whose main stages are creating an ontology for the specified domain, collecting a training set of labeled documents, building a classification model for the domain using the constructed ontology and a classification algorithm, and classifying new documents by information agents via the induced model. We evaluated the proposed methodology in two domains: the chemical domain (web pages containing information about the production of certain chemicals) and a Yahoo! collection of web news documents divided into several categories. Our system receives as input the domain-specific ontology and a set of categorized web documents, and then performs concept generalization on these documents. We use a key-phrase extractor with an integrated ontology parser to create a database from the input documents, which serves as the training set for the classification algorithm. The classification accuracy of the system is estimated at various levels of the ontology.
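
    The pipeline described in the abstract (concept generalization over an ontology, followed by training and applying a classifier) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy `PARENT` ontology, the function names, the pre-tokenized documents standing in for key-phrase extraction, and the choice of a multinomial Naive Bayes classifier, since the abstract does not name the classification algorithm or the extractor.

```python
from collections import Counter
import math

# Hypothetical toy ontology: each term maps to its parent concept.
# Root concepts ("chemical", "sport") have no entry.
PARENT = {
    "ethanol": "alcohol", "methanol": "alcohol",
    "benzene": "aromatic", "toluene": "aromatic",
    "alcohol": "organic_compound", "aromatic": "organic_compound",
    "organic_compound": "chemical",
    "goal": "football", "penalty": "football",
    "football": "sport",
}
KNOWN = set(PARENT) | set(PARENT.values())

def generalize(term, level):
    """Map a term to its ancestor at depth `level` (root = 0).
    Terms that sit above that depth are returned unchanged."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    path.reverse()  # root first
    return path[min(level, len(path) - 1)]

def concept_features(tokens, level):
    """Count generalized concepts; tokens outside the ontology are ignored."""
    return Counter(generalize(t, level) for t in tokens if t in KNOWN)

def train(docs, level):
    """docs is a list of (tokens, label) pairs (assumed input format)."""
    concept_counts, label_counts = {}, Counter()
    for tokens, label in docs:
        label_counts[label] += 1
        concept_counts.setdefault(label, Counter()).update(
            concept_features(tokens, level))
    return concept_counts, label_counts

def classify(tokens, model, level):
    """Multinomial Naive Bayes with add-one smoothing (an assumed classifier)."""
    concept_counts, label_counts = model
    vocab = {c for counts in concept_counts.values() for c in counts}
    feats = concept_features(tokens, level)
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in concept_counts.items():
        score = math.log(label_counts[label] / total_docs)
        denom = sum(counts.values()) + len(vocab)
        for concept, n in feats.items():
            score += n * math.log((counts[concept] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up training documents, already reduced to key terms.
docs = [
    (["ethanol", "production", "benzene"], "chemistry"),
    (["methanol", "plant"], "chemistry"),
    (["goal", "penalty", "match"], "sports"),
]
model = train(docs, level=1)
print(classify(["toluene", "output"], model, level=1))
```

    In this sketch, "toluene" never occurs in the training documents, yet at level 1 it is generalized to the same concept as the terms that do, which is the kind of effect that varying the ontology level is meant to expose.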

    Original language: English
    Title of host publication: Autonomous Intelligent Systems
    Subtitle of host publication: Agents and Data Mining - Second International Workshop, AIS-ADM 2007, Proceedings
    Publisher: Springer Verlag
    Pages: 287-292
    Number of pages: 6
    ISBN (Print): 9783540728382
    State: Published - 1 Jan 2007
    Event: 2nd International Workshop Autonomous Intelligent Systems: Agents and Data Mining, AIS-ADM 2007 - St. Petersburg, Russian Federation
    Duration: 3 Jun 2007 → 5 Jun 2007

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 4476 LNAI
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: 2nd International Workshop Autonomous Intelligent Systems: Agents and Data Mining, AIS-ADM 2007
    Country/Territory: Russian Federation
    City: St. Petersburg
    Period: 3/06/07 → 5/06/07

    ASJC Scopus subject areas

    • Theoretical Computer Science
    • General Computer Science
