TY - GEN
T1 - Classification of web documents using concept extraction from ontologies
AU - Litvak, Marina
AU - Last, Mark
AU - Kisilevich, Slava
PY - 2007/1/1
Y1 - 2007/1/1
N2 - In this paper, we deal with the problem of analyzing and classifying web documents in a given domain by information filtering agents. We present the ontology-based web content mining methodology that contains such main stages as creation of ontology for the specified domain, collecting a training set of labeled documents, building a classification model in this domain using the constructed ontology and a classification algorithm, and classification of new documents by information agents via the induced model. We evaluated the proposed methodology in two specific domains: the chemical domain (web pages containing information about production of certain chemicals), and Yahoo! collection of web news documents divided into several categories. Our system receives as input the domain-specific ontology, and a set of categorized web documents, and then performs concept generalization on these documents. We use a key-phrase extractor with integrated ontology parser for creating a database from input documents and use it as a training set for the classification algorithm. The system classification accuracy is estimated using various levels of ontology.
AB - In this paper, we deal with the problem of analyzing and classifying web documents in a given domain by information filtering agents. We present the ontology-based web content mining methodology that contains such main stages as creation of ontology for the specified domain, collecting a training set of labeled documents, building a classification model in this domain using the constructed ontology and a classification algorithm, and classification of new documents by information agents via the induced model. We evaluated the proposed methodology in two specific domains: the chemical domain (web pages containing information about production of certain chemicals), and Yahoo! collection of web news documents divided into several categories. Our system receives as input the domain-specific ontology, and a set of categorized web documents, and then performs concept generalization on these documents. We use a key-phrase extractor with integrated ontology parser for creating a database from input documents and use it as a training set for the classification algorithm. The system classification accuracy is estimated using various levels of ontology.
UR - http://www.scopus.com/inward/record.url?scp=38049173089&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-72839-9_24
DO - 10.1007/978-3-540-72839-9_24
M3 - Conference contribution
AN - SCOPUS:38049173089
SN - 9783540728382
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 287
EP - 292
BT - Autonomous Intelligent Systems
PB - Springer Verlag
T2 - 2nd International Workshop Autonomous Intelligent Systems: Agents and Data Mining, AIS-ADM 2007
Y2 - 3 June 2007 through 5 June 2007
ER -