Filtering search results using an optimal set of terms identified by an artificial neural network

Tsvi Kuflik, Zvi Boger, Peretz Shoval

Research output: Contribution to journal › Article › peer-review

13 Scopus citations

Abstract

Information filtering (IF) systems usually filter data items by correlating a set of terms representing the user's interest (a user profile) with similar sets of terms representing the data items. Many techniques can be employed for constructing user profiles automatically, but they usually yield large sets of terms. Various dimensionality-reduction techniques can be applied to reduce the number of terms in a user profile. We describe a new term-selection technique that includes a dimensionality-reduction mechanism based on the analysis of a trained artificial neural network (ANN) model. Its novel feature is the identification of an optimal set of terms that can correctly classify data items that are relevant to a user. The proposed technique was compared with the classical Rocchio algorithm. We found that when all the distinct terms in the training set are used to train an ANN, the Rocchio algorithm outperforms the ANN-based filtering system. After the new dimensionality-reduction technique is applied, leaving only an optimal set of terms, the improved ANN technique outperforms both the original ANN and the Rocchio algorithm.
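
To illustrate the profile-matching idea that the abstract describes, the following is a minimal Python sketch of a Rocchio-style user profile scored against data items by cosine similarity. It is not the authors' implementation and does not cover the ANN-based term selection. The function names, the whitespace tokenizer, the toy texts, and the weights `beta` and `gamma` are all assumptions made for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term frequencies (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())


def rocchio_profile(relevant, non_relevant, beta=0.75, gamma=0.25):
    """Profile = beta * mean(relevant) - gamma * mean(non_relevant).

    beta and gamma are illustrative defaults, not values from the paper.
    Terms whose final weight is not positive are dropped.
    """
    profile = Counter()
    for doc in relevant:
        for term, tf in term_vector(doc).items():
            profile[term] += beta * tf / len(relevant)
    for doc in non_relevant:
        for term, tf in term_vector(doc).items():
            profile[term] -= gamma * tf / len(non_relevant)
    return {term: w for term, w in profile.items() if w > 0}


def cosine(profile, text):
    """Cosine similarity between a profile and a data item's term vector."""
    vec = term_vector(text)
    dot = sum(w * vec[term] for term, w in profile.items())
    norm_p = math.sqrt(sum(w * w for w in profile.values()))
    norm_d = math.sqrt(sum(tf * tf for tf in vec.values()))
    if norm_p == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_p * norm_d)


# Hypothetical training examples for one user.
profile = rocchio_profile(
    relevant=["neural network filtering", "neural network terms"],
    non_relevant=["football match results"],
)
on_topic = cosine(profile, "neural network model")
off_topic = cosine(profile, "football results today")
```

In this sketch, a data item would be passed to the user when its score exceeds some threshold. Every distinct term in the relevant examples ends up in the profile, which is the kind of large term set that a dimensionality-reduction step is meant to shrink.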

Original language: English
Pages (from-to): 469-483
Number of pages: 15
Journal: Information Processing and Management
Volume: 42
Issue number: 2
State: Published - 1 Mar 2006

Keywords

  • Artificial neural network
  • Feature selection
  • Information filtering
  • User profile

ASJC Scopus subject areas

  • Information Systems
  • Media Technology
  • Computer Science Applications
  • Management Science and Operations Research
  • Library and Information Sciences
