Abstract
We present the design and implementation of a web mining system that creates a hierarchical clustering of web documents retrieved by commercial web search engines. The cluster hierarchy is produced by a novel method, the Cluster Hierarchy Construction Algorithm (CHCA), and can be used to explore the topics related to a search query and the relationships among them. We discuss important design issues for the system, including stemming and dimensionality reduction, as well as some implementation details. We show examples of system results, compare them with results from similar systems, and analyze the responses to a survey of the system's users.
Original language | English |
---|---|
Pages (from-to) | 607-625 |
Number of pages | 19 |
Journal | International Journal of Intelligent Systems |
Volume | 20 |
Issue number | 6 |
State | Published - 1 Jun 2005 |
ASJC Scopus subject areas
- Software
- Theoretical Computer Science
- Human-Computer Interaction
- Artificial Intelligence