Enhancing cluster labeling using wikipedia

David Carmel, Haggai Roitman, Naama Zwerdling

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

134 Scopus citations

Abstract

This work investigates cluster labeling enhancement by utilizing Wikipedia, the free on-line encyclopedia. We describe a general framework for cluster labeling that extracts candidate labels from Wikipedia in addition to important terms that are extracted directly from the text. The "labeling quality" of each candidate is then evaluated by several independent judges and the top evaluated candidates are recommended for labeling. Our experimental results reveal that the Wikipedia labels agree with manual labels associated by humans to a cluster, much more than with significant terms that are extracted directly from the text. We show that in most cases even when human's associated label appears in the text, pure statistical methods have difficulty in identifying them as good descriptors. Furthermore, our experiments show that for more than 85% of the clusters in our test collection, the manual label (or an inflection, or a synonym of it) appears in the top five labels recommended by our system.

Original languageEnglish
Title of host publicationProceedings - 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2009
Pages139-146
Number of pages8
DOIs
StatePublished - 28 Dec 2009
Externally publishedYes
Event32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2009 - Boston, MA, United States
Duration: 19 Jul 200923 Jul 2009

Publication series

NameProceedings - 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2009

Conference

Conference32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2009
Country/TerritoryUnited States
CityBoston, MA
Period19/07/0923/07/09

Keywords

  • Cluster labeling
  • Wikipedia

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems
  • Information Systems and Management

Fingerprint

Dive into the research topics of 'Enhancing cluster labeling using wikipedia'. Together they form a unique fingerprint.

Cite this