TY - GEN
T1 - Enhancing cluster labeling using wikipedia
AU - Carmel, David
AU - Roitman, Haggai
AU - Zwerdling, Naama
PY - 2009/12/28
Y1 - 2009/12/28
N2 - This work investigates cluster labeling enhancement by utilizing Wikipedia, the free on-line encyclopedia. We describe a general framework for cluster labeling that extracts candidate labels from Wikipedia in addition to important terms that are extracted directly from the text. The "labeling quality" of each candidate is then evaluated by several independent judges and the top evaluated candidates are recommended for labeling. Our experimental results reveal that the Wikipedia labels agree with manual labels associated by humans to a cluster, much more than with significant terms that are extracted directly from the text. We show that in most cases even when human's associated label appears in the text, pure statistical methods have difficulty in identifying them as good descriptors. Furthermore, our experiments show that for more than 85% of the clusters in our test collection, the manual label (or an inflection, or a synonym of it) appears in the top five labels recommended by our system.
AB - This work investigates cluster labeling enhancement by utilizing Wikipedia, the free on-line encyclopedia. We describe a general framework for cluster labeling that extracts candidate labels from Wikipedia in addition to important terms that are extracted directly from the text. The "labeling quality" of each candidate is then evaluated by several independent judges and the top evaluated candidates are recommended for labeling. Our experimental results reveal that the Wikipedia labels agree with manual labels associated by humans to a cluster, much more than with significant terms that are extracted directly from the text. We show that in most cases even when human's associated label appears in the text, pure statistical methods have difficulty in identifying them as good descriptors. Furthermore, our experiments show that for more than 85% of the clusters in our test collection, the manual label (or an inflection, or a synonym of it) appears in the top five labels recommended by our system.
KW - Cluster labeling
KW - Wikipedia
UR - https://www.scopus.com/pages/publications/72449171584
U2 - 10.1145/1571941.1571967
DO - 10.1145/1571941.1571967
M3 - Conference contribution
AN - SCOPUS:72449171584
SN - 9781605584836
T3 - Proceedings - 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2009
SP - 139
EP - 146
BT - Proceedings - 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2009
T2 - 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2009
Y2 - 19 July 2009 through 23 July 2009
ER -