TY - GEN
T1 - Exploiting Wikipedia for information retrieval tasks
AU - Shapira, Bracha
AU - Ofek, Nir
AU - Makarenkov, Victor
N1 - Publisher Copyright: © 2015 ACM.
PY - 2015/8/9
Y1 - 2015/8/9
N2 - Wikipedia, the online encyclopedia, has long been used as a source of information for researchers, as well as being a subject of research itself [11, 12, 23, 5, 6]. It has been shown to be effective in recommender systems, sentiment analysis, validation, and multiple domains in information retrieval. One reason for Wikipedia's popularity among researchers and practitioners is the many types of information it contains, which lets practitioners select the right "tool" for their respective tasks. Alongside its great potential, this multitude of information sources also poses a challenge: which sources are best suited to a specific problem, and how can different types of data be combined? This tutorial aims to provide a holistic view of Wikipedia's different features (text, links, categories, page views, editing history, etc.) and to explore the ways they can be utilized in a machine learning framework. By presenting and contrasting the latest works that utilize Wikipedia in multiple domains, the tutorial aims to raise awareness among researchers and practitioners in these fields of the benefits of utilizing Wikipedia in their respective domains, and in particular of using multiple sources of information simultaneously.
KW - Information retrieval
KW - Machine learning
KW - Wikipedia
UR - http://www.scopus.com/inward/record.url?scp=84953790604&partnerID=8YFLogxK
U2 - 10.1145/2766462.2767879
DO - 10.1145/2766462.2767879
M3 - Conference contribution
AN - SCOPUS:84953790604
T3 - SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval
SP - 1137
EP - 1140
BT - SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval
PB - Association for Computing Machinery, Inc
T2 - 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2015
Y2 - 9 August 2015 through 13 August 2015
ER -