A classifier to determine which Wikipedia biographies will be accepted

Nir Ofek, Lior Rokach

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

Wikipedia, like other encyclopedias, includes biographies of notable people. However, because it is written collaboratively by many contributors, it is constantly targeted by contributors attempting to add biographies of non-notable people. Over time, Wikipedia has developed inclusion criteria for notable people (e.g., receiving a significant award), against which newly contributed biographies are evaluated. In this paper we present and analyze a set of simple indicators that can be used to predict which articles will eventually be accepted. These indicators do not refer to the content itself, but to meta-content features (such as the number of categories the biography is associated with) and to author-based features (such as whether the author is a first-time contributor). By training a classifier on these features, we reached high predictive performance (an area under the receiver operating characteristic (ROC) curve, or AUC, of 0.97) even though the classifier never used the biography text itself.
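The abstract does not name the classifier that was used, so the following Python snippet is only a minimal sketch under stated assumptions. It substitutes plain logistic regression trained by batch gradient descent, encodes two of the feature types the abstract mentions (number of categories, first-time author) as fields of a hypothetical `Submission` record, and computes AUC as the probability that a randomly chosen accepted biography scores higher than a randomly chosen rejected one (ties count as one half). The toy records are invented for illustration and are not the paper's data; the names `Submission`, `features`, `train_logistic`, `score`, and `roc_auc` are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Submission:
    """A contributed biography described only by hypothetical meta-features."""
    num_categories: int       # meta-content feature: categories attached to the article
    first_time_author: bool   # author-based feature
    accepted: bool            # label: was the biography eventually kept?


def features(s: Submission) -> list[float]:
    # Bias term plus two illustrative features; the article text is never used.
    return [1.0, float(s.num_categories), 1.0 if s.first_time_author else 0.0]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(data: list[Submission], epochs: int = 2000, lr: float = 0.1) -> list[float]:
    # Hypothetical stand-in classifier: logistic regression fitted by batch
    # gradient descent on the mean log-loss (not necessarily the paper's model).
    w = [0.0] * len(features(data[0]))
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for s in data:
            x = features(s)
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - float(s.accepted)
            for i, xi in enumerate(x):
                grad[i] += err * xi
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad)]
    return w


def score(w: list[float], s: Submission) -> float:
    # Predicted probability that the biography is accepted.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(s))))


def roc_auc(scores: list[float], labels: list[bool]) -> float:
    # AUC = P(random accepted item outranks random rejected item), ties = 0.5;
    # this equals the normalized Mann-Whitney U statistic.
    pos = [sc for sc, y in zip(scores, labels) if y]
    neg = [sc for sc, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up toy records for illustration only; not data from the paper.
    toy = [
        Submission(5, False, True), Submission(4, False, True), Submission(6, True, True),
        Submission(1, True, False), Submission(2, True, False), Submission(1, False, False),
    ]
    w = train_logistic(toy)
    auc = roc_auc([score(w, s) for s in toy], [s.accepted for s in toy])
    print(f"weights={w}, training-set AUC={auc:.2f}")
```

The pairwise AUC computation is quadratic in the number of items, which is acceptable for a toy example; for realistic dataset sizes a sort-based rank computation or a library routine such as scikit-learn's `roc_auc_score` is more practical. An AUC measured on the training set, as here, is also optimistic; a held-out evaluation would be needed for a meaningful estimate.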

Original language: English
Pages (from-to): 213-218
Number of pages: 6
Journal: Journal of the Association for Information Science and Technology
Volume: 66
Issue number: 1
State: Published - 1 Jan 2015

Keywords

  • information resources management
  • information retrieval

ASJC Scopus subject areas

  • Information Systems
  • Computer Networks and Communications
  • Information Systems and Management
  • Library and Information Sciences
