TY - GEN
T1 - Efficient human computation
T2 - ACM SIGKDD Workshop on Human Computation, HCOMP '09
AU - Gilad-Bachrach, Ran
AU - Bar-Hillel, Aharon
AU - Ein-Dor, Liat
PY - 2009/11/23
Y1 - 2009/11/23
AB - Collecting large labeled data sets is a laborious and expensive task, whose scaling up requires division of the labeling workload among many teachers. When the number of classes is large, miscorrespondences between the labels given by the different teachers are likely to occur, which, in the extreme case, may reach total inconsistency. In this study we describe how globally consistent labels can be obtained, despite the absence of teacher coordination, and discuss the possible efficiency of this process in terms of human labor. We define a notion of label efficiency, measuring the ratio between the number of globally consistent labels obtained and the number of labels provided by distributed teachers. We show that the efficiency depends critically on the ratio α between the number of data instances seen by a single teacher and the number of classes. We suggest several algorithms for the distributed labeling problem, and analyze their efficiency as a function of α. In addition, we provide an upper bound on label efficiency for the case of completely uncoordinated teachers, and show that efficiency approaches 0 as the ratio between the number of labels each teacher provides and the number of classes drops (i.e., α → 0).
UR - http://www.scopus.com/inward/record.url?scp=70449638288&partnerID=8YFLogxK
U2 - 10.1145/1600150.1600174
DO - 10.1145/1600150.1600174
M3 - Conference contribution
AN - SCOPUS:70449638288
SN - 9781605586724
T3 - Proceedings of the ACM SIGKDD Workshop on Human Computation, HCOMP '09
SP - 70
EP - 76
BT - Proceedings of the ACM SIGKDD Workshop on Human Computation, HCOMP '09
Y2 - 28 June 2009 through 28 June 2009
ER -