TY - GEN
T1 - ConStance: Modeling Annotation Contexts to Improve Stance Classification
T2 - 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017
AU - Joseph, Kenneth
AU - Friedland, Lisa
AU - Hobbs, Will
AU - Lazer, David
AU - Tsur, Oren
N1 - Publisher Copyright:
© 2017 Association for Computational Linguistics.
PY - 2017/1/1
Y1 - 2017/1/1
AB - Manual annotations are a prerequisite for many applications of machine learning. However, weaknesses in the annotation process itself are easy to overlook. In particular, scholars often choose what information to give to annotators without examining these decisions empirically. For subjective tasks such as sentiment analysis, sarcasm, and stance detection, such choices can impact results. Here, for the task of political stance detection on Twitter, we show that providing too little context can result in noisy and uncertain annotations, whereas providing too strong a context may cause it to outweigh other signals. To characterize and reduce these biases, we develop ConStance, a general model for reasoning about annotations across information conditions. Given conflicting labels produced by multiple annotators seeing the same instances with different contexts, ConStance simultaneously estimates gold standard labels and also learns a classifier for new instances. We show that the classifier learned by ConStance outperforms a variety of baselines at predicting political stance, while the model’s interpretable parameters shed light on the effects of each context.
UR - http://www.scopus.com/inward/record.url?scp=85073152186&partnerID=8YFLogxK
U2 - 10.18653/v1/D17-1116
DO - 10.18653/v1/D17-1116
M3 - Conference contribution
AN - SCOPUS:85073152186
T3 - EMNLP 2017 - Conference on Empirical Methods in Natural Language Processing, Proceedings
SP - 1115
EP - 1124
BT - EMNLP 2017 - Conference on Empirical Methods in Natural Language Processing, Proceedings
PB - Association for Computational Linguistics (ACL)
Y2 - 9 September 2017 through 11 September 2017
ER -