TY - JOUR
T1 - Discourse Parsing of Contentious, Non-Convergent Online Discussions
AU - Zakharov, Stepan
AU - Hadar, Omri
AU - Hakak, Tovit
AU - Grossman, Dina
AU - Kolikant, Yifat Ben-David
AU - Tsur, Oren
PY - 2021/5/22
Y1 - 2021/5/22
N2 - Online discourse is often perceived as polarized and unproductive. While some conversational discourse parsing frameworks are available, they do not naturally lend themselves to the analysis of contentious and polarizing discussions. Inspired by the Bakhtinian theory of Dialogism, we propose a novel theoretical and computational framework, better suited for non-convergent discussions. We redefine the measure of a successful discussion, and develop a novel discourse annotation schema which reflects a hierarchy of discursive strategies. We consider an array of classification models -- from Logistic Regression to BERT. We also consider various feature types and representations, e.g., LIWC categories, standard embeddings, conversational sequences, and non-conversational discourse markers learnt separately. Given the 31 labels in the tagset, an average F-Score of 0.61 is achieved if we allow a different model for each tag, and 0.526 with a single model. The promising results achieved in annotating discussions according to the proposed schema pave the way for a number of downstream tasks and applications such as early detection of discussion trajectories, active moderation of open discussions, and teacher-assistive bots. Finally, we share the first labeled dataset of contentious non-convergent online discussions.
AB - Online discourse is often perceived as polarized and unproductive. While some conversational discourse parsing frameworks are available, they do not naturally lend themselves to the analysis of contentious and polarizing discussions. Inspired by the Bakhtinian theory of Dialogism, we propose a novel theoretical and computational framework, better suited for non-convergent discussions. We redefine the measure of a successful discussion, and develop a novel discourse annotation schema which reflects a hierarchy of discursive strategies. We consider an array of classification models -- from Logistic Regression to BERT. We also consider various feature types and representations, e.g., LIWC categories, standard embeddings, conversational sequences, and non-conversational discourse markers learnt separately. Given the 31 labels in the tagset, an average F-Score of 0.61 is achieved if we allow a different model for each tag, and 0.526 with a single model. The promising results achieved in annotating discussions according to the proposed schema pave the way for a number of downstream tasks and applications such as early detection of discussion trajectories, active moderation of open discussions, and teacher-assistive bots. Finally, we share the first labeled dataset of contentious non-convergent online discussions.
KW - cs.CL
KW - cs.SI
U2 - 10.1609/icwsm.v15i1.18109
DO - 10.1609/icwsm.v15i1.18109
M3 - Conference article
SN - 2162-3449
VL - 15
SP - 853
EP - 864
JO - Proceedings of the International AAAI Conference on Web and Social Media
JF - Proceedings of the International AAAI Conference on Web and Social Media
IS - 1
ER -