TY - GEN
T1 - Leveraging Exposure Networks for Detecting Fake News Sources
AU - Reuben, Maor
AU - Friedland, Lisa
AU - Puzis, Rami
AU - Grinberg, Nir
N1 - Publisher Copyright:
© 2024 Copyright held by the owner/author(s).
PY - 2024/8/25
Y1 - 2024/8/25
N2 - The scale and dynamic nature of the Web makes real-time detection of misinformation an extremely difficult task. Prior research mostly focused on offline (retrospective) detection of stories or claims using linguistic features of the content, flagging by users, and crowdsourced labels. Here, we develop a novel machine-learning methodology for detecting fake news sources using active learning, and examine the contribution of network, audience, and text features to the model accuracy. Importantly, we evaluate performance in both offline and online settings, mimicking the strategic choices fact-checkers have to make in practice as news sources emerge over time. We find that exposure networks provide information on considerably more sources than sharing networks (+49.6%), and that the inclusion of exposure features greatly improves classification PR-AUC in both offline (+33%) and online (+69.2%) settings. Textual features perform best in offline settings, but their performance deteriorates by 12.0-18.7% in online settings. Finally, the results show that a few iterations of active learning are sufficient for our model to attain predictive performance to comparable exhaustive labeling while incurring only 24.7% of the labeling costs. These results stress the importance of exposure networks as a source of valuable information for the investigation of information dissemination in social networks and question the robustness of textual features.
AB - The scale and dynamic nature of the Web makes real-time detection of misinformation an extremely difficult task. Prior research mostly focused on offline (retrospective) detection of stories or claims using linguistic features of the content, flagging by users, and crowdsourced labels. Here, we develop a novel machine-learning methodology for detecting fake news sources using active learning, and examine the contribution of network, audience, and text features to the model accuracy. Importantly, we evaluate performance in both offline and online settings, mimicking the strategic choices fact-checkers have to make in practice as news sources emerge over time. We find that exposure networks provide information on considerably more sources than sharing networks (+49.6%), and that the inclusion of exposure features greatly improves classification PR-AUC in both offline (+33%) and online (+69.2%) settings. Textual features perform best in offline settings, but their performance deteriorates by 12.0-18.7% in online settings. Finally, the results show that a few iterations of active learning are sufficient for our model to attain predictive performance to comparable exhaustive labeling while incurring only 24.7% of the labeling costs. These results stress the importance of exposure networks as a source of valuable information for the investigation of information dissemination in social networks and question the robustness of textual features.
KW - fact-checking
KW - fake news detection
KW - misinformation
KW - network science
KW - social media
UR - http://www.scopus.com/inward/record.url?scp=85203690338&partnerID=8YFLogxK
U2 - 10.1145/3637528.3671539
DO - 10.1145/3637528.3671539
M3 - Conference contribution
AN - SCOPUS:85203690338
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 5635
EP - 5646
BT - KDD 2024 - Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
T2 - 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2024
Y2 - 25 August 2024 through 29 August 2024
ER -