TY - JOUR
T1 - Online reading habits can reveal personality traits
T2 - towards detecting psychological microtargeting
AU - Simchon, Almog
AU - Sutton, Adam
AU - Edwards, Matthew
AU - Lewandowsky, Stephan
N1 - Publisher Copyright:
© The Author(s) 2023. Published by Oxford University Press on behalf of National Academy of Sciences. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
PY - 2023/6/1
Y1 - 2023/6/1
N2 - Building on big data from Reddit, we generated two computational text models: (i) predicting the personality of users from the text they have written and (ii) predicting the personality of users based on the text they have consumed. The second model is novel and without precedent in the literature. We recruited active Reddit users (N = 1,105) of fiction-writing communities. The participants completed a Big Five personality questionnaire and consented to having their Reddit activity scraped and used to create a machine learning model. We trained a natural language processing model [Bidirectional Encoder Representations from Transformers (BERT)] to predict personality from produced text (average performance: r = 0.33). We then applied this model to a new set of Reddit users (N = 10,050), predicted their personality based on their produced text, and trained a second BERT model to predict their predicted-personality scores based on consumed text (average performance: r = 0.13). By doing so, we provide the first glimpse into the linguistic markers of personality-congruent consumed content.
AB - Building on big data from Reddit, we generated two computational text models: (i) predicting the personality of users from the text they have written and (ii) predicting the personality of users based on the text they have consumed. The second model is novel and without precedent in the literature. We recruited active Reddit users (N = 1,105) of fiction-writing communities. The participants completed a Big Five personality questionnaire and consented to having their Reddit activity scraped and used to create a machine learning model. We trained a natural language processing model [Bidirectional Encoder Representations from Transformers (BERT)] to predict personality from produced text (average performance: r = 0.33). We then applied this model to a new set of Reddit users (N = 10,050), predicted their personality based on their produced text, and trained a second BERT model to predict their predicted-personality scores based on consumed text (average performance: r = 0.13). By doing so, we provide the first glimpse into the linguistic markers of personality-congruent consumed content.
KW - microtargeting
KW - personality
KW - social media
KW - text modeling
UR - http://www.scopus.com/inward/record.url?scp=85177492062&partnerID=8YFLogxK
U2 - 10.1093/pnasnexus/pgad191
DO - 10.1093/pnasnexus/pgad191
M3 - Article
C2 - 37333766
AN - SCOPUS:85177492062
SN - 2752-6542
VL - 2
JO - PNAS Nexus
JF - PNAS Nexus
IS - 6
M1 - pgad191
ER -