TY - JOUR
T1 - Neural text embeddings in psychological research
T2 - A guide with examples in R
AU - Teitelbaum, Louis
AU - Simchon, Almog
N1 - Publisher Copyright:
© 2025 American Psychological Association
PY - 2025/1/1
Y1 - 2025/1/1
AB - In this guide, we review neural embedding models and compare three methods of quantifying psychological constructs for use with embeddings: distributed dictionary representation, contextualized construct representation, and a novel approach, correlational anchored vectors. We aim to cultivate an intuition for the geometric properties of neural embeddings and a sensitivity to methodological problems that can arise in their use. We argue that while large language model embeddings have the advantage of contextualization, decontextualized word embeddings may generalize better across text genres when used with cosine or dot product similarity metrics. Each of the three methods of operationalizing psychological constructs in vector space likewise has its advantages in particular applications. We recommend distributed dictionary representation, which derives a vector representation from a word list, for quantifying abstract constructs relating to the overall feel of a text, especially when the research requires that these constructs generalize across multiple genres of text. We recommend contextualized construct representation, which derives a representation from a questionnaire, for cases in which the texts are relatively similar in content to the embedded questionnaire, such as experiments in which participants respond to a related prompt. Correlational anchored vectors, which derive a representation from labeled examples, require suitably large and reliable training data.
KW - R
KW - large language models
KW - natural language processing
KW - text embeddings
KW - word embeddings
UR - https://www.scopus.com/pages/publications/105008523368
U2 - 10.1037/met0000768
DO - 10.1037/met0000768
M3 - Article
C2 - 40504661
AN - SCOPUS:105008523368
SN - 1082-989X
JO - Psychological Methods
JF - Psychological Methods
ER -