To interpret or not to interpret PCA? This is our question

Dan Vilenchik, Barak Yichye, Maor Abutbul

    Research output: Contribution to conference › Paper › peer-review

    6 Scopus citations

    Abstract

    Principal Component Analysis (PCA) is a central tool for analyzing data, and social media data in particular. Typically, the data is projected onto the first two PCs to obtain a two-dimensional view, and trends and patterns are examined. A key to making sense of the projected data is the semantic interpretation of the new axes (the PCs). To label the PCs, one usually looks at the top k vector entries in absolute value and assigns meaning according to them. The choice of k is made by “eyeballing” the vector. In this work, we provide a computational framework to support this process and suggest an interpretability score, which measures how sensitive the interpretation step could be to the choice of k. Furthermore, we give a visual method to choose the optimal k. We study our methodology on four social media platforms and discover that in two of them, Twitter and Instagram, interpretation can be done in a carefree manner, but in Steam and LinkedIn there is no natural labeling of the axes. This separation is clearly reflected in the interpretability score that each dataset received.
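
The labeling procedure described in the abstract (project onto the first two PCs, then read off the top k entries of each PC in absolute value) can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' framework: the helper names `principal_components`, `top_k_entries`, and `captured_mass` are hypothetical, and `captured_mass` is only a simple stand-in for showing how the picture changes with k. It is not the interpretability score proposed in the paper, whose definition is not given here.

```python
import numpy as np

def principal_components(X, n_components=2):
    """Return the leading principal directions (as rows) via SVD of the centered data."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:n_components]

def top_k_entries(pc, k):
    """Indices of the k largest entries of pc in absolute value, largest first."""
    return np.argsort(-np.abs(pc))[:k]

def captured_mass(pc, k):
    """Fraction of the squared norm of pc carried by its top-k entries.

    Illustrative assumption only; NOT the paper's interpretability score.
    """
    idx = top_k_entries(pc, k)
    return float(np.sum(pc[idx] ** 2) / np.sum(pc ** 2))

# Synthetic data (assumed for illustration): 200 samples, 6 features,
# where features 0 and 1 are driven by one strong shared latent factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=200)
X = rng.normal(scale=0.1, size=(200, 6))
X[:, 0] += 3.0 * latent
X[:, 1] += 3.0 * latent

pcs = principal_components(X, n_components=2)
projected = (X - X.mean(axis=0)) @ pcs.T   # the two-dimensional view of the data

top2 = top_k_entries(pcs[0], 2)            # features one would use to label PC 1
mass = [captured_mass(pcs[0], k) for k in range(1, 7)]
print("top-2 features of PC 1:", sorted(top2.tolist()))
print("captured mass for k = 1..6:", np.round(mass, 3))
```

In this toy setting the mass curve jumps sharply at k = 2 and then flattens, so the choice of k is easy. A PC whose curve rises slowly and evenly would be harder to label by "eyeballing", which is the kind of sensitivity to k the abstract refers to.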

    Original language: English
    Pages: 655-658
    Number of pages: 4
    State: Published - 1 Jan 2019
    Event: 13th International Conference on Web and Social Media, ICWSM 2019 - Munich, Germany
    Duration: 11 Jun 2019 to 14 Jun 2019

    Conference

    Conference: 13th International Conference on Web and Social Media, ICWSM 2019
    Country/Territory: Germany
    City: Munich
    Period: 11/06/19 to 14/06/19

    ASJC Scopus subject areas

    • Computer Networks and Communications
