Abstract
In this work we ask to what extent simple statistics are useful for making sense of social media data. By simple statistics we mean counting and bookkeeping features such as the number of likes given to a user's post, a user's number of friends, etc. We find that relying solely on simple statistics is not always a good approach. Specifically, we develop a statistical framework, which we term semantic shattering, that detects semantic inconsistencies in the data that may arise from relying solely on simple statistics. We apply our framework to simple-statistics data collected from six online social media platforms and arrive at a surprising, counter-intuitive finding in three of them: Twitter, Instagram and YouTube. We find that, overall, a user's activity is not correlated with the feedback that the user receives on that activity. A hint toward understanding this phenomenon may be found in the fact that the activity-feedback shattering did not occur on LinkedIn, Steam and Flickr. A possible explanation for this separation is the amount of effort required to produce content: the less the effort, the weaker the correlation between activity and feedback. The amount of effort may be a proxy for the level of commitment that users feel towards each other in the network, and indeed sociologists claim that commitment explains consistent human behavior, or the lack thereof. However, the amount of effort and the level of commitment are by no means simple statistics.
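Simpson's paradox, listed among the keywords, is one way a pooled correlation between activity and feedback can vanish even when the two move together inside subgroups. The sketch below illustrates that effect only; it is not the semantic shattering framework. The numbers are made up for illustration, and the `pearson` helper and the two user groups are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up data: activity = number of posts, feedback = number of likes.
# Hypothetical group A: low activity, high baseline feedback.
activity_a = [1, 2, 3, 4, 5]
feedback_a = [10, 11, 12, 13, 14]
# Hypothetical group B: high activity, lower baseline feedback.
activity_b = [6, 7, 8, 9, 10]
feedback_b = [8, 9, 10, 11, 12]

# Within each group, feedback rises with activity.
r_a = pearson(activity_a, feedback_a)
r_b = pearson(activity_b, feedback_b)

# Pooling the groups mixes the two baselines and hides the relationship.
r_pooled = pearson(activity_a + activity_b, feedback_a + feedback_b)

print(f"group A: {r_a:.2f}, group B: {r_b:.2f}, pooled: {r_pooled:.2f}")
```

In this toy data the within-group correlations are perfect while the pooled correlation is close to zero, which is the kind of inconsistency that counting-type features alone can conceal.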
Original language | English |
---|---|
Article number | 8642436 |
Pages (from-to) | 402-408 |
Number of pages | 7 |
Journal | IEEE Transactions on Knowledge and Data Engineering |
Volume | 32 |
Issue number | 2 |
State | Published - 1 Feb 2020 |
Keywords
- Online social media
- PCA
- Simpson's paradox
- data analysis
ASJC Scopus subject areas
- Information Systems
- Computer Science Applications
- Computational Theory and Mathematics