Title
Simple Statistics Are Sometime Too Simple: A Case Study in Social Media Data
Abstract
In this work we ask to what extent simple statistics are useful for making sense of social media data. By simple statistics we mean counting and bookkeeping type features such as the number of likes given to a user's post, a user's number of friends, etc. We find that relying solely on simple statistics is not always a good approach. Specifically, we develop a statistical framework that we term semantic shattering, which allows detecting semantic inconsistencies in the data that may occur due to relying solely on simple statistics. We apply our framework to simple-statistics data collected from six online social media platforms and arrive at a surprising, counter-intuitive finding in three of them: Twitter, Instagram and YouTube. We find that overall, the activity of the user is not correlated with the feedback that the user receives on that activity. A hint toward understanding this phenomenon may be found in the fact that the activity-feedback shattering did not occur in LinkedIn, Steam and Flickr. A possible explanation for this separation is the amount of effort required to produce content: the lesser the effort, the lesser the correlation between activity and feedback. The amount of effort may be a proxy for the level of commitment that the users feel towards each other in the network, and indeed sociologists claim that commitment explains consistent human behavior, or lack thereof. However, neither the amount of effort nor the level of commitment is by any means a simple statistic.
Year: 2020
DOI: 10.1109/TKDE.2019.2899355
Venue: IEEE Transactions on Knowledge and Data Engineering
Keywords: Semantics, Twitter, Principal component analysis, YouTube, LinkedIn, Correlation
Field: Proxy (climate), Social media, Ask price, Computer science, Bookkeeping, Phenomenon, Statistics
DocType: Journal
Volume: 32
Issue: 2
ISSN: 1041-4347
Citations: 0
PageRank: 0.34
References: 0
Authors: 1
Name: Dan Vilenchik
Order: 1
Citations: 143
PageRank: 13.36