Title
Predicting Author Age from Weibo Microblog Posts
Abstract
We report an author profiling study based on Chinese social media texts gleaned from Sina Weibo, in which we attempt to predict the author's age group from linguistic text features that mainly relate to non-standard orthography: classical Chinese characters, hashtags, emoticons and kaomoji, homogeneous punctuation and Latin character sequences, and poetic format. We also tracked the use of selected popular Chinese expressions, parts-of-speech and word types. We extracted 100 posts from 100 users in each of four age groups (under-18, 19-29, 30-39, over-40 years) and, by clustering users' posts fifty at a time, trained a maximum entropy classifier to predict author age group with an accuracy of 65.5%. We show which features are associated with younger and older age groups, and make our normalisation resources available to other researchers.
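The feature counting and fifty-post grouping mentioned in the abstract can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the authors' resources: the regular expressions, the feature names, and the helper functions `extract_features` and `batch_features` are hypothetical, and only three of the listed feature types are covered (classical characters, kaomoji and poetic format would need lexicons or layout rules not shown here).

```python
import re

# Assumed patterns, for illustration only (not the authors' resources).
HASHTAG = re.compile(r"#[^#\s]+#")           # Weibo-style paired-hash topic, e.g. #旅行#
LATIN_RUN = re.compile(r"[A-Za-z]{2,}")      # sequence of two or more Latin letters
PUNCT_RUN = re.compile(r"([!?.~。！？～])\1+")  # same punctuation mark repeated

def extract_features(text):
    """Count a few orthographic features in a single post."""
    return {
        "hashtags": len(HASHTAG.findall(text)),
        "latin_sequences": len(LATIN_RUN.findall(text)),
        "punct_runs": sum(1 for _ in PUNCT_RUN.finditer(text)),
    }

def batch_features(posts, size=50):
    """Sum feature counts over consecutive groups of `size` posts."""
    batches = []
    for start in range(0, len(posts), size):
        total = {"hashtags": 0, "latin_sequences": 0, "punct_runs": 0}
        for post in posts[start:start + size]:
            for name, count in extract_features(post).items():
                total[name] += count
        batches.append(total)
    return batches

if __name__ == "__main__":
    sample = "今天好开心!!! #旅行# hello world。。。"
    print(extract_features(sample))
    print(len(batch_features([sample] * 100)))
```

Each per-batch dictionary could then serve as one training instance for a maximum entropy (multinomial logistic regression) classifier; the classifier itself is left out of this sketch.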
Year
2016
Venue
LREC 2016 - Tenth International Conference on Language Resources and Evaluation
Keywords
Weibo, microblog linguistics, text forensics, computational sociolinguistics
Field
Social media, Computer science, Microblogging, Natural language processing, Artificial intelligence
DocType
Conference
Citations
1
PageRank
0.34
References
0
Authors
4
Name
Order
Citations
PageRank
Wanru Zhang110.34
Andrew Caines246.13
Dimitrios Alikaniotis332.47
Paula Buttery4379.83