Title
A system for popular Thai slang extraction from social media content with n-gram based tokenization
Abstract
With the increased penetration of smart devices and internet connectivity, many Thais are more readily engaged in social media, online forums, and chat groups. As consumption of social media content increases, there is a shift away from traditional media, such as broadcast and print, in which formal language is regularly used. Social media posts reflect this trend: posts, usually made by younger generations, often involve slang and informal language that is not typically found in formal dictionaries. As the Thai population likes to follow trends, one behavior that many Thai social media users engage in is following the latest popular trends in slang and word usage. As slang changes and evolves over time, it is useful to have an online mining tool that can capture emerging and popular slang. This paper proposes an approach that extracts popular Thai slang by comparing social media posts, using tokenization and a dictionary-based approach to extract unknown words, and then expanding them with an n-gram approach to identify which slang words are currently trending and popular.
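The pipeline outlined in the abstract (tokenize, keep words missing from a dictionary, expand them with n-grams, rank by popularity) might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the `max_n` and `min_count` parameters, and the romanized toy data are all hypothetical, and the posts are assumed to be tokenized already, since the record does not say which Thai word segmenter the paper uses.

```python
from collections import Counter


def candidate_ngrams(tokens, dictionary, max_n=3):
    """Yield n-grams (n = 1..max_n) that contain at least one unknown token.

    Assumption: `tokens` is one post that has already been segmented into words.
    """
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if any(tok not in dictionary for tok in gram):
                # Thai is written without spaces between words, so tokens
                # are joined directly.
                yield "".join(gram)


def popular_candidates(posts, dictionary, max_n=3, min_count=2):
    """Count in how many posts each candidate appears; keep the frequent ones."""
    counts = Counter()
    for tokens in posts:
        # set() so that a candidate is counted at most once per post
        counts.update(set(candidate_ngrams(tokens, dictionary, max_n)))
    return [(gram, c) for gram, c in counts.most_common() if c >= min_count]


# Hypothetical toy data: romanized stand-ins for segmented Thai posts.
dictionary = {"i", "am", "very", "happy", "today"}
posts = [
    ["i", "am", "fin", "mak"],
    ["today", "fin", "mak"],
    ["very", "happy"],
]
print(popular_candidates(posts, dictionary))
```

Counting each candidate once per post is a design choice in this sketch, so that a single repetitive post cannot make a word look popular; the paper may rank candidates differently.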
Year
2016
DOI
10.1109/KST.2016.7440478
Venue
2016 8th International Conference on Knowledge and Smart Technology (KST)
Keywords
Thai slang extraction, n-gram, tokenization, word segmentation, data-mining, social media trends
DocType
Conference
ISSN
2374-314X
ISBN
978-1-4673-8137-6
Citations
0
PageRank
0.34
References
6
Authors
3
Name                        Order   Citations   PageRank
Rachsuda Jiamthapthaksin    1       0           0.34
Pisal Setthawong            2       0           0.34
Nitipan Ratanasawetwad      3       0           0.34