Title
Text Normalization Algorithm on Twitter in Complaint Category
Abstract
Many people use microblogs to express complaints or criticism. However, posts are limited to about 160 characters and are written in unstructured sentences, which is the biggest obstacle to processing the information: unstructured sentences make preprocessing difficult for text processing tools. Normalization is therefore needed to make unstructured sentences more understandable to a machine. We propose a normalization method for the Indonesian language that adopts ideas from other researchers' normalization work and adjusts them to the characteristics of unstructured Indonesian sentences. The experiment uses Indonesian-language Twitter data in the complaint category. The process is divided into three stages: cleaning, out-of-vocabulary (OOV) detection, and word replacement. A list of basic words and a slang dictionary are used in OOV detection, while a context dictionary is built to resolve ambiguity. The algorithm reaches an accuracy of about 90% in the complaint category.
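The three stages named in the abstract (cleaning, OOV detection, word replacement) can be illustrated with a minimal sketch. This is not the authors' implementation: the dictionary contents, the cue-word layout of the context dictionary, and the function names `clean`, `is_oov`, `replace_oov`, and `normalize` are all assumptions made for illustration.

```python
import re

# Hypothetical toy resources; the paper's actual dictionaries are not shown here.
BASIC_WORDS = {"saya", "tidak", "bisa", "sudah", "kamu", "datang"}
SLANG = {"sy": "saya", "gk": "tidak", "bs": "bisa", "sdh": "sudah"}
# Assumed layout: an ambiguous token maps to cue words in neighbouring
# positions, plus a fallback used when no cue word is present.
CONTEXT = {"km": {"cues": {"jarak": "kilometer", "jalan": "kilometer"},
                  "default": "kamu"}}


def clean(tweet):
    """Stage 1: lowercase, strip URLs, mentions, hashtags and punctuation."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"[@#]\w+", " ", text)       # drop mentions and hashtags
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # drop punctuation and symbols
    return text.split()


def is_oov(token):
    """Stage 2: a token is OOV if it is neither a number nor a basic word."""
    return not token.isdigit() and token not in BASIC_WORDS


def replace_oov(tokens):
    """Stage 3: replace OOV tokens, consulting the context dictionary first."""
    out = []
    for i, tok in enumerate(tokens):
        if not is_oov(tok):
            out.append(tok)
        elif tok in CONTEXT:
            # Look at the immediate left and right neighbours for a cue word.
            neighbours = set(tokens[max(0, i - 1):i] + tokens[i + 1:i + 2])
            rules = CONTEXT[tok]
            choice = next((rep for cue, rep in rules["cues"].items()
                           if cue in neighbours), rules["default"])
            out.append(choice)
        else:
            # Unknown OOV tokens are left unchanged in this sketch.
            out.append(SLANG.get(tok, tok))
    return out


def normalize(tweet):
    return " ".join(replace_oov(clean(tweet)))


print(normalize("@telkom internet sy gk bs dipakai http://t.co/x"))
# → internet saya tidak bisa dipakai
```

In this sketch the ambiguous token `km` becomes `kilometer` when a cue word such as `jalan` is adjacent and `kamu` otherwise. A real context dictionary would presumably be built from data rather than written by hand.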
Year
2017
DOI
10.1016/j.procs.2017.10.004
Venue
Procedia Computer Science
Keywords
normalization, microblog, Indonesian language, Twitter, language processing
DocType
Conference
Volume
116
Issue
C
ISSN
1877-0509
Citations
0
PageRank
0.34
References
5
Authors
6
Name              Order  Citations  PageRank
Novita Hanafiah   1      0          0.34
Alexander Kevin   2      0          0.34
Charles Sutanto   3      0          0.34
Fiona             4      0          0.34
Yulyani Arifin    5      0          0.34
Jaka Hartanto     6      0          0.34