Title
Behavior extraction from tweets using character N-gram models
Abstract
Human daily activities are now recorded by ICT devices in many kinds of data representations, known as lifelogs. Because this raw data is hard to handle, there is strong demand for retrieving useful information from lifelogs. Extracting human activities from these logs is a promising way to enrich our lives, since context-aware services can be provided according to the extracted user activities. Recently, many people post messages called tweets on Twitter to show what they are doing, thinking, feeling, and so on. Because many people post tweets so frequently every day, tweets have the potential to record human activities. This paper focuses on retrieving human behavior from tweets. Tweets are limited to a short length, which causes some difficulties: users rely on domain-specific terms and post grammatically incorrect sentences to fit within the limit. This makes tweets hard to analyze with grammar-based methods or with dictionaries. To tackle these problems, we apply character n-gram tokenization and a naive Bayes classifier to extract appropriate behavioral information from tweets. With an n-gram tokenizer, domain-specific words can be identified and incorrect grammar can be handled. Our approach is examined using real tweets in Japanese. The precision, recall, and F-measure indices show promising results. Several experiments have been carried out to show the feasibility of our approach. At this point our system has been applied to Japanese tweets, but it is applicable to other languages.
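As an illustration of the two techniques named in the abstract, the following is a minimal sketch of character n-gram tokenization feeding a multinomial naive Bayes classifier with add-one smoothing. It is not the authors' implementation: the names `char_ngrams` and `NgramNaiveBayes`, the choice of n = 2, the smoothing constant, and the behavior labels are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of text.

    No word segmentation or dictionary is needed, which is why this
    kind of tokenizer suits short, informal, or unsegmented text.
    """
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NgramNaiveBayes:
    """Hypothetical multinomial naive Bayes over character n-grams."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n                      # n-gram length (assumed value)
        self.alpha = alpha              # additive smoothing constant (assumed)
        self.gram_counts = defaultdict(Counter)  # label -> n-gram counts
        self.doc_counts = Counter()              # label -> number of texts
        self.vocab = set()                       # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.gram_counts[label].update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.gram_counts[label]
            total = sum(counts.values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for gram in char_ngrams(text, self.n):
                score += math.log(
                    (counts[gram] + self.alpha)
                    / (total + self.alpha * vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

A model of this shape would be trained with `NgramNaiveBayes().fit(texts, labels)`, where `labels` are behavior categories assigned to example tweets, and queried with `predict(tweet)`. Because the tokenizer works on raw characters, the same code applies unchanged to Japanese text, which has no spaces between words.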
Year
2014
DOI
10.1109/FUZZ-IEEE.2014.6891784
Venue
FUZZ-IEEE
Keywords
data representations, recall index, character n-gram tokenization, tweet message, bayes methods, n-gram tokenizer, japanese tweets, pattern classification, information retrieval, character n-gram models, twitter, behavior extraction, precision index, naive bayes classifier, human behavior retrieval, human daily activities, behavioural sciences computing, context-awareness services, natural language processing, social networking (online), f-measure index, ict devices, dictionaries, feature extraction, grammar, data mining, training data
Field
Tokenization (data security), Naive Bayes classifier, Computer science, Raw data, Grammar, Natural language processing, n-gram, Artificial intelligence, Information and Communications Technology, Lexical analysis, Sentence, Machine learning
DocType
Conference
ISSN
1544-5615
Citations
0
PageRank
0.34
References
7
Authors
4
Name                Order  Citations  PageRank
Yuji Yano           1      0          0.34
Tomonori Hashiyama  2      98         15.97
Junko Ichino        3      39         10.76
Shun'ichi Tano      4      75         21.07