Behavior extraction from tweets using character N-gram models - Citegraph

Paper Info

Title
Behavior extraction from tweets using character N-gram models

Abstract
Human daily activities are now recorded by ICT devices in various data representations, known as lifelogs. There is strong demand for retrieving useful information from lifelogs, because the raw data are hard to handle. Extracting human activities from these logs promises to enrich our lives, since context-aware services can be provided based on the extracted user activities. Recently, many people post messages called tweets on Twitter to show what they are doing, thinking, feeling, and so on. Because many people post tweets so frequently every day, tweets have the potential to record human activities. This paper focuses on retrieving human behavior from tweets. Tweets are limited to short sentences, which causes some difficulties: users employ domain-specific terms and post grammatically incorrect sentences to fit within the constraint. This makes it hard to analyze tweets grammatically or with dictionaries. To tackle these problems, we apply character n-gram tokenization and a naive Bayes classifier to extract appropriate behavioral information from tweets. With an n-gram tokenizer, domain-specific words can be identified and incorrect grammar can be handled. Our approach is examined using real tweets in Japanese, and experiments have been carried out to show its feasibility. The precision, recall, and F-measure show promising results. At this point our system has been applied to Japanese tweets, but it is applicable to any other language.
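The combination named in the abstract, character n-gram tokenization feeding a naive Bayes classifier and scored by precision, recall, and F-measure, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the bigram size, the multinomial model with add-one smoothing, the names `char_ngrams`, `NaiveBayes`, and `precision_recall_f1`, and the English toy tweets are all assumptions made for the example. The paper itself evaluates on real Japanese tweets.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of text (no word segmentation)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NaiveBayes:
    """Multinomial naive Bayes over character n-grams (assumed model variant)."""

    def __init__(self, n=2):
        self.n = n
        self.label_counts = Counter()            # training tweets per label
        self.gram_counts = defaultdict(Counter)  # n-gram counts per label
        self.vocab = set()                       # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.label_counts[label] += 1
            self.gram_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.gram_counts[label]
            total = sum(counts.values())
            # Log prior plus add-one smoothed log likelihood of each n-gram.
            score = math.log(doc_count / total_docs)
            for gram in char_ngrams(text, self.n):
                score += math.log((counts[gram] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def precision_recall_f1(gold, predicted, positive):
    """Precision, recall, and F-measure for one label treated as positive."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented toy data; the behavior labels are hypothetical.
train_texts = [
    "eating ramen for lunch",
    "eating sushi tonight",
    "riding the train home",
    "on the train to work",
]
train_labels = ["eating", "eating", "moving", "moving"]
model = NaiveBayes(n=2).fit(train_texts, train_labels)

# A misspelled tweet still shares most character bigrams with its class.
print(model.predict("eatin ramen now"))
```

Because the features are character sequences rather than dictionary words, a misspelling such as "eatin" still overlaps heavily with the training n-grams of its class, which is the property the abstract relies on for short, ungrammatical tweets.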

Year	DOI	Venue
2014	10.1109/FUZZ-IEEE.2014.6891784	FUZZ-IEEE
Keywords	Field	DocType
data representations,recall index,character n-gram tokenization,tweet message,bayes methods,n-gram tokenizer,japanese tweets,pattern classification,information retrieval,character n-gram models,twitter,behavior extraction,precision index,naive bayes classifier,human behavior retrieval,human daily activities,behavioural sciences computing,context-awareness services,natural language processing,social networking (online),f-measure index,ict devices,dictionaries,feature extraction,grammar,data mining,training data	Tokenization (data security),Naive Bayes classifier,Computer science,Raw data,Grammar,Natural language processing,n-gram,Artificial intelligence,Information and Communications Technology,Lexical analysis,Sentence,Machine learning	Conference
ISSN	Citations	PageRank
1544-5615	0	0.34
References	Authors
7	4

Authors (4 rows)

Name	Order	Citations	PageRank
Yuji Yano	1	0	0.34
Tomonori Hashiyama	2	98	15.97
Junko Ichino	3	39	10.76
Shun'ichi Tano	4	75	21.07

Cited by (0 rows)

References (7 rows)
