Automatically annotating a five-billion-word corpus of Japanese blogs for affect and sentiment analysis - Citegraph

Paper Info

Title
Automatically annotating a five-billion-word corpus of Japanese blogs for affect and sentiment analysis

Abstract
This paper presents our research on automatic annotation of a five-billion-word corpus of Japanese blogs with information on affect and sentiment. We first survey existing emotion blog corpora and find that no large-scale emotion corpus has been available for the Japanese language. We choose the largest blog corpus for the language and annotate it using two systems for affect analysis: ML-Ask for word- and sentence-level affect analysis and CAO for detailed analysis of emoticons. The annotated information includes affective features such as sentence subjectivity (emotive/non-emotive) and emotion classes (joy, sadness, etc.), which are useful in affect analysis. The annotations are also generalized on a 2-dimensional model of affect to obtain information on sentence valence/polarity (positive/negative), which is useful in sentiment analysis. The annotations are evaluated in several ways: first, on a test set of a thousand randomly extracted sentences evaluated by over forty respondents; second, by comparing the statistics of the annotations to other existing emotion blog corpora; and finally, by applying the corpus in several tasks, such as generation of an emotion object ontology and retrieval of emotional and moral consequences of actions.

Year	Venue	Keywords
2012	WASSA@ACL	detailed analysis,japanese blogs,emotion object ontology,existing emotion blog corpus,emotion class,emotion blog corpus,largest blog corpus,large scale emotion corpus,affect analysis,five-billion-word corpus,sentiment analysis
Field	DocType	Volume
Sadness,Ontology,Annotation,Sentiment analysis,Computer science,Natural language processing,Artificial intelligence,Emotive,Affect (psychology),Sentence,Test set	Conference	W12-37
Citations	PageRank	References
2	0.36	13
Authors
4

Authors (4 rows)

Name	Order	Citations	PageRank
Michal Ptaszynski	1	132	25.47
Rafal Rzepka	2	187	40.62
Kenji Araki	3	343	80.17
Yoshio Momouchi	4	54	15.10
