Title
Baselines and Bigrams: Simple, Good Sentiment and Topic Classification
Abstract
Variants of Naive Bayes (NB) and Support Vector Machines (SVM) are often used as baseline methods for text classification, but their performance varies greatly depending on the model variant, features used and task/dataset. We show that: (i) the inclusion of word bigram features gives consistent gains on sentiment analysis tasks; (ii) for short snippet sentiment tasks, NB actually does better than SVMs (while for longer documents the opposite result holds); (iii) a simple but novel SVM variant using NB log-count ratios as feature values consistently performs well across tasks and datasets. Based on these observations, we identify simple NB and SVM variants which outperform most published results on sentiment analysis datasets, sometimes providing a new state-of-the-art performance level.
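The "NB log-count ratios" mentioned in the abstract can be sketched as follows. This is an illustrative reconstruction from the abstract's description, not the authors' code: it computes the ratio vector r = log((p/||p||₁)/(q/||q||₁)) from smoothed per-class feature counts, which the paper's SVM variant then uses as feature values. The toy matrices and feature names are hypothetical.

```python
import numpy as np

def log_count_ratio(X_pos, X_neg, alpha=1.0):
    """NB log-count ratio: r = log((p/||p||_1) / (q/||q||_1)),
    where p and q are alpha-smoothed feature-count vectors for the
    positive and negative classes respectively."""
    p = alpha + X_pos.sum(axis=0)
    q = alpha + X_neg.sum(axis=0)
    return np.log((p / p.sum()) / (q / q.sum()))

# Hypothetical binarized bigram-presence matrices
# (rows = documents, columns = features, values in {0, 1}).
X_pos = np.array([[1, 1, 0],
                  [1, 0, 0]])
X_neg = np.array([[0, 0, 1],
                  [0, 1, 1]])

r = log_count_ratio(X_pos, X_neg)
# Features frequent in positive documents get positive weights,
# features frequent in negative documents get negative weights.
print(r)
```

In the paper's SVM variant, each document's binarized feature vector is scaled elementwise by r before training the SVM, which is what makes the log-count ratios act as feature values rather than classifier weights.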
Year: 2012
Venue: ACL
Keywords: nb log-count ratio, new state-of-the-art performance level, topic classification, sentiment analysis task, good sentiment, svm variant, simple nb, sentiment analysis datasets, naive bayes, short snippet sentiment task, novel svm variant, model variant
Field: Naive Bayes classifier, Sentiment analysis, Computer science, Support vector machine, Baseline (configuration management), Artificial intelligence, Natural language processing, Bigram, Snippet, Machine learning
DocType: Conference
Volume: P12-2
Citations: 196
PageRank: 9.06
References: 18
Authors: 2
Name (author order): Sida Wang (1), Christopher D. Manning (2)