Determining language variant in microblog messages

Paper Info

Title
Determining language variant in microblog messages

Abstract
It is difficult to determine the country of origin of the author of a short message based only on the text. This is an even more complex problem when more than one country uses the same native language. In this paper, we address the specific problem of detecting the two main variants of the Portuguese language, European and Brazilian, in Twitter micro-blogging data, by proposing and evaluating a set of high-precision features. We follow an automatic classification approach using a Naïve Bayes classifier, achieving 95% accuracy. We find that our system is adequate for real-time tweet classification.
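
To make the classification approach named in the abstract concrete, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing. It is not the authors' system: the abstract does not list the high-precision features, so plain whitespace tokens stand in for them, and the class name `NaiveBayes`, the labels `pt-PT` and `pt-BR`, and the four training sentences are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercased whitespace tokens stand in for the paper's features.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # smoothing constant
        self.class_counts = Counter()             # documents seen per label
        self.token_counts = defaultdict(Counter)  # token frequencies per label
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for tok in tokenize(text):
                self.token_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_tokens = sum(self.token_counts[label].values())
            denom = total_tokens + self.alpha * len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens never seen in training are skipped
                    count = self.token_counts[label][tok]
                    score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training set (illustrative only, not data from the paper).
texts = [
    "estou a ver o autocarro",
    "perdi o telemóvel no comboio",
    "estou vendo o ônibus",
    "perdi o celular no trem",
]
labels = ["pt-PT", "pt-PT", "pt-BR", "pt-BR"]

clf = NaiveBayes().fit(texts, labels)
print(clf.predict("o autocarro chegou"))
print(clf.predict("meu celular quebrou"))
```

Prediction is a single pass over the message's tokens for each class, which is consistent with the abstract's remark about real-time use, although the 95% accuracy reported there depends on the paper's own features and data, not on this sketch.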

Year	2013
DOI	10.1145/2480362.2480535
Venue	SAC
DocType	Conference
Keywords	twitter micro-blogging data, native language, microblog message, high-precision feature, main variant, portuguese language, automatic classification approach, bayes classifier, real-time tweet classification, language variant, specific problem, complex problem
Field	Social media, Naive Bayes classifier, Country of origin, Computer science, Portuguese, Microblogging, Natural language processing, Artificial intelligence, Search intent, Machine learning, First language
Citations	6
PageRank	0.75
References	15
Authors	5

Authors (5 rows)

Name	Order	Citations	PageRank
Gustavo Laboreiro	1	58	4.51
Matko Bošnjak	2	28	1.98
Luís Sarmento	3	377	31.16
Eduarda Mendes Rodrigues	4	350	21.40
Eugénio Oliveira	5	974	111.00
