Title
Enhanced Genre Classification Through Linguistically Fine-Grained POS Tags
Abstract
We propose the use of fine-grained part-of-speech (POS) tags as discriminatory attributes for automatic genre classification and report empirical results from an experiment indicating a substantial accuracy gain from such features over the conventional bag-of-words approach based on word unigrams. In particular, this paper reports our research into the performance of a fine-grained tag set when tested with the British component of the International Corpus of English. Ten different genre classification tasks were identified, and the performance of the tags was evaluated in terms of F-score. Our results show that linguistically fine-grained POS tags produce superior accuracy when compared with word unigrams, particularly for a rich set of 32 different genres with a Naive Bayes multinomial classifier. Through a comparison with an impoverished tag set, our results further demonstrate that the superior performance is due to the rich linguistic information embodied in the 400 or so different POS tags.
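As a rough illustration of the idea in the abstract, the sketch below treats each document as a bag of POS tags and classifies its genre with a small multinomial Naive Bayes model written in plain Python. It is a minimal sketch, not the authors' implementation: the tag strings, the two genre labels, the toy documents, the function names and the add-one smoothing are all assumptions made for the example, and no tagger or corpus is involved.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on bags of POS tags.

    docs   -- list of documents, each a list of POS-tag strings
    labels -- list of genre labels, one per document
    alpha  -- additive smoothing constant (an assumption of this sketch)
    """
    vocab = sorted({tag for doc in docs for tag in doc})
    tag_counts = defaultdict(Counter)  # genre -> tag frequencies
    doc_counts = Counter(labels)       # genre -> number of documents
    for doc, label in zip(docs, labels):
        tag_counts[label].update(doc)

    model = {}
    for label, n_docs in doc_counts.items():
        total = sum(tag_counts[label].values()) + alpha * len(vocab)
        log_prior = math.log(n_docs / len(docs))
        log_likelihood = {
            tag: math.log((tag_counts[label][tag] + alpha) / total)
            for tag in vocab
        }
        model[label] = (log_prior, log_likelihood)
    return model


def predict(model, doc):
    """Return the genre with the highest posterior log-probability."""
    def score(label):
        log_prior, log_likelihood = model[label]
        # Tags never seen in training are skipped.
        return log_prior + sum(
            log_likelihood[tag] for tag in doc if tag in log_likelihood
        )
    return max(model, key=score)


# Hypothetical, hand-written tag sequences; a real experiment would take
# them from an automatically tagged corpus.
train_docs = [
    ["PRON(pers)", "V(pres)", "INTERJ", "PRON(pers)", "V(pres)"],
    ["PRON(pers)", "INTERJ", "V(pres)", "ADV"],
    ["N(com,sing)", "V(past,pass)", "PREP", "N(com,plu)", "ART(def)"],
    ["ART(def)", "N(com,sing)", "PREP", "N(com,sing)", "V(past,pass)"],
]
train_labels = ["conversation", "conversation", "academic", "academic"]

model = train_multinomial_nb(train_docs, train_labels)
spoken_guess = predict(model, ["PRON(pers)", "INTERJ", "V(pres)"])
written_guess = predict(model, ["ART(def)", "N(com,sing)", "PREP"])
print(spoken_guess, written_guess)
```

Swapping the tag lists for lists of word tokens turns the same code into the word-unigram baseline that the abstract compares against, so the two feature sets can be evaluated under an identical classifier.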
Year
2010
Venue
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation
Keywords
automatic genre classification, ICE-GB, fine-grained POS tag, linguistic granularity, AUTASYS
Field
Rule-based machine translation, Naive Bayes classifier, Computer science, International Corpus of English, Embodied cognition, Speech recognition, Natural language processing, Artificial intelligence, Classifier (linguistics)
DocType
Conference
Citations
2
PageRank
0.38
References
12
Authors
2
Name | Order | Citations | PageRank
Alex Chengyu Fang | 1 | 70 | 13.46
Jing Cao | 2 | 2 | 0.38