Title
Evaluating and improving morpho-syntactic classification over multiple corpora using pre-trained, "off-the-shelf", parts-of-speech tagging tools.
Abstract
This paper evaluates six commonly available parts-of-speech tagging tools over corpora other than those on which they were originally trained. In particular, this investigation measures the performance of the selected tools over varying styles and genres of text without retraining, under the assumption that domain-specific training data is not always available. An investigation is performed to determine whether improved results can be achieved by combining the set of tagging tools into ensembles that use voting schemes to determine the best tag for each word. It is found that although accuracy drops because of non-domain-specific training and tag-mapping between corpora, it remains very high, with the support vector machine-based tagger and the decision tree-based tagger performing best over different corpora. It is also found that an ensemble containing a support vector machine-based tagger, a probabilistic tagger, a decision tree-based tagger and a rule-based tagger, combined with the Precision-Recall voting scheme, produces the largest increase in accuracy and the largest reduction in error across different corpora.
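The abstract names voting schemes without defining them. The sketch below illustrates plain majority voting and one common formulation of Precision-Recall voting: each tagger adds its per-tag precision to the tag it proposes, and adds 1 minus its per-tag recall to each competing tag it did not propose. This formulation is an assumption and may differ from the paper's exact definition. The tagger names (`svm`, `prob`, `tree`, `rule`), the tags and all precision/recall figures are hypothetical toy values, not results from the paper, and `error_reduction` assumes that "reduction in error" means relative reduction in error rate.

```python
from collections import Counter, defaultdict


def majority_vote(proposals):
    """Return the tag proposed by the most taggers.

    proposals maps tagger name -> proposed tag for one word.
    Ties go to the tied tag proposed by the earliest tagger.
    """
    counts = Counter(proposals.values())
    best = max(counts.values())
    for tag in proposals.values():
        if counts[tag] == best:
            return tag


def precision_recall_vote(proposals, precision, recall):
    """Assumed Precision-Recall voting (may differ from the paper).

    Each tagger adds its precision on the tag it proposes, and adds
    1 - recall for every competing candidate tag it did not propose:
    a tagger with high recall on a tag rarely misses that tag, so its
    silence counts as strong evidence against it.
    """
    candidates = list(dict.fromkeys(proposals.values()))  # ordered, no duplicates
    scores = defaultdict(float)
    for tagger, chosen in proposals.items():
        for tag in candidates:
            if tag == chosen:
                scores[tag] += precision[tagger].get(tag, 0.0)
            else:
                scores[tag] += 1.0 - recall[tagger].get(tag, 0.0)
    return max(scores, key=scores.get)


def error_reduction(baseline_acc, ensemble_acc):
    """Relative reduction in error rate of the ensemble over a baseline."""
    base_err = 1.0 - baseline_acc
    return (base_err - (1.0 - ensemble_acc)) / base_err


# Hypothetical toy data: four taggers disagree on one word.
PROPOSALS = {"svm": "NN", "prob": "VB", "tree": "VB", "rule": "NN"}
# Hypothetical per-tag precision/recall, as if measured on held-out data.
PRECISION = {
    "svm": {"NN": 0.97, "VB": 0.95},
    "prob": {"NN": 0.95, "VB": 0.93},
    "tree": {"NN": 0.96, "VB": 0.96},
    "rule": {"NN": 0.94, "VB": 0.92},
}
RECALL = {
    "svm": {"NN": 0.98, "VB": 0.94},
    "prob": {"NN": 0.96, "VB": 0.91},
    "tree": {"NN": 0.97, "VB": 0.92},
    "rule": {"NN": 0.95, "VB": 0.90},
}

if __name__ == "__main__":
    print(majority_vote(PROPOSALS))  # 2-2 tie, resolved in favour of "NN"
    print(precision_recall_vote(PROPOSALS, PRECISION, RECALL))  # "VB"
    print(error_reduction(0.96, 0.97))  # approximately 0.25
```

With these toy numbers the two "VB" voters are more reliable on the contested tags, so Precision-Recall voting breaks the 2-2 tie differently from plain majority voting. Weighting votes by per-tag reliability is what lets an ensemble of heterogeneous taggers outperform its individual members.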
Year: 2008
Venue: South African Computer Journal
Field: Decision tree, Trigram tagger, Voting, Computer science, Support vector machine, Morpho, Part of speech, Natural language processing, Artificial intelligence, Probabilistic logic, Syntax, Machine learning
DocType: Journal
Volume: 40
Citations: 0
PageRank: 0.34
References: 5
Authors: 2
Name | Order | Citations | PageRank
Kevin Glass | 1 | 14 | 2.54
Shaun Bangay | 2 | 97 | 17.72