Evaluating and improving morpho-syntactic classification over multiple corpora using pre-trained, "off-the-shelf", parts-of-speech tagging tools. - Citegraph

Paper Info

Title
Evaluating and improving morpho-syntactic classification over multiple corpora using pre-trained, "off-the-shelf", parts-of-speech tagging tools.

Abstract
This paper evaluates six commonly available parts-of-speech tagging tools over corpora other than those upon which they were originally trained. In particular, this investigation measures the performance of the selected tools over varying styles and genres of text without retraining, under the assumption that domain-specific training data is not always available. An investigation is performed to determine whether improved results can be achieved by combining the set of tagging tools into ensembles that use voting schemes to determine the best tag for each word. It is found that while accuracy drops due to non-domain-specific training and tag-mapping between corpora, accuracy remains very high, with the support vector machine-based tagger and the decision tree-based tagger performing best over different corpora. It is also found that an ensemble containing a support vector machine-based tagger, a probabilistic tagger, a decision tree-based tagger and a rule-based tagger produces the largest increase in accuracy and the largest reduction in error across different corpora, using the Precision-Recall voting scheme.
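
The ensemble idea in the abstract can be illustrated with a minimal sketch. The code below is not the paper's Precision-Recall voting scheme; it uses plain majority voting, a simpler scheme swapped in for illustration. The function names (`vote`, `ensemble_tags`), the tagger names, the tie-breaking rule and the sample tags are all assumptions, and the sketch presumes each tagger's output has already been mapped to a common tagset.

```python
from collections import Counter


def vote(candidates, priority):
    """Pick one tag for a single token by simple majority.

    candidates: dict mapping a (hypothetical) tagger name to its proposed tag.
    priority:   list of tagger names used only to break ties.
    """
    counts = Counter(candidates.values())
    top = max(counts.values())
    tied = {tag for tag, n in counts.items() if n == top}
    # Tie-break (an assumption of this sketch): trust the first tagger in
    # priority order whose proposal is among the tied tags.
    for name in priority:
        if candidates.get(name) in tied:
            return candidates[name]
    raise ValueError("priority must name at least one tagger in candidates")


def ensemble_tags(outputs, priority):
    """Combine per-token tag sequences from several taggers.

    outputs: dict mapping tagger name to a list of tags, one per token,
             all already mapped to the same tagset.
    """
    lengths = {len(tags) for tags in outputs.values()}
    if len(lengths) != 1:
        raise ValueError("all taggers must tag the same number of tokens")
    n = lengths.pop()
    return [
        vote({name: tags[i] for name, tags in outputs.items()}, priority)
        for i in range(n)
    ]


# Illustrative, made-up outputs for a three-token sentence.
outputs = {
    "svm": ["DT", "NN", "VBZ"],
    "tree": ["DT", "NN", "VBZ"],
    "prob": ["DT", "VB", "VBZ"],
    "rule": ["DT", "VB", "NNS"],
}
priority = ["svm", "tree", "prob", "rule"]
combined = ensemble_tags(outputs, priority)
print(combined)
```

A weighted scheme would replace the raw counts in `vote` with per-tagger weights estimated on held-out data; the weights themselves are not given in this record, so they are left out here.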

Year	Venue	Field
2008	South African Computer Journal	Decision tree,Trigram tagger,Voting,Computer science,Support vector machine,Morpho,Part of speech,Natural language processing,Artificial intelligence,Probabilistic logic,Syntax,Machine learning
DocType	Volume	Citations
Journal	40	0
PageRank	References	Authors
0.34	5	2

Authors (2 rows)

Name	Order	Citations	PageRank
Kevin Glass	1	14	2.54
Shaun Bangay	2	97	17.72
