Title | ||
---|---|---|
Evaluating and improving morpho-syntactic classification over multiple corpora using pre-trained, "off-the-shelf", parts-of-speech tagging tools. |
Abstract | ||
---|---|---|
This paper evaluates six commonly available parts-of-speech tagging tools over corpora other than those upon which they were originally trained. In particular, this investigation measures the performance of the selected tools over varying styles and genres of text without retraining, under the assumption that domain-specific training data is not always available. An investigation is performed to determine whether improved results can be achieved by combining the set of tagging tools into ensembles that use voting schemes to determine the best tag for each word. It is found that while accuracy drops due to non-domain-specific training and tag-mapping between corpora, accuracy remains very high, with the support vector machine-based tagger and the decision tree-based tagger performing best over different corpora. It is also found that an ensemble containing a support vector machine-based tagger, a probabilistic tagger, a decision tree-based tagger and a rule-based tagger produces the largest increase in accuracy and the largest reduction in error across different corpora, using the Precision-Recall voting scheme. |
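
The abstract describes combining several taggers into an ensemble that votes on the tag for each word. As a rough illustration only, the sketch below uses plain majority voting, which is a simpler stand-in and not the Precision-Recall voting scheme the paper reports. The function name `majority_vote`, the `priority` tie-break parameter, and the sample tags are all assumptions introduced here, and the sketch assumes every tagger's output has already been mapped to a common tag set.

```python
from collections import Counter


def majority_vote(tag_sequences, priority=0):
    """Combine per-token tags from several taggers by simple majority.

    tag_sequences: one list of tags per tagger, all over the same tokens.
    priority: index of the tagger whose tag is preferred when votes tie
              (a hypothetical tie-break rule, not taken from the paper).
    """
    if not tag_sequences:
        return []
    length = len(tag_sequences[0])
    if any(len(seq) != length for seq in tag_sequences):
        raise ValueError("all taggers must tag the same number of tokens")

    combined = []
    # zip(*...) yields, for each token, the tuple of tags proposed for it.
    for votes in zip(*tag_sequences):
        counts = Counter(votes)
        best = max(counts.values())
        if counts[votes[priority]] == best:
            # The preferred tagger is among the top-voted tags.
            combined.append(votes[priority])
        else:
            # Otherwise take the first top-voted tag in tagger order.
            combined.append(next(t for t in votes if counts[t] == best))
    return combined


if __name__ == "__main__":
    # Hypothetical outputs of three taggers over the same three tokens.
    outputs = [
        ["DT", "NN", "VBZ"],
        ["DT", "NN", "NNS"],
        ["DT", "JJ", "VBZ"],
    ]
    print(majority_vote(outputs))
```

A weighted scheme would replace the raw counts with per-tagger, per-tag weights estimated on held-out data; the zip-and-count structure stays the same.
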
Year | Venue | Field |
---|---|---|
2008 | South African Computer Journal | Decision tree, Trigram tagger, Voting, Computer science, Support vector machine, Morpho, Part of speech, Natural language processing, Artificial intelligence, Probabilistic logic, Syntax, Machine learning |
DocType | Volume | Citations |
---|---|---|
Journal | 40 | 0 |
PageRank | References | Authors |
---|---|---|
0.34 | 5 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Kevin Glass | 1 | 14 | 2.54 |
Shaun Bangay | 2 | 97 | 17.72 |