Title
Data-Driven Part-of-Speech Tagging of Kiswahili
Abstract
In this paper we present experiments with data-driven part-of-speech taggers trained and evaluated on the annotated Helsinki Corpus of Swahili. Using four of the current state-of-the-art data-driven taggers, TnT, MBT, SVMTool and MXPOST, we find the latter to be the most accurate tagger for the Kiswahili dataset. We further improve on the performance of the individual taggers by combining them into a committee of taggers. We observe that the more naive combination methods, like the novel plural voting approach, outperform more elaborate schemes like cascaded classifiers and weighted voting. This paper is the first publication to present experiments on data-driven part-of-speech tagging for Kiswahili and Bantu languages in general.
Year: 2006
DOI: 10.1007/11846406_25
Venue: TSD
Keywords: cascaded classifier, accurate tagger, present experiment, novel plural voting approach, current state-of-the-art data-driven taggers, data-driven part-of-speech, annotated helsinki corpus, individual taggers, kiswahili dataset, bantu language, part of speech
Field: Data-driven, Plural, Bantu languages, Voting, Computer science, Swahili, Weighted voting, Part of speech, Speech recognition, Natural language, Natural language processing, Artificial intelligence
DocType: Conference
Volume: 4188
ISSN: 0302-9743
ISBN: 3-540-39090-1
Citations: 4
PageRank: 0.44
References: 6
Authors: 3
Name                      Order  Citations  PageRank
Guy Pauw                  1      75         12.47
Gilles-Maurice Schryver   2      17         2.17
Peter W. Wagacha          3      12         2.95