Title
Developing a competitive HMM Arabic POS tagger using small training corpora
Abstract
Part-of-speech (POS) tagging is the task of computationally determining which POS of a word is activated by its use in a particular context. POS tagging is one of the important processing steps for many natural language systems, such as information extraction and question answering. This paper presents a study aiming to find the appropriate strategy for developing a fast and accurate Arabic statistical POS tagger when only a limited amount of training material is available. This is an essential factor when dealing with languages like Arabic, for which annotated resources are scarce and not easily available. Different configurations of an HMM tagger are studied: bigram and trigram models are tested, as well as different smoothing techniques. In addition, a new lexical model is defined to handle unknown-word POS guessing, based on the linear interpolation of word suffix probability and word prefix probability. Several experiments are carried out to determine the performance of the different HMM configurations with two small training corpora. The first corpus includes about 29,300 words from both Modern Standard Arabic and Classical Arabic. The second corpus is the Quranic Arabic Corpus, which consists of 77,430 words of Quranic Arabic.
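The abstract does not give the formula for the unknown-word lexical model, so the following is only a minimal sketch of what a linear interpolation of suffix and prefix probabilities could look like. The function names, the fixed affix length of 2, the weight `lam`, the relative-frequency estimates, and the transliterated toy words are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter, defaultdict


def affix_tag_probs(tagged_words, affix_len, use_suffix):
    """Estimate P(tag | affix) by relative frequency from (word, tag) pairs."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        affix = word[-affix_len:] if use_suffix else word[:affix_len]
        counts[affix][tag] += 1
    return {
        affix: {tag: n / sum(tag_counts.values()) for tag, n in tag_counts.items()}
        for affix, tag_counts in counts.items()
    }


def guess_tag_distribution(word, suffix_probs, prefix_probs, tags, lam=0.5, affix_len=2):
    """Score each tag as lam * P(tag | suffix) + (1 - lam) * P(tag | prefix).

    Affixes never seen in training contribute 0 for every tag.
    """
    suffix_dist = suffix_probs.get(word[-affix_len:], {})
    prefix_dist = prefix_probs.get(word[:affix_len], {})
    return {
        tag: lam * suffix_dist.get(tag, 0.0) + (1 - lam) * prefix_dist.get(tag, 0.0)
        for tag in tags
    }


# Hypothetical toy training data (transliterated, not from either corpus).
training = [
    ("alkitab", "NOUN"),
    ("albayt", "NOUN"),
    ("yaktub", "VERB"),
    ("yadrus", "VERB"),
    ("kataba", "VERB"),
]
tags = ["NOUN", "VERB"]
suffix_probs = affix_tag_probs(training, affix_len=2, use_suffix=True)
prefix_probs = affix_tag_probs(training, affix_len=2, use_suffix=False)

# "alqalam" is unseen: its suffix "am" is unknown, its prefix "al" points to NOUN.
unknown_dist = guess_tag_distribution("alqalam", suffix_probs, prefix_probs, tags)
print(max(unknown_dist, key=unknown_dist.get), unknown_dist)
```

In an HMM tagger, the emission model needs P(word | tag) rather than P(tag | word), so a score like this would typically be converted with Bayes' rule before decoding. Whether the paper does so, and how it sets the interpolation weight, is not stated in the abstract.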
Year: 2011
DOI: 10.1007/978-3-642-20039-7_29
Venue: ACIIDS (1)
Keywords: word prefix probability, different configuration, statistical pos tagger, classical arabic, unknown word pos, quranic arabic corpus, quranic arabic, small training corpus, modern standard arabic, accurate arabic, word suffix probability, linear interpolation, information extraction, question answering, hidden markov model, part of speech, arabic languages, natural language
Field: Question answering, Classical Arabic, Trigram, Computer science, Speech recognition, Prefix, Modern Standard Arabic, Natural language processing, Artificial intelligence, Bigram, Arabic languages, Quranic Arabic Corpus
DocType: Conference
Volume: 6591
ISSN: 0302-9743
Citations: 5
PageRank: 0.48
References: 15
Authors: 3
Name                 Order  Citations  PageRank
Mohammed Albared     1      38         3.56
Nazlia Omar          2      78         14.98
Mohd Juzaiddin Ab    3      71         9.26