Abstract | ||
---|---|---|
Part-of-speech (POS) tagging is the task of computationally determining which POS of a word is activated by its use in a particular context. POS tagging is an important processing step for many natural language systems, such as information extraction and question answering. This paper presents a study that aims to identify an appropriate strategy for developing a fast and accurate Arabic statistical POS tagger when only a limited amount of training material is available. This is an essential factor when dealing with languages like Arabic, for which annotated resources are scarce and not easily available. Different configurations of an HMM tagger are studied. Specifically, bigram and trigram models are tested, along with different smoothing techniques. In addition, a new lexical model is defined to guess the POS of unknown words, based on the linear interpolation of word suffix probability and word prefix probability. Several experiments are carried out to determine the performance of the different HMM configurations with two small training corpora. The first corpus includes about 29,300 words from both Modern Standard Arabic and Classical Arabic. The second corpus is the Quranic Arabic Corpus, which consists of 77,430 words of Quranic Arabic. |
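
The abstract describes the unknown-word model only as a linear interpolation of word suffix probability and word prefix probability. The following is a minimal sketch of that general idea, not the authors' implementation. The function names, the fixed affix length `affix_len`, the interpolation weight `lam`, the tag names, and the transliterated toy words are all assumptions made for illustration; the paper's actual estimates and weights are not given here.

```python
from collections import Counter, defaultdict


def train_affix_model(tagged_words, affix_len=2):
    """Count how often each tag co-occurs with each word prefix and suffix."""
    prefix_counts = defaultdict(Counter)
    suffix_counts = defaultdict(Counter)
    for word, tag in tagged_words:
        prefix_counts[word[:affix_len]][tag] += 1
        suffix_counts[word[-affix_len:]][tag] += 1
    return prefix_counts, suffix_counts


def relative_freq(counts, affix, tag):
    """Relative-frequency estimate of P(tag | affix); 0.0 for unseen affixes."""
    tag_counts = counts.get(affix, Counter())
    total = sum(tag_counts.values())
    return tag_counts[tag] / total if total else 0.0


def guess_tag_probs(word, tags, prefix_counts, suffix_counts,
                    lam=0.5, affix_len=2):
    """Score each tag as lam * P(tag | suffix) + (1 - lam) * P(tag | prefix)."""
    prefix, suffix = word[:affix_len], word[-affix_len:]
    return {
        tag: lam * relative_freq(suffix_counts, suffix, tag)
        + (1 - lam) * relative_freq(prefix_counts, prefix, tag)
        for tag in tags
    }


# Hypothetical toy training data (transliterated, illustrative only).
training = [
    ("alkitab", "NOUN"),
    ("albayt", "NOUN"),
    ("muslimun", "NOUN"),
    ("yaktub", "VERB"),
    ("yadrus", "VERB"),
]
prefix_counts, suffix_counts = train_affix_model(training)

# Score a word that does not occur in the training data.
scores = guess_tag_probs("alqalam", ["NOUN", "VERB"],
                         prefix_counts, suffix_counts)
best_tag = max(scores, key=scores.get)
```

In this sketch the unseen word is scored from its prefix alone, because its suffix never occurs in the toy data. A real tagger would also need smoothing so that unseen affixes do not contribute a hard zero.
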
Year | DOI | Venue |
---|---|---|
2011 | 10.1007/978-3-642-20039-7_29 | ACIIDS (1) |
Keywords | Field | DocType |
word prefix probability,different configuration,statistical pos tagger,classical arabic,unknown word pos,quranic arabic corpus,quranic arabic,small training corpus,modern standard arabic,accurate arabic,word suffix probability,linear interpolation,information extraction,question answering,hidden markov model,part of speech,arabic languages,natural language | Question answering,Classical Arabic,Trigram,Computer science,Speech recognition,Prefix,Modern Standard Arabic,Natural language processing,Artificial intelligence,Bigram,Arabic languages,Quranic Arabic Corpus | Conference |
Volume | ISSN | Citations |
6591 | 0302-9743 | 5 |
PageRank | References | Authors |
0.48 | 15 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Mohammed Albared | 1 | 38 | 3.56 |
Nazlia Omar | 2 | 78 | 14.98 |
Mohd Juzaiddin Ab | 3 | 71 | 9.26 |