Title
Improving The Accuracy Of English-Arabic Statistical Sentence Alignment
Abstract
Multilingual natural language processing systems increasingly rely on parallel corpora to improve their output. Parallel corpora constitute the basic block for training a statistical natural language processing system and for creating translation and language models. Several systems have been devised that automatically align the words of a pair of sentences, each in a different language. Such systems have been used successfully with European languages. In this paper, one such system is used to align sentences in an English-Arabic corpus. The system performs poorly when given raw, unaligned English-Arabic sentence pairs. This prompted the development of a preprocessing step to be applied to the Arabic sentences. The same corpus was then preprocessed, and a significant improvement is reported when alignment is attempted using the preprocessed unaligned sentences.
Year: 2011
Venue: INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY
Keywords: Word alignment, sentence alignment, parallel corpora, statistical natural language processing
Field: Arabic, Computer science, Parallel corpora, Basic block, Preprocessor, Natural language processing, Artificial intelligence, Sentence, Language model, Machine learning
DocType: Journal
Volume: 8
Issue: 2
ISSN: 1683-3198
Citations: 0
PageRank: 0.34
References: 10
Authors: 3
Name | Order | Citations/PageRank
Mohammad Salameh | 1 | 1018.59
Rached Zantout | 2 | 187.67
Nashat Mansour | 3 | 36045.88