Abstract | ||
---|---|---|
Multilingual natural language processing systems increasingly rely on parallel corpora to improve their output. Parallel corpora constitute the basic building block for training a statistical natural language processing system and for creating translation and language models. Several systems have been devised that automatically align the words of a pair of sentences, each in a different language. Such systems have been used successfully with European languages. In this paper, one such system is used to align sentences in an English-Arabic corpus. The system performs poorly when given raw, unaligned English-Arabic sentence pairs. This prompted the development of a preprocessing step to be applied to the Arabic sentences. The same corpus was then preprocessed, and a significant improvement is reported when alignment is attempted using the preprocessed, unaligned sentences. |
Year | Venue | Keywords |
---|---|---|
2011 | INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY | Word alignment, sentence alignment, parallel corpora, statistical natural language processing |
Field | DocType | Volume |
Arabic, Computer science, Parallel corpora, Basic block, Preprocessor, Natural language processing, Artificial intelligence, Sentence, Language model, Machine learning | Journal | 8 |
Issue | ISSN | Citations |
2 | 1683-3198 | 0 |
PageRank | References | Authors |
0.34 | 10 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Mohammad Salameh | 1 | 101 | 8.59 |
Rached Zantout | 2 | 18 | 7.67 |
Nashat Mansour | 3 | 360 | 45.88 |