Title
Class-Based Language Models for Chinese-English Parallel Corpus
Abstract
This paper addresses the use of novel class-based language models on parallel corpora, focusing specifically on English and Chinese. We find that the perplexity of Chinese is generally much higher than that of English and discuss possible reasons. We demonstrate the relative effectiveness of class-based models over the modified Kneser-Ney trigram model for our task. We also introduce rare-event clustering and a polynomial discounting mechanism, which are shown to improve results. Our experimental results on parallel corpora indicate that the improvements due to classes are similar for English and Chinese. This suggests that class-based language models should be used for both languages.
Year
2013
DOI
10.1007/978-3-642-37256-8_22
Venue
CICLing (2)
Keywords
rare event, modified Kneser-Ney trigram model, novel class-based language model, Chinese language, possible reason, parallel corpus, class-based language model, polynomial discounting mechanism, class-based model, Chinese-English parallel corpus
Field
Perplexity, Polynomial, Computer science, Trigram, Machine translation, Computational linguistics, Natural language processing, Artificial intelligence, Cluster analysis, Language model, Rare events
DocType
Conference
Citations
0
PageRank
0.34
References
14
Authors
4
Name            Order  Citations  PageRank
Junfei Guo      1      7          3.01
Juan Liu        2      1128       145.32
Michael Walsh   3      0          0.34
Helmut Schmid   4      505        39.87