Title
From Image to Translation: Processing the Endangered Nyushu Script
Abstract
The lack of computational support has significantly slowed down automatic understanding of endangered languages. In this paper, we take Nyushu (simplified Chinese: 女书; literally: “women’s writing”) as a case study to present the first computational approach that combines Computer Vision and Natural Language Processing techniques to deeply understand an endangered language. We developed an end-to-end system to read a scanned hand-written Nyushu article, segment it into characters, link them to standard characters, and then translate the article into Mandarin Chinese. We propose several novel methods to address the new challenges introduced by noisy input and low resources, including Nyushu-specific feature selection for character segmentation and linking, and character linking lattice based Machine Translation. The end-to-end system performance indicates that the system is a promising approach and can serve as a standard benchmark.
Year: 2016
DOI: 10.1145/2857052
Venue: ACM Trans. Asian & Low-Resource Lang. Inf. Process.
Keywords: Endangered languages, nyushu, recognition, translation
Field: Endangered species, Feature selection, Computer science, Segmentation, Machine translation, Endangered language, Speech recognition, Natural language processing, Artificial intelligence, Mandarin Chinese
DocType: Journal
Volume: 15
Issue: 4
ISSN: 2375-4699
Citations: 1
PageRank: 0.37
References: 18
Authors (8)
Name              Order  Citations  PageRank
Tongtao Zhang     1      1          0.37
Aritra Chowdhury  2      2          1.05
Nimit Dhulekar    3      14         3.08
Jinjing Xia       4      1          0.37
Kevin Knight      5      5096       462.44
Heng Ji           6      1544       127.27
Bülent Yener      7      1075       94.51
Liming Zhao       8      1          0.37