Title
Local context selection for aligning sentences in parallel corpora
Abstract
This paper presents a novel language-independent, context-based sentence alignment technique for parallel corpora. We can view the problem of aligning sentences as finding translations of sentences chosen from different sources. Unlike current approaches, which rely on pre-defined features and models, our algorithm employs features derived from the distributional properties of sentences and does not use any language-dependent knowledge. We make use of the context of sentences and introduce the notion of Zipfian word vectors, which effectively model the distributional properties of a given sentence. We take the context to be the frame in which the reasoning about sentence alignment is done. We examine alternatives for local context models and demonstrate that our context-based sentence alignment algorithm performs better than prominent sentence alignment techniques. Our system dynamically selects the local context that maximizes the correlation for a pair of sets of sentences. We evaluate the performance of our system using two measures: sentence alignment accuracy and sentence alignment coverage. We compare the performance of our system with commonly used sentence alignment systems and show that our system performs 1.1951 to 1.5404 times better in reducing the error rate in alignment accuracy and coverage.
Year
2007
DOI
10.1007/978-3-540-74255-5_7
Venue
CONTEXT
CONTEXT
Keywords
sentence alignment, alignment technique, alignment accuracy, aligning sentence, sentence alignment algorithm, parallel corpus, sentence alignment coverage, novel language-independent context-based sentence, distributional property, local context selection, prominent sentence alignment technique, sentence alignment accuracy, error rate, system dynamics, system performance, context model
Field
Computer science, Context based, Word error rate, Machine translation, Parallel corpora, Speech recognition, Artificial intelligence, Natural language processing, Sentence, Word-sense disambiguation
DocType
Conference
Volume
4635
ISSN
0302-9743
Citations
1
PageRank
0.37
References
12
Authors
1
Name
Ergun Biçici
Order
1
Citations
133
PageRank
13.23