Title
The NAIST-NTT TED talk treebank.
Abstract
Syntactic parsing is a fundamental natural language processing technology that has proven useful in machine translation, language modeling, sentence segmentation, and a number of other applications related to speech translation. However, there is a paucity of manually annotated syntactic parsing resources for speech, and particularly for the lecture speech that is the current target of the IWSLT translation campaign. In this work, we present a new manually annotated treebank of TED talks that we hope will prove useful for investigation into the interaction between syntax and these speechrelated applications. The first version of the corpus includes 1,217 sentences and 23,158 words manually annotated with parse trees, and aligned with translations in 26-43 different languages. In this paper we describe the collection of the corpus, and an analysis of its various characteristics.
Year
Venue
DocType
2014
IWSLT
Conference
Citations 
PageRank 
References 
0
0.34
0
Authors
6
Name
Order
Citations
PageRank
Graham Neubig1989130.31
Katsuhito Sudoh232634.44
Yusuke Oda300.34
Kevin Duh481972.94
Hajime Tsukada544929.46
Masaaki Nagata657377.86