Title
Building a Conversation Corpus by Text Derivation from "Germ Dialogs"
Abstract
We propose a method for building a spoken-language text corpus for a spoken-language system. Conventional ways to build a new corpus include transcribing recorded conversations, collecting text from existing documents, and writing original texts. However, these approaches often run into difficulties, such as insufficient corpus size and low cost-effectiveness, when preparing text data in the domain of the target system. To address these issues, we have developed a method that uses "germ dialogs": short scripted dialogs that writers can continue or replace in a logical sequence that sounds natural. This allows the corpus to be expanded cost-effectively. Our results show that text produced by the proposed method is more adequate for the system's domain than text produced by conventional methods. The text data collected with the proposed method are used to generate a language model for our speech translation system between English and Japanese.
Year
2005
Venue
EAMT
Keywords
language model, cost effectiveness, data collection
DocType
Conference
Citations
0
PageRank
0.34
References
2
Authors
5
Name             Order  Citations  PageRank
Naoki Asanoma    1      0          0.68
Setsuo Yamada    2      63         15.78
Osamu Furuse     3      171        31.55
Masahiro Oku     4      32         5.91
ntt cyber        5      6          1.62