Building a Conversation Corpus by Text Derivation from "Germ Dialogs" - Citegraph

Paper Info

Title
Building a Conversation Corpus by Text Derivation from "Germ Dialogs"

Abstract
We propose a method for building a spoken-language text corpus for a spoken-language system. Conventional methods to build a new corpus include transcribing recorded conversations, collecting text from existing documents, or writing original texts. However, these often have difficulties, such as insufficient corpus size and low cost effectiveness, when preparing the text data in the applied system's domain. To address these issues, we have developed a method that uses "germ dialogs," which are short-scripted dialogs that enable writers to continue or replace them in a logical sequence that sounds natural. This enables the corpus size to be proliferated in a cost-effective manner. Our results show that the proposed method can be used to create higher degree of adequateness for the system's domain than conventional methods. The text data collected for the proposed method are used to generate a language model for our speech translation system between English and Japanese.

Year	Venue	Keywords
2005	EAMT	language model,cost effectiveness,data collection
DocType	Citations	PageRank
Conference	0	0.34
References	Authors
2	5

Authors (5 rows)

Name	Order	Citations	PageRank
Naoki Asanoma	1	0	0.68
Setsuo Yamada	2	63	15.78
Osamu Furuse	3	171	31.55
Masahiro Oku	4	32	5.91
NTT Cyber	5	6	1.62
