Title
Development of a Chinese telephony conversational corpus for speech processing [speech recognition applications]
Abstract
This paper describes the development of the EARS (Effective, Affordable, Reusable Speech-to-text) Chinese corpus, a telephony conversational speech database for speech processing. The EARS database is the first of its kind collected for spontaneous Mandarin Chinese telephone speech. The corpus collects Mandarin conversations between strangers or friends, covering a wide range of topics, over landline and cellular channels. All speech data are annotated with standard Chinese character transcriptions as well as specific mark-ups for spontaneous speech. The corpus will be used for conversational and spontaneous Mandarin speech recognition tasks under the DARPA EARS framework. This paper introduces the design, development, structure, and initial phonetic analysis of the first 50-hour collection of this corpus. An additional 300 to 500 hours of data will be collected and transcribed between 2004 and 2005.
Year
2004
DOI
10.1109/CHINSL.2004.1409620
Venue
ISCSLP
Keywords
audio databases, natural languages, speech processing, speech recognition, chinese character transcription, chinese telephony conversational corpus, chinese telephony conversational speech database, darpa ears framework, ears chinese corpus, mandarin chinese telephony spontaneous speech, mandarin conversations, annotated speech data, conversational mandarin speech recognition, phonetic analysis, spontaneous mandarin speech recognition, spontaneous speech mark-ups, mandarin chinese, speech to text
Field
Speech corpus, Speech processing, Speech synthesis, Speech analytics, Computer science, Audio mining, Chinese speech synthesis, Speech recognition, Natural language processing, Artificial intelligence, VoxForge, Speech technology
DocType
Conference
ISBN
0-7803-8678-7
Citations
0
PageRank
0.34
References
3
Authors
6
Name, Order, Citations, PageRank
Yi Liu, 1, 84, 14.95
Pascale Fung, 2, 653, 135.24
Shilei Huang, 3, 6, 3.26
Christopher Cieri, 4, 123, 42.44
z lufeng, 5, 20, 1.21
c benfeng, 6, 0, 0.34