Title
ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation
Abstract
Code-switching is a speech phenomenon that occurs when a speaker switches between languages during a conversation. Although code-switching arises spontaneously in conversational spoken language, most existing work collects code-switching data from read speech rather than spontaneous speech. ASCEND (A Spontaneous Chinese-English Dataset) is a high-quality Mandarin Chinese-English code-switching corpus built from spontaneous multi-turn conversational dialogues collected in Hong Kong. We describe ASCEND's design and the procedure used to collect the speech data and its annotations. ASCEND consists of 10.62 hours of clean speech collected from 23 bilingual speakers of Chinese and English. Furthermore, we conduct baseline experiments using pre-trained wav2vec 2.0 models, achieving a best performance of 22.69% character error rate (CER) and 27.05% mixed error rate (MER).
Year
2022
Venue
International Conference on Language Resources and Evaluation (LREC)
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
14
Name                  Order  Citations  PageRank
Holy Lovenia          1      0          1.01
Samuel Cahyawijaya    2      0          3.38
Genta Indra Winata    3      0          1.35
Peng Xu               4      75         15.22
Xu Yan                5      0          1.69
Zihan Liu             6      3          9.84
Rita Frieske          7      0          1.01
Tiezheng Yu           8      0          1.35
Wenliang Dai          9      0          1.35
Elham J. Barezi       10     0          1.01
Qifeng Chen           11     0          0.68
Xiaojuan Ma           12     8          6.00
Bertram E. Shi        13     396        56.91
Pascale Fung          14     678        85.84