The development and analysis of a Malay broadcast news corpus - Citegraph

Paper Info

Title
The development and analysis of a Malay broadcast news corpus

Abstract
This paper presents our effort in collecting a Malay broadcast news (BN) speech corpus to support our research in Malay LVCSR. The 53-hour corpus was recorded from TV channels in both Singapore and Malaysia over a 9-month period. To facilitate various lines of LVCSR research, the corpus provides, besides the orthographic transcription, other metadata such as speaking environment type, speaker identity information, language identity, and topic descriptions. In the orthographic transcription, we also tagged various linguistic phenomena such as disfluencies, code-switched words, and proper nouns. We trained an ASR system and achieved a word error rate of 8.5% for anchor speech and 17.1% overall (including reporter and other speakers' speech) on 27 hours of test data.

Year	DOI	Venue
2013	10.1109/ICSDA.2013.6709862	O-COCOSDA/CASLRE
Keywords	Field	DocType
audio databases,linguistics,meta data,natural languages,speech processing,speech recognition,asr system,bn speech corpus,malay lvcsr,malay broadcast news speech corpus,malaysia,singapore,tv channels,anchor speech,code switched words,language identity,linguistic phenomena,metadata,orthographic transcription,reporter speech,speaker identity information,speaker speech,speaking environment type,topic descriptions,word error rate,malay,speech corpus,broadcast news	Speech corpus,Speech processing,Orthographic transcription,Speech synthesis,Speech analytics,Malay,Computer science,Speech recognition,Artificial intelligence,Natural language processing,VoxForge,Speech technology	Conference
ISSN	Citations	PageRank
2163-3479	0	0.34
References	Authors
2	8

Authors (8 rows)

Name	Order	Citations	PageRank
Tze Yuang Chong	1	9	3.59
Xiong Xiao	2	281	34.97
Haihua Xu	3	26	2.72
Tien-Ping Tan	4	25	7.46
Pham Chaukhoa	5	0	0.34
Dau-Cheng Lyu	6	126	13.54
Eng Siong Chng	7	970	106.33
Haizhou Li	8	3678	334.61
