A New Prosody-Assisted Mandarin ASR System - Citegraph

Paper Info

Title
A New Prosody-Assisted Mandarin ASR System

Abstract
This paper presents a new prosody-assisted automatic speech recognition (ASR) system for Mandarin speech. Unlike the conventional approach of using simple prosodic cues, it employs a sophisticated prosody modeling approach based on a four-layer prosody-hierarchy structure, using the previously proposed joint prosody labeling and modeling (PLM) algorithm to automatically generate 12 prosodic models from a large unlabeled speech database. These 12 prosodic models are incorporated into a two-stage ASR system that rescores the word lattice generated in the first stage by a conventional hidden Markov model (HMM) recognizer, yielding a better recognized word string. Besides the word string, other information can also be decoded, including part of speech (POS), punctuation marks (PM), and two types of prosodic tags that can be used to construct the prosody-hierarchy structure of the test speech. Experimental results on the TCC300 database, which consists of long paragraphic utterances, showed that the proposed system significantly outperformed the baseline scheme, an HMM recognizer with a factored language model that models word, POS, and PM. Word, character, and base-syllable error rates of 20.7%, 14.4%, and 9.6% were obtained, corresponding to absolute error reductions of 3.7%, 3.7%, and 2.4% (or relative reductions of 15.2%, 20.4%, and 20%). An error analysis showed that many word segmentation errors and tone recognition errors were corrected.
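The absolute and relative reductions quoted in the abstract are mutually consistent. The minimal Python sketch below assumes that the relative reduction is the absolute reduction divided by the baseline error rate, and that the baseline rate equals the proposed rate plus the absolute reduction. Under those assumptions it recovers the implied baseline error rates, which the abstract does not state explicitly, and reproduces the reported relative figures. The function name `baseline_and_relative` is hypothetical and used only for illustration.

```python
# Error rates of the proposed system and the absolute reductions reported in the abstract (%).
reported = {
    "word": (20.7, 3.7),
    "character": (14.4, 3.7),
    "base-syllable": (9.6, 2.4),
}


def baseline_and_relative(proposed: float, absolute: float) -> tuple[float, float]:
    """Hypothetical helper: recover the implied baseline error rate and the relative reduction (%).

    Assumes baseline = proposed + absolute and relative = absolute / baseline.
    """
    baseline = proposed + absolute          # implied baseline error rate
    relative = 100.0 * absolute / baseline  # relative error reduction in percent
    return round(baseline, 1), round(relative, 1)


results = {unit: baseline_and_relative(p, a) for unit, (p, a) in reported.items()}

for unit, (baseline, relative) in results.items():
    print(f"{unit:>13}: implied baseline {baseline:.1f}%, relative reduction {relative:.1f}%")
```

Under these assumptions the implied baseline error rates are 24.4% (word), 18.1% (character), and 12.0% (base-syllable), and the computed relative reductions round to 15.2%, 20.4%, and 20.0%, matching the abstract.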

Year: 2012
DOI: 10.1109/TASL.2012.2187192
Venue: IEEE Transactions on Audio, Speech & Language Processing
Keywords: prosody modeling, prosody labeling and modeling algorithm, prosody-assisted automatic speech recognition (asr), speech recognition, error analysis, tcc300 database, prosody-assisted mandarin asr system, mandarin speech, part of speech, word segmentation errors, prosody-hierarchy structure, punctuation mark, hidden markov model, hmm recognizer, automatic speech recognition system, hidden markov models, tone recognition errors, relative error, automatic speech recognition, word segmentation, language model
Field: Prosody, Factored language model, Computer science, Tone recognition, Speech recognition, Text segmentation, Part of speech, Natural language processing, Artificial intelligence, Hidden Markov model, Mandarin Chinese
DocType: Journal
Volume: 20
Issue: 6
ISSN: 1558-7916
Citations: 5
PageRank: 0.59
References: 13
Authors: 5

Name	Order	Citations	PageRank
Sin-Horng Chen	1	273	39.86
Jyh-Her Yang	2	9	1.68
Chen-Yu Chiang	3	31	11.55
Ming-Chieh Liu	4	7	1.28
Yih-Ru Wang	5	237	34.68
