Title
Improving sentence-level alignment of speech with imperfect transcripts using utterance concatenation and VAD
Abstract
Preparing data for speech processing applications generally requires expert knowledge and a large amount of time. Automating as much of this process as possible can therefore have a significant impact on expanding the number of languages for which spoken interaction with machines is available. In this paper we build upon a previously developed tool, ALISA, which aligns speech with imperfect transcripts using only 10 minutes of manually labelled data, in any alphabetic language. Although its word-level error rate is around 0.6%, we noticed that sentence-level accuracy is drastically affected by a large number of sentence-initial word deletions. To overcome this problem, we propose two methods: one based on utterance concatenation, and one based on voice activity detection (VAD). The results show that these simple methods can achieve around 10% relative improvement over the baseline results.
Year
2016
DOI
10.1109/ICCP.2016.7737141
Venue
2016 IEEE 12th International Conference on Intelligent Computer Communication and Processing (ICCP)
Keywords
speech and text alignment, VAD, imperfect transcripts, utterance concatenation, ALISA
Field
Data modeling, Speech processing, Computer science, Voice activity detection, Word error rate, Utterance, Speech recognition, Artificial intelligence, Concatenation, Natural language processing, Hidden Markov model, Sentence
DocType
Conference
ISSN
2065-9946
ISBN
978-1-5090-3900-5
Citations
0
PageRank
0.34
References
5
Authors
3
Name                Order  Citations  PageRank
Alexandru Moldovan  1      0          0.34
Adriana Stan        2      36         7.23
Mircea Giurgiu      3      11         5.19