Title
Data augmentation and feature extraction using variational autoencoder for acoustic modeling.
Abstract
A data augmentation and feature extraction method using a variational autoencoder (VAE) for acoustic modeling is described. A VAE is a generative model based on variational Bayesian learning within a deep learning framework. A VAE can extract latent values from its input variables and use them to generate new information. VAEs are widely used to generate pictures and sentences. In this paper, a VAE is applied to speech corpus data augmentation and to feature vector extraction from speech for acoustic modeling. First, the size of a speech corpus is doubled by encoding latent variables extracted from the original utterances using a VAE framework. Second, the latent variables extracted from speech waveforms capture latent "meanings" of the waveforms, so they can be used as acoustic features for automatic speech recognition (ASR). This paper experimentally shows the effectiveness of data augmentation using a VAE framework and that latent variable-based features can be utilized in ASR.
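The two uses of the VAE named in the abstract (generating extra training data and using latent variables as features) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's architecture: the `TinyVAE` class, the single linear encoder and decoder layers, the untrained random weights, the fixed-length input frames, and the choice of the latent mean as the feature vector are all hypothetical.

```python
import math
import random


def linear(x, weights, bias):
    """Affine map: one output value per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_layer(rng, n_out, n_in, scale=0.1):
    """Random, untrained weights (assumption: a real system learns these)."""
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class TinyVAE:
    """Hypothetical one-layer VAE with a diagonal Gaussian posterior."""

    def __init__(self, n_input, n_latent, seed=0):
        self.rng = random.Random(seed)
        self.enc_mu = init_layer(self.rng, n_latent, n_input)
        self.enc_logvar = init_layer(self.rng, n_latent, n_input)
        self.dec = init_layer(self.rng, n_input, n_latent)

    def encode(self, frame):
        """Return the mean and log-variance of q(z | frame)."""
        return linear(frame, *self.enc_mu), linear(frame, *self.enc_logvar)

    def sample(self, mu, logvar):
        """Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, 1)."""
        return [m + math.exp(0.5 * lv) * self.rng.gauss(0.0, 1.0)
                for m, lv in zip(mu, logvar)]

    def decode(self, z):
        """Map a latent vector back to the input space."""
        return linear(z, *self.dec)


def kl_divergence(mu, logvar):
    """KL(q(z|x) || N(0, I)), the regularizer in the VAE training objective."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))


def augment_and_extract(vae, frames):
    """Double the data with decoded samples; keep latent means as features."""
    augmented, features = [], []
    for frame in frames:
        mu, logvar = vae.encode(frame)
        augmented.append(vae.decode(vae.sample(mu, logvar)))
        features.append(mu)
    return frames + augmented, features


# Toy input: five 8-dimensional frames standing in for speech segments.
vae = TinyVAE(n_input=8, n_latent=2)
frames = [[math.sin(0.3 * t + k) for k in range(8)] for t in range(5)]
corpus, features = augment_and_extract(vae, frames)
print(len(frames), len(corpus), len(features[0]))  # prints: 5 10 2
```

With untrained weights the decoded frames are not meaningful speech; the sketch only shows the data flow. In a trained model, the reconstruction loss plus `kl_divergence` would be minimized first, after which the decoded samples could serve as additional utterances and the latent means as acoustic features.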
Year: 2017
Venue: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference
Field: Speech corpus, Feature vector, Autoencoder, Bayesian inference, Pattern recognition, Computer science, Latent variable, Feature extraction, Artificial intelligence, Deep learning, Generative model
DocType: Conference
ISSN: 2309-9402
Citations: 0
PageRank: 0.34
References: 0
Authors: 1
Author
Name: Hiromitsu Nishizaki
Order: 1
Citations: 163
PageRank: 29.49