Title
Evaluating Acoustic Feature Maps in 2D-CNN for Speaker Identification
Abstract
This paper presents a study evaluating different acoustic feature map representations as input to two-dimensional convolutional neural networks (2D-CNN) for speech-related tasks. Specifically, the task involves identifying useful 2D-CNN input feature maps for enhancing speaker identification, with the ultimate goal of improving speaker authentication and enabling voice as a biometric feature. Voice, in contrast to fingerprints and image-based biometrics, is a natural choice for hands-free communication systems where touch interfaces are inconvenient or dangerous to use. An effective input feature map representation may help a CNN exploit intrinsic voice features that not only address the instability issues of voice as an identifier for text-independent speaker authentication while preserving privacy, but also assist in developing efficacious voice-enabled interfaces. Three different acoustic features with three possible feature map representations are evaluated in this study. Results obtained on three speech corpora show that an interpolated baseline spectrogram performs best compared to Mel frequency spectral coefficients (MFSC) and Mel frequency cepstral coefficients (MFCC) when tested with 5-fold cross-validation using a 2D-CNN. On both text-dependent and text-independent datasets, raw spectrogram accuracy is 4% better than that of the traditional acoustic features.
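As a hedged illustration of the three feature maps compared in the abstract, the sketch below computes a magnitude spectrogram, MFSC (log mel-filterbank energies), and MFCC (DCT of the MFSC) for a synthetic signal using plain NumPy. The frame size, hop length, filter count, and coefficient count are illustrative assumptions, not the paper's actual settings.

```python
import numpy as np

def stft_magnitude(x, n_fft=512, hop=256):
    """Frame the signal, apply a Hann window, and take the magnitude of the rFFT."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (n_fft//2 + 1, n_frames)

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def dct2(x, n_out):
    """DCT-II along axis 0, keeping the first n_out coefficients."""
    n = x.shape[0]
    basis = np.cos(np.pi * np.outer(np.arange(n_out), np.arange(n) + 0.5) / n)
    return basis @ x

# A 440 Hz tone stands in for a real speech utterance (demo assumption).
sr = 16000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 440.0 * t)

spec = stft_magnitude(signal)                                    # baseline spectrogram map
mfsc = np.log(mel_filterbank(40, 512, sr) @ spec ** 2 + 1e-10)   # MFSC map (40 bands)
mfcc = dct2(mfsc, 13)                                            # MFCC map (13 coefficients)
```

Each map is a 2D array (frequency-like axis by time frames), which is the shape a 2D-CNN input layer expects; in the paper's setup the maps would be computed from real speech and interpolated to a fixed input size.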
Year: 2019
DOI: 10.1145/3318299.3318386
Venue: Proceedings of the 2019 11th International Conference on Machine Learning and Computing
Keywords: CNN, MFCC, MFSC, Speaker identification, acoustic features, deep learning, machine learning, speaker classification, spectrogram
DocType: Conference
ISBN: 978-1-4503-6600-7
Citations: 0
PageRank: 0.34
References: 0
Authors: 5

Name | Order | Citations | PageRank
Ali Shariq Imran | 1 | 49 | 17.47
Vetle Haflan | 2 | 0 | 0.34
Abdolreza Sabzi Shahrebabaki | 3 | 1 | 3.41
Negar Olfati | 4 | 1 | 2.40
Torbjørn Svendsen | 5 | 161 | 21.26