Title
Deep Neural Networks with Batch Speaker Normalization for Intoxicated Speech Detection
Abstract
Alcohol intoxication affects people both physically and psychologically, and it also changes their speech. However, detecting the intoxicated state from speech is a challenging task. In this paper, we first implement a baseline model with ComParE features and then explore the influence of speaker information on the intoxication detection task. In addition, we apply a ResNet18-based model to this task. The model contains three parts: a representation learning sub-network built on an 18-layer deep residual network (ResNet), a global average pooling (GAP) layer, and a classifier of two fully connected layers. Since we cannot perform speaker z-normalization on variable-length feature input, we employ batch z-normalization to train the proposed model. This yields an improvement similar to that obtained by applying speaker normalization to the baseline method. Experimental results show that speaker normalization on the baseline model and batch z-normalization on the ResNet18-based model provide 4.9% and 3.8% improvement, respectively. The results show that speaker normalization can improve the performance of both the baseline model and the proposed model.
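The two normalization schemes named in the abstract can be illustrated with a short NumPy sketch. This is not the authors' implementation: the function names, the `eps` constant, the array shapes, and the choice to apply batch z-normalization to pooled embeddings are all assumptions made for illustration. The sketch shows why speaker z-normalization fits fixed-length utterance features, while variable-length inputs first need pooling (here, global average pooling over time) before statistics across a batch can be computed.

```python
import numpy as np

def speaker_znorm(features, speaker_ids, eps=1e-8):
    """Z-normalize each feature dimension separately for every speaker.

    features: (n_utterances, n_dims) fixed-length utterance-level features.
    speaker_ids: (n_utterances,) speaker label per utterance.
    """
    features = np.asarray(features, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    out = np.empty_like(features)
    for spk in np.unique(speaker_ids):
        mask = speaker_ids == spk
        mean = features[mask].mean(axis=0)
        std = features[mask].std(axis=0)
        out[mask] = (features[mask] - mean) / (std + eps)
    return out

def global_average_pool(frames):
    """Collapse a variable-length (n_frames, n_channels) map to (n_channels,)."""
    return np.asarray(frames, dtype=float).mean(axis=0)

def batch_znorm(batch, eps=1e-8):
    """Z-normalize each dimension using statistics of the current batch.

    Assumption: the batch holds pooled embeddings, shape (batch_size, n_dims).
    """
    batch = np.asarray(batch, dtype=float)
    return (batch - batch.mean(axis=0)) / (batch.std(axis=0) + eps)

rng = np.random.default_rng(0)

# Fixed-length features for two hypothetical speakers with different offsets.
offsets = np.array([[5.0]] * 3 + [[-5.0]] * 3)
feats = rng.normal(size=(6, 4)) + offsets
speakers = np.array(["a", "a", "a", "b", "b", "b"])
normed = speaker_znorm(feats, speakers)

# Variable-length inputs: pool over time first, then normalize across the batch.
utterances = [rng.normal(size=(n_frames, 8)) for n_frames in (50, 80, 120, 65)]
pooled = np.stack([global_average_pool(u) for u in utterances])
pooled_normed = batch_znorm(pooled)
```

After `speaker_znorm`, each speaker's features have approximately zero mean and unit variance per dimension, so the constant per-speaker offset no longer separates the two groups. `batch_znorm` has the same effect across whatever utterances share a batch, which approximates speaker normalization only if the batch is composed accordingly (for example, drawn from a single speaker); that batching strategy is an assumption here, not something the abstract states.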
Year
2019
DOI
10.1109/APSIPAASC47483.2019.9023074
Venue
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference
Keywords
intoxicated speech detection, Convolutional Neural Network, computational paralinguistics
DocType
Conference
ISSN
2309-9402
Citations
0
PageRank
0.34
References
0
Authors
3
Name | Order | Citations | PageRank
Weiqing Wang | 1 | 0 | 3.04
Haiwei Wu | 2 | 1 | 1.70
Ming Li | 3 | 331 | 17.67