Title
Detecting Unipolar and Bipolar Depressive Disorders from Elicited Speech Responses Using Latent Affective Structure Model
Abstract
Mood disorders, including unipolar depression (UD) and bipolar disorder (BD) [1], are reported to be among the most common mental illnesses in recent years. In diagnostic evaluation of outpatients with mood disorders, a large portion of BD patients are initially misdiagnosed as having UD [2]. Because most previous research has focused on long-term monitoring of mood disorders, short-term detection, which could support early detection and intervention, is desirable. This work proposes an approach to short-term detection of mood disorders based on emotion patterns in elicited speech responses. To the best of our knowledge, there is currently no database for short-term discrimination between BD and UD. This work collected two databases: an emotional database (MHMC-EM) collected by the Multimedia Human Machine Communication (MHMC) lab and a mood disorder database (CHI-MEI) collected by the CHI-MEI Medical Center, Taiwan. As the collected CHI-MEI mood disorder database is quite small and emotion annotation is difficult, the MHMC-EM emotional database is selected as a reference database for data adaptation. For the CHI-MEI mood disorder data collection, six emotion-eliciting videos are selected to elicit the participants' emotions. After watching each of the six video clips, the participants answer questions posed by the clinician. The speech responses are then used to construct the CHI-MEI mood disorder database. Hierarchical spectral clustering is used to adapt the collected MHMC-EM emotional database to the CHI-MEI mood disorder database in order to address the data bias problem. The adapted MHMC-EM emotional data are then fed to a denoising autoencoder for bottleneck feature extraction. The bottleneck features are used to construct a long short-term memory (LSTM)-based emotion detector that generates an emotion profile from each speech response.
The emotion profiles are then clustered into emotion codewords using the K-means algorithm. Finally, a class-specific latent affective structure model (LASM) is proposed to model the structural relationships among the emotion codewords with respect to the six emotional videos for mood disorder detection. A leave-one-group-out cross-validation scheme was employed to evaluate the proposed class-specific LASM-based approach. Experimental results show that the proposed class-specific LASM-based method achieved an accuracy of 73.33 percent for mood disorder detection, outperforming classifiers based on SVM and LSTM.
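Two steps named in the abstract, K-means clustering of emotion profiles into codewords and leave-one-group-out cross-validation, can be illustrated with a minimal standard-library Python sketch. It is not the authors' implementation: the 4-dimensional profiles, the choice of two codewords, the group labels, and the helper names `sq_dist`, `kmeans`, and `leave_one_group_out` are all assumptions made for illustration. The denoising autoencoder, the LSTM emotion detector, and the LASM itself are not modeled here.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's K-means. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for each point.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) per distinct group."""
    for g in sorted(set(groups)):
        test = [i for i, x in enumerate(groups) if x == g]
        train = [i for i, x in enumerate(groups) if x != g]
        yield g, train, test


# Hypothetical emotion profiles, one per speech response (assumed 4 emotions).
profiles = [
    [0.90, 0.10, 0.00, 0.00],
    [0.80, 0.20, 0.00, 0.00],
    [0.85, 0.10, 0.05, 0.00],
    [0.00, 0.10, 0.10, 0.80],
    [0.05, 0.00, 0.15, 0.80],
    [0.10, 0.00, 0.10, 0.80],
]
# Hypothetical group label for each response (for example, a block of participants).
groups = ["g1", "g1", "g2", "g2", "g3", "g3"]

# Each profile is replaced by the index of its nearest centroid (its codeword).
centroids, codewords = kmeans(profiles, k=2)

# One fold per group: that group's responses are held out for testing.
folds = list(leave_one_group_out(groups))
for held_out, train_idx, test_idx in folds:
    print(held_out, [codewords[i] for i in train_idx], [codewords[i] for i in test_idx])
```

For brevity the codebook here is fit once on all profiles. In an actual evaluation it would normally be fit on the training folds only, so that the held-out group does not influence the codewords.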
Year
2020
DOI
10.1109/TAFFC.2018.2803178
Venue
IEEE Transactions on Affective Computing
Keywords
Mood disorder, speech emotion recognition, latent affective structure model
DocType
Journal
Volume
11
Issue
3
ISSN
1949-3045
Citations
1
PageRank
0.34
References
4
Authors
4
Name             Order   Citations   PageRank
Kun-Yi Huang     1       14          5.00
Chung-Hsien Wu   2       1099        116.79
Ming-Hsiang Su   3       21          6.83
Yu-Ting Kuo      4       4           1.43