Title
A LDA-BASED TOPIC CLASSIFICATION APPROACH FROM HIGHLY IMPERFECT AUTOMATIC TRANSCRIPTIONS.
Abstract
Although current transcription systems can achieve high recognition performance, they still have great difficulty transcribing speech in very noisy environments. Transcription quality has a direct impact on classification tasks that use text features. In this paper, we propose to identify the themes of telephone service conversations with the classical Term Frequency-Inverse Document Frequency method using the Gini purity criterion (TF-IDF-Gini) and with a Latent Dirichlet Allocation (LDA) approach. Both approaches are coupled with a Support Vector Machine (SVM) classifier to solve the theme identification problem. Results show the effectiveness of the proposed LDA-based method compared to the classical TF-IDF-Gini approach in the context of highly imperfect automatic transcriptions. Finally, we discuss the impact of the discriminative and non-discriminative words extracted by both methods in terms of transcription accuracy.
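The abstract does not spell out the TF-IDF-Gini weighting, so the following is only a minimal sketch under stated assumptions: each word is scored as term frequency times inverse document frequency times a Gini purity, where purity is assumed to be the sum over themes of P(theme | word) squared. The toy transcripts, theme labels, and the function name `tfidf_gini` are hypothetical and not taken from the paper.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy "transcripts" with theme labels (not data from the paper).
docs = [
    ("bus ticket price bus", "fares"),
    ("ticket price card", "fares"),
    ("bus late schedule", "timetable"),
    ("schedule bus time", "timetable"),
]

def tfidf_gini(docs):
    """Score each word as TF * IDF * Gini purity (assumed formulation)."""
    n_docs = len(docs)
    tf = Counter()                    # corpus-level term frequency
    df = Counter()                    # number of documents containing the word
    per_theme = defaultdict(Counter)  # word -> theme -> occurrence count
    for text, theme in docs:
        words = text.split()
        tf.update(words)
        df.update(set(words))
        for w in words:
            per_theme[w][theme] += 1
    scores = {}
    for w in tf:
        idf = math.log(n_docs / df[w])
        total = sum(per_theme[w].values())
        # Purity is 1.0 when a word occurs in a single theme only,
        # and drops toward 1/(number of themes) when it is spread evenly.
        gini = sum((c / total) ** 2 for c in per_theme[w].values())
        scores[w] = tf[w] * idf * gini
    return scores

scores = tfidf_gini(docs)
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word:10s} {score:.3f}")
```

In this toy corpus "bus" appears in both themes, so its purity is lower and it scores below theme-specific words such as "ticket". In the pipeline the abstract describes, the resulting feature vectors (or, alternatively, LDA topic distributions) would then be passed to an SVM classifier.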
Year
2014
Venue
LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION
Keywords
Speech analytics, Topic identification, Latent Dirichlet Allocation
Field
Transcription (linguistics), Latent Dirichlet allocation, Speech analytics, Conversation, Imperfect, Computer science, Support vector machine, Speech recognition, Artificial intelligence, Natural language processing, Discriminative model, Parameter identification problem
DocType
Conference
Citations
1
PageRank
0.35
References
17
Authors
3
Name             Order  Citations  PageRank
Mohamed Morchid  1      84         22.79
Richard Dufour   2      98         23.98
Georges Linares  3      87         19.73