Title |
---|
A LDA-Based Topic Classification Approach From Highly Imperfect Automatic Transcriptions |
Abstract |
---|
Although current transcription systems can achieve high recognition performance, they still have great difficulty transcribing speech in very noisy environments. Transcription quality has a direct impact on classification tasks that rely on text features. In this paper, we propose to identify the themes of conversations from telephone services using two methods: the classical Term Frequency-Inverse Document Frequency method with the Gini purity criterion (TF-IDF-Gini), and a Latent Dirichlet Allocation (LDA) approach. Both approaches are coupled with a Support Vector Machine (SVM) classifier to solve the theme identification problem. Results show the effectiveness of the proposed LDA-based method compared to the classical TF-IDF-Gini approach on highly imperfect automatic transcriptions. Finally, we discuss the impact of the discriminative and non-discriminative words extracted by both methods in terms of transcription accuracy. |
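The abstract does not spell out how TF-IDF-Gini weights words. As a rough illustration only, the sketch below assumes that a word's score is its total term frequency in the training corpus, multiplied by its inverse document frequency and by a Gini purity computed as the sum of squared class probabilities P(c|w). The exact formulation in the paper may differ; the function names `tfidf_gini_scores` and `select_vocabulary` and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tfidf_gini_scores(docs, labels):
    """Score each word by tf * idf * Gini purity.

    Assumed formulation (not taken from the paper): tf is the total count of
    the word in the corpus, idf = log(N / df), and purity = sum_c P(c|w)^2.
    docs: list of token lists; labels: one class (theme) label per document.
    """
    n_docs = len(docs)
    tf = Counter()                        # total occurrences of each word
    df = Counter()                        # number of documents containing each word
    class_counts = defaultdict(Counter)   # word -> Counter(label -> occurrences)
    for tokens, label in zip(docs, labels):
        tf.update(tokens)
        df.update(set(tokens))
        for tok in tokens:
            class_counts[tok][label] += 1

    scores = {}
    for word, freq in tf.items():
        idf = math.log(n_docs / df[word])
        # Purity is 1.0 when the word occurs in a single class and
        # 1/C when it is spread evenly over C classes.
        purity = sum((c / freq) ** 2 for c in class_counts[word].values())
        scores[word] = freq * idf * purity
    return scores


def select_vocabulary(docs, labels, k):
    """Keep the k highest-scoring words (ties broken alphabetically)."""
    scores = tfidf_gini_scores(docs, labels)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Hypothetical toy corpus: tokenized transcriptions with theme labels.
docs = [["bus", "ticket", "price"], ["bus", "schedule", "late"],
        ["lost", "bag", "bus"], ["lost", "wallet"]]
labels = ["fares", "timetable", "lost_found", "lost_found"]
print(select_vocabulary(docs, labels, 8))  # "bus" is ranked last
```

Here `bus` occurs in every theme, so its purity drops to 1/3 and it falls to the bottom of the ranking; keeping only the top-ranked words is one way to restrict the SVM input to discriminative terms.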
Year | Venue | Keywords |
---|---|---|
2014 | LREC 2014 - Ninth International Conference on Language Resources and Evaluation | Speech analytics, Topic identification, Latent Dirichlet Allocation |
Field | DocType | Citations |
---|---|---|
Transcription (linguistics), Latent Dirichlet allocation, Speech analytics, Conversation, Imperfect, Computer science, Support vector machine, Speech recognition, Artificial intelligence, Natural language processing, Discriminative model, Parameter identification problem | Conference | 1 |
PageRank | References | Authors |
---|---|---|
0.35 | 17 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Mohamed Morchid | 1 | 84 | 22.79 |
Richard Dufour | 2 | 98 | 23.98 |
Georges Linares | 3 | 87 | 19.73 |