Abstract |
---|
Supervised topic modeling algorithms have been successfully applied to multi-label document classification tasks. Representative models include labeled latent Dirichlet allocation (L-LDA) and dependency-LDA. However, these models neglect the class frequency of words (i.e., the number of classes in which a word occurs in the training data), which is significant for classification. To address this, we propose the class frequency weight (CF-weight), a method that weights words according to their class frequency. The CF-weight is based on the intuition that a word with a higher (lower) class frequency is less (more) discriminative. In this study, the CF-weight is used to improve L-LDA and dependency-LDA. Experiments on real-world multi-label datasets demonstrate that CF-weight-based algorithms are competitive with existing supervised topic models. |
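
The class frequency intuition in the abstract can be illustrated with a small sketch. The abstract does not state the CF-weight formula, so the weighting below is an assumption: an IDF-style stand-in, `log(C / cf) + 1`, where `C` is the number of classes and `cf` is the word's class frequency. The function names `class_frequency` and `inverse_class_frequency_weights` and the toy corpus are hypothetical and serve only as illustration; this is not the paper's method.

```python
import math
from collections import defaultdict


def class_frequency(docs):
    """Count, for each word, the number of distinct classes it occurs in.

    docs is a list of (tokens, labels) pairs; a multi-label document
    contributes each of its tokens to every one of its labels.
    """
    word_labels = defaultdict(set)
    for tokens, labels in docs:
        for word in set(tokens):
            word_labels[word].update(labels)
    return {word: len(labels) for word, labels in word_labels.items()}


def inverse_class_frequency_weights(docs):
    """Assumed IDF-style weight: log(C / cf) + 1 (not the paper's formula).

    Words seen in few classes get a larger weight; a word seen in every
    class gets the minimum weight of 1.0.
    """
    all_labels = set()
    for _, labels in docs:
        all_labels.update(labels)
    num_classes = len(all_labels)
    cf = class_frequency(docs)
    return {word: math.log(num_classes / count) + 1.0 for word, count in cf.items()}


# Hypothetical toy corpus with three classes.
toy_docs = [
    (["the", "goal", "match", "team"], ["sports"]),
    (["the", "election", "vote", "team"], ["politics"]),
    (["the", "match", "stock", "market"], ["sports", "finance"]),
]

weights = inverse_class_frequency_weights(toy_docs)
for word in sorted(weights, key=weights.get, reverse=True):
    print(f"{word}: {weights[word]:.3f}")
```

In this toy corpus, "goal" occurs in one class and receives a larger weight than "team", which occurs in two, while "the" occurs in all three classes and receives the minimum weight. How such weights would enter L-LDA or dependency-LDA inference is not described in the abstract and is left out here.
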
Year | DOI | Venue |
---|---|---|
2018 | 10.1631/FITEE.1601668 | Frontiers of IT & EE |
Keywords | Field | DocType |
Supervised topic model, Multi-label classification, Class frequency, Labeled latent Dirichlet allocation (L-LDA), Dependency-LDA, TP391 | Document classification, Training set, Latent Dirichlet allocation, Mathematical optimization, Computer science, Intuition, Multi-label classification, Artificial intelligence, Topic model, Discriminative model, Machine learning | Journal |
Volume | Issue | ISSN |
19 | 4 | 2095-9184 |
Citations | PageRank | References |
0 | 0.34 | 0 |
Authors |
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Yue-peng Zou | 1 | 0 | 0.34 |
Jihong OuYang | 2 | 94 | 15.66 |
Ximing Li | 3 | 11 | 5.37 |