Title | ||
---|---|---|
BoWT: A Hybrid Text Representation Model for Improving Text Categorization Based on AdaBoost.MH. |
Abstract | ||
---|---|---|
Text representation is a fundamental task in text categorization systems. Bag-of-words (BoW) is a typical model that represents texts as vectors of single words. Although it is a simple representation model, BoW has been criticized for disregarding the relationships between words. Alternatively, the Latent Dirichlet Allocation (LDA) topic model has been proposed to represent texts as a bag-of-topics (BoT). In LDA, the words in the corpus are statistically grouped into a small number of themes called "latent topics", which capture the semantic relationships between the words. Thus, representing documents using BoT dramatically shortens training time and also improves classification performance. However, BoT has been shown to be ineffective for imbalanced datasets. Accordingly, this paper presents a hybrid text representation model, BoWT, that combines BoW and BoT. In BoWT, the highly weighted BoW features are merged with the BoT features to produce a new feature space. The proposed BoWT representation model is evaluated for multi-label text categorization using the well-known boosting algorithm AdaBoost.MH. Experimental results on four benchmarks demonstrate that BoWT notably outperforms both BoW and BoT and dramatically improves the classification performance of AdaBoost.MH for text categorization. |
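
The merging step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how BoW features are weighted or how many are kept, so this example assumes that each feature is ranked by its summed weight over all documents and that the top `k` are retained; the function names `top_k_features` and `merge_bow_bot` and the toy matrices are hypothetical, not taken from the paper.

```python
from typing import List, Sequence


def top_k_features(bow: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the column indices of the k BoW features with the highest
    total weight (assumed ranking criterion), in ascending column order."""
    n_features = len(bow[0])
    totals = [sum(row[j] for row in bow) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: totals[j], reverse=True)
    return sorted(ranked[:k])


def merge_bow_bot(
    bow: Sequence[Sequence[float]],
    bot: Sequence[Sequence[float]],
    k: int,
) -> List[List[float]]:
    """Concatenate the selected BoW columns with every BoT (topic) column,
    one merged feature vector per document."""
    keep = top_k_features(bow, k)
    return [
        [row[j] for j in keep] + list(topics)
        for row, topics in zip(bow, bot)
    ]


# Toy data: 3 documents, 4 word features, 2 topic features (all invented).
bow = [
    [0.9, 0.0, 0.1, 0.4],
    [0.8, 0.1, 0.0, 0.5],
    [0.0, 0.2, 0.1, 0.6],
]
bot = [
    [0.7, 0.3],
    [0.6, 0.4],
    [0.1, 0.9],
]

merged = merge_bow_bot(bow, bot, k=2)
for vector in merged:
    print(vector)
```

In a real pipeline the word weights would typically come from a term-weighting scheme and the topic proportions from a fitted LDA model; both are replaced here by hand-written matrices to keep the sketch self-contained.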
Year | DOI | Venue |
---|---|---|
2016 | 10.1007/978-3-319-49397-8_1 | Lecture Notes in Computer Science |
Keywords | Field | DocType |
Text representation,BoWT,Text categorization,AdaBoost.MH,Topic modeling | Text graph,Text mining,Latent Dirichlet allocation,Feature vector,AdaBoost,Computer science,Artificial intelligence,Boosting (machine learning),Natural language processing,Topic model,Text categorization | Conference |
Volume | ISSN | Citations |
10053 | 0302-9743 | 1 |
PageRank | References | Authors |
0.35 | 10 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Bassam Al-Salemi | 1 | 29 | 2.50 |
Mohd Juzaiddin Ab | 2 | 71 | 9.26 |
Shahrul Azman Noah | 3 | 104 | 26.70 |