Title
BoWT: A Hybrid Text Representation Model for Improving Text Categorization Based on AdaBoost.MH.
Abstract
Text representation is a fundamental task in text categorization systems. Bag-of-words (BoW) is the typical model, representing texts as vectors of single words. Although simple, BoW has been criticized for disregarding the relationships between words. As an alternative, the Latent Dirichlet Allocation (LDA) topic model has been proposed to represent texts as a bag-of-topics (BoT). In LDA, the words in the corpus are statistically grouped into a small number of themes called "latent topics", which capture the semantic relationships between the words. Because the topic space is much smaller than the vocabulary, representing documents with BoT dramatically reduces training time and also improves classification performance. However, BoT has been shown to be ineffective for imbalanced datasets. Accordingly, this paper presents a hybrid text representation model that combines BoW and BoT, namely BoWT. In BoWT, the highly weighted BoW features are merged with the BoT features to produce a new feature space. The proposed BoWT representation is evaluated for multi-label text categorization using the well-known boosting algorithm AdaBoost.MH. Experimental results on four benchmarks demonstrate that BoWT notably outperforms both BoW and BoT and dramatically improves the classification performance of AdaBoost.MH for text categorization.
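The merging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how BoW features are weighted or how many are kept, so the tf-idf weighting, the "largest summed weight" selection rule, the parameter `k`, and the function names below are all assumptions. The per-document topic proportions are hard-coded placeholders standing in for the output of a separately trained LDA model.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocab, rows); rows[i][j] is the tf-idf weight of vocab[j] in docs[i]."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter(w for toks in tokenized for w in set(toks))
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        rows.append([tf[w] * math.log(n_docs / df[w]) for w in vocab])
    return vocab, rows


def top_k_terms(vocab, rows, k):
    """Indices of the k terms with the largest summed weight (assumed criterion)."""
    totals = [sum(row[j] for row in rows) for j in range(len(vocab))]
    ranked = sorted(range(len(vocab)), key=lambda j: (-totals[j], vocab[j]))
    return sorted(ranked[:k])


def bowt_features(docs, topic_vectors, k):
    """Concatenate the top-k weighted BoW features with per-document topic features."""
    vocab, rows = tfidf_matrix(docs)
    keep = top_k_terms(vocab, rows, k)
    n_topics = len(topic_vectors[0])
    names = [vocab[j] for j in keep] + [f"topic_{t}" for t in range(n_topics)]
    merged = [
        [row[j] for j in keep] + list(topics)
        for row, topics in zip(rows, topic_vectors)
    ]
    return names, merged


docs = [
    "stocks fall as markets slide",
    "team wins the final match",
    "markets rally as stocks rise",
]
# Hypothetical document-topic proportions, standing in for LDA output.
topic_vectors = [[0.9, 0.1], [0.2, 0.8], [0.85, 0.15]]

names, X = bowt_features(docs, topic_vectors, k=3)
```

Each row of `X` then holds `k` word weights followed by the topic proportions, which is the kind of combined feature space a classifier such as AdaBoost.MH would be trained on.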
Year: 2016
DOI: 10.1007/978-3-319-49397-8_1
Venue: Lecture Notes in Computer Science
Keywords: Text representation, BoWT, Text categorization, AdaBoost.MH, Topic modeling
Field: Text graph, Text mining, Latent Dirichlet allocation, Feature vector, AdaBoost, Computer science, Artificial intelligence, Boosting (machine learning), Natural language processing, Topic model, Text categorization
DocType: Conference
Volume: 10053
ISSN: 0302-9743
Citations: 1
PageRank: 0.35
References: 10
Authors: 3
Name | Order | Citations, PageRank
Bassam Al-Salemi | 1 | 292.50
Mohd Juzaiddin Ab | 2 | 719.26
Shahrul Azman Noah | 3 | 10426.70