Title
Text feature selection with a robust weight scheme and dynamic dimension reduction to text document clustering
Abstract
Three meta-heuristic algorithms are adapted to solve the feature selection problem. Feature selection methods are established based on a novel weighting scheme. A dimension reduction technique is proposed to reduce the number of features. The k-means clustering algorithm is applied to the features obtained. The proposed methods outperform the comparative methods.

This paper proposes three feature selection algorithms with a feature weight scheme and dynamic dimension reduction for the text document clustering problem. Text document clustering is a new trend in text mining; in this process, text documents are separated into several coherent clusters according to carefully selected informative features, using a suitable evaluation function that usually depends on term frequency. Informative features in each document are selected using feature selection methods. Genetic algorithm (GA), harmony search (HS), and particle swarm optimization (PSO), the most successful feature selection methods, are established using a novel weighting scheme, namely length feature weight (LFW), which depends on term frequency and the appearance of features in other documents. A new dynamic dimension reduction (DDR) method is also provided to reduce the number of features used in clustering and thus improve the performance of the algorithms. Finally, k-means, a popular clustering method, is used to cluster the set of text documents based on the terms (or features) obtained by dynamic reduction. Seven benchmark text datasets of different sizes and complexities are evaluated. Analysis with k-means shows that particle swarm optimization with length feature weight and dynamic reduction produces the best results for almost all datasets tested. This paper provides new alternatives for the text mining community to cluster text documents by using cohesive and informative features.
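The pipeline the abstract describes (weight the terms, select informative features, reduce the dimension, then cluster) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: plain TF-IDF stands in for the paper's LFW scheme, whose formula is not given in this record; a simple top-k cut on mean weight stands in for both the GA/HS/PSO search and the DDR step; and the function names and toy corpus are hypothetical.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocab, rows); rows[i][j] is the TF-IDF weight of term j in doc i.

    Assumption: TF-IDF is only a stand-in for the paper's LFW weighting.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        rows.append([tf[t] * math.log(n / df[t]) for t in vocab])
    return vocab, rows


def select_top_features(vocab, rows, keep):
    """Keep the `keep` terms with the highest mean weight.

    Assumption: this greedy cut replaces the metaheuristic search and the
    dynamic dimension reduction described in the abstract.
    """
    means = [sum(col) / len(rows) for col in zip(*rows)]
    ranked = sorted(range(len(vocab)), key=lambda j: -means[j])[:keep]
    idx = sorted(ranked)
    return [vocab[j] for j in idx], [[r[j] for j in idx] for r in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iters=20):
    """Basic k-means with deterministic farthest-point initialization."""
    centroids = [list(rows[0])]
    while len(centroids) < k:
        far = max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assignment step: nearest centroid for every document.
        labels = [min(range(k), key=lambda c: sq_dist(r, centroids[c]))
                  for r in rows]
        # Update step: recompute each non-empty cluster's centroid.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels


# Hypothetical toy corpus: two documents about pets, two about finance.
docs = [
    "cat dog cat dog pet",
    "dog cat cat dog",
    "stock market stock market trade",
    "market stock stock market",
]
vocab, rows = tfidf_matrix(docs)
selected, reduced = select_top_features(vocab, rows, keep=4)
labels = kmeans(reduced, k=2)
print(selected, labels)
```

On this toy corpus the cut keeps the four terms shared within each topic, and k-means then separates the two topics using only those columns. A metaheuristic such as PSO would instead search over subsets of terms against an evaluation function, which this sketch does not attempt.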
Year: 2017
DOI: 10.1016/j.eswa.2017.05.002
Venue: Expert Syst. Appl.
Keywords: Feature selection, Dynamic dimension reduction, Text document clustering, Weight score, Metaheuristics
Field: Data mining, Weighting, Dimensionality reduction, Feature selection, Computer science, Artificial intelligence, Cluster analysis, Genetic algorithm, Metaheuristic, Particle swarm optimization, Pattern recognition, Correlation clustering, Machine learning
DocType: Journal
Volume: 84
Issue: C
ISSN: 0957-4174
Citations: 37
PageRank: 0.98
References: 23
Authors: 4
Name | Order | Citations | PageRank
Laith Mohammad Abualigah | 1 | 244 | 11.47
Ahamad Tajudin Khader | 2 | 683 | 40.71
Mohammed Azmi Al-Betar | 3 | 620 | 43.69
Osama Ahmad Alomari | 4 | 41 | 1.37