Title
Document clustering method using dimension reduction and support vector clustering to overcome sparseness
Abstract
Many studies on developing technologies have been published as articles, papers, or patents. We use and analyze these documents to find scientific and technological trends. In this paper, we consider document clustering as a method of document data analysis. In general, we have trouble analyzing documents directly because document data are not suitable for statistical and machine learning methods of analysis. Therefore, we have to transform document data into structured data for analytical purposes. For this process, we use text mining techniques. The structured data are very sparse, and hence, it is difficult to analyze them. This study proposes a new method to overcome the sparsity problem of document clustering. We build a combined clustering method using dimension reduction and K-means clustering based on support vector clustering and Silhouette measure. In particular, we attempt to overcome the sparseness in patent document clustering. To verify the efficacy of our work, we first conduct an experiment using news data from the machine learning repository of the University of California at Irvine. Second, using patent documents retrieved from the United States Patent and Trademark Office, we carry out patent clustering for technology forecasting.
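The sparsity the abstract refers to can be illustrated with a minimal sketch. It assumes simple whitespace tokenization and three made-up toy documents; the helper names `document_term_matrix` and `sparsity` are hypothetical and are not taken from the paper, whose actual preprocessing is not described here.

```python
from collections import Counter


def document_term_matrix(docs):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i."""
    # Assumption: lowercase + whitespace split stands in for real text mining.
    tokenized = [doc.lower().split() for doc in docs]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)  # missing terms count as 0
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


def sparsity(matrix):
    """Fraction of entries in the matrix that are zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total if total else 0.0


# Hypothetical toy corpus, for illustration only.
docs = [
    "support vector clustering of patent documents",
    "dimension reduction for sparse document data",
    "news articles on stock markets",
]
vocabulary, matrix = document_term_matrix(docs)
print(len(vocabulary), round(sparsity(matrix), 3))
```

Even in this tiny corpus most entries are zero because the documents share no terms; with real document collections the vocabulary grows much faster than the length of any single document, which is the problem dimension reduction is meant to address.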
Year
2014
DOI
10.1016/j.eswa.2013.11.018
Venue
Expert Syst. Appl.
Keywords
document clustering, combined clustering method, support vector clustering, dimension reduction, document data, document data analysis, new method, structured data, news data, patent document, patent document clustering
Field
Data mining, Canopy clustering algorithm, Fuzzy clustering, Clustering high-dimensional data, Data stream clustering, Correlation clustering, Computer science, Document clustering, Artificial intelligence, Conceptual clustering, Cluster analysis, Machine learning
DocType
Journal
Volume
41
Issue
7
ISSN
0957-4174
Citations
24
PageRank
0.77
References
25
Authors
3
Name | Order | Citations | PageRank
Sung-Hae Jun | 1 | 95 | 11.79
Sang-Sung Park | 2 | 80 | 7.25
Dong-Sik Jang | 3 | 196 | 13.81