The heavy frequency vector-based text clustering - Citegraph

Paper Info

Title
The heavy frequency vector-based text clustering

Abstract
The vector space model (VSM) with TF-IDF weighting is a popular way to represent a document. However, it is poorly suited to clustering a dynamic or changing corpus, because the TF-IDF value of every dimension of every VSM vector must be updated whenever a new file is added to the corpus. Furthermore, popular feature selection methods, such as DF, IG and chi, need global corpus information before clustering. We present the heavy frequency vector (HFV), which considers only the most frequent words in a document. Since an HFV does not contain any global corpus information, it makes incremental clustering easy to implement, especially in a dynamic or changing corpus. We compare the HFV-based K-means model with the traditional VSM-based K-means model under different feature selection methods. The results show that the HFV model has better precision than the others. However, the complexity of the HFV model is greater than that of the others.
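
As a rough illustration of the idea in the abstract, the sketch below builds a vector from only the most frequent words of a single document, using no corpus-wide statistics. The abstract does not give the paper's exact HFV definition, so everything here is an assumption made for illustration: the function names `heavy_frequency_vector` and `overlap_similarity`, the cutoff parameter `k`, the simple lowercase tokenizer, and the Jaccard overlap used to compare two vectors.

```python
import re
from collections import Counter


def heavy_frequency_vector(text, k=5):
    """Return the k most frequent words of one document with their counts.

    Hypothetical helper: only this document's own word counts are used,
    so no global corpus information is needed.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return dict(ranked[:k])


def overlap_similarity(hfv_a, hfv_b):
    """Jaccard overlap of the two word sets.

    This similarity is an assumption for the sketch, not the paper's measure.
    """
    a, b = set(hfv_a), set(hfv_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


doc_a = "apple banana apple cherry apple banana"
doc_b = "banana kiwi banana kiwi kiwi apple"

hfv_a = heavy_frequency_vector(doc_a, k=2)
hfv_b = heavy_frequency_vector(doc_b, k=2)
print(hfv_a, hfv_b, overlap_similarity(hfv_a, hfv_b))
```

Because each vector depends only on its own document, adding `doc_b` to the corpus leaves `hfv_a` unchanged. A TF-IDF vector, by contrast, would need its weights recomputed whenever the document frequencies change.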

Year: 2005
DOI: 10.1504/IJBIDM.2005.007317
Venue: IJBIDM
Keywords: global corpus information, heavy frequency vector, popular approach, different feature selection method, incremental clustering, vsm vector, hfv-based k-means model, traditional vsm-based k-means model, hfv model, vector-based text clustering, tf-idf value, k means, feature selection, text clustering, word frequency
Field: Data mining, k-means clustering, Feature selection, Pattern recognition, Word lists by frequency, Computer science, Document clustering, Document representation, Artificial intelligence, Dynamic text, Cluster analysis, Text processing
DocType: Journal
Volume: 1
Issue: 1
Citations: 4
PageRank: 0.46
References: 14
Authors: 4

Authors (4 rows)

Name	Order	Citations	PageRank
Junpeng Bao	1	4	1.81
Jun-Yi Shen	2	4	0.46
Xiaodong Liu	3	36	11.83
Hai-Yan Liu	4	12	1.41
