Iterative Term Weighting For Short Text Data - Citegraph

Paper Info

Title
Iterative Term Weighting For Short Text Data

Abstract
With the development of social media applications, short text mining is becoming increasingly important. Because short text data is sparse, both feature correlation information (word co-occurrence) and data contiguity information (context information) are less reliable, so most existing text mining methods, which are designed for regular text data, are less effective on short text mining tasks. From an analysis of the distribution of discriminative terms in short text data, we observe that discriminative terms are distributed non-uniformly across domains, whereas background words tend to be distributed uniformly. This observation can be measured by a suitably defined functional of a term's probability distribution over domains. In this paper, we adopt this measure as the term weight to address the sparseness problem of short text data. We evaluate our method on two datasets, and experimental results show that it outperforms previous approaches that require information infusion, as well as a number of state-of-the-art clustering algorithms. Furthermore, our method yields more coherent clustering results.
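The weighting idea in the abstract can be illustrated with a small sketch. The abstract does not say which functional the authors use, so the choice below (one minus the normalized entropy of a term's distribution over domains) is an assumption made only for illustration, as are the function name `distribution_weight` and the toy counts. The sketch also leaves out the iterative re-estimation implied by the paper's title.

```python
import math


def distribution_weight(domain_counts):
    """Weight a term by how non-uniformly it spreads over domains.

    Assumed functional (not taken from the paper): 1 - H(p) / log(K),
    where p is the term's empirical distribution over K domains.
    A uniform spread gives a weight near 0, a single-domain term gives 1.
    """
    total = sum(domain_counts)
    k = len(domain_counts)
    if total == 0 or k < 2:
        return 0.0
    entropy = 0.0
    for count in domain_counts:
        if count > 0:
            p = count / total
            entropy -= p * math.log(p)
    return 1.0 - entropy / math.log(k)


# Hypothetical term counts over three domains (toy data).
counts = {
    "goalkeeper": [40, 1, 0],  # concentrated in one domain
    "today": [14, 13, 15],     # spread almost evenly
}
weights = {term: distribution_weight(c) for term, c in counts.items()}
```

Under this assumed functional, a term concentrated in one domain ("goalkeeper") receives a weight close to 1, while an evenly spread background word ("today") receives a weight close to 0.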

Year	2015
DOI	10.1109/SMC.2015.297
Venue	2015 IEEE International Conference on Systems, Man, and Cybernetics (SMC 2015): Big Data Analytics for Human-Centric Systems
Keywords	Short Text, Subspace Clustering, Feature Selection, Term Weighting
Field	Text graph, Data mining, Contiguity, Weighting, Text mining, Computer science, Probability distribution, Artificial intelligence, Cluster analysis, Discriminative model, Semantics, Machine learning
DocType	Conference
ISSN	1062-922X
Citations	0
PageRank	0.34
References	16
Authors	3

Authors (3 rows)

Name	Order	Citations	PageRank
Chutao Zheng	1	9	1.90
Cheng Liu	2	33	5.72
Hau-San Wong	3	1008	86.89
