Title
A Compact Representation for Cross-Domain Short Text Clustering
Abstract
Nowadays, Twitter is a rich source of online reviews, ratings, recommendations, and other forms of opinion expression. This has created a compelling demand for innovative mechanisms that store, search, organize, and analyze all this data automatically. Unfortunately, enough labeled Twitter data is seldom available, either because labeling is costly or because it is impossible to obtain, given how rapidly this kind of media grows and changes. To avoid such limitations, unsupervised categorization strategies are employed. In this paper we address the problem of cross-domain short text clustering through a compact representation that avoids the problems arising from the high dimensionality and sparseness of the vocabulary. Our experiments, conducted in a cross-domain scenario using very short texts, indicate that the proposed representation generates high-quality groups, as measured by the Silhouette coefficient.
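The abstract reports cluster quality through the Silhouette coefficient. For each text i, let a(i) be its mean distance to the other members of its own cluster and b(i) its smallest mean distance to the members of any other cluster; then s(i) = (b(i) - a(i)) / max(a(i), b(i)), and the reported value is the mean of s(i). The sketch below implements that standard definition. It is not the authors' code: the function names, the Euclidean default distance, and the toy vectors are illustrative assumptions, and singleton clusters score 0 following Rousseeuw's convention. For short texts one would more likely pass a cosine distance over whatever compact vectors the representation produces.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    """Euclidean distance (assumed default; any metric can be passed instead)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def silhouette(points: Sequence[Vector], labels: Sequence[int],
               dist: Callable[[Vector, Vector], float] = euclidean) -> float:
    """Mean Silhouette coefficient over all points (range -1 to 1)."""
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    clusters = set(labels)
    if not 2 <= len(clusters) <= len(points) - 1:
        raise ValueError("need between 2 and n-1 clusters")

    scores = []
    for i, (p, li) in enumerate(zip(points, labels)):
        # Accumulate the distance from p to every other point, per cluster.
        sums = {c: 0.0 for c in clusters}
        counts = {c: 0 for c in clusters}
        for j, (q, lj) in enumerate(zip(points, labels)):
            if i != j:
                sums[lj] += dist(p, q)
                counts[lj] += 1
        if counts[li] == 0:
            # p is alone in its cluster: s(i) is defined as 0.
            scores.append(0.0)
            continue
        a = sums[li] / counts[li]                      # cohesion a(i)
        b = min(sums[c] / counts[c]                    # separation b(i)
                for c in clusters if c != li)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
    print(silhouette(pts, [0, 0, 1, 1]))  # well-separated groups: close to 1
    print(silhouette(pts, [0, 1, 0, 1]))  # mixed groups: negative
```

Values near 1 indicate compact, well-separated groups; values near 0 or below indicate overlapping or misassigned groups.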
Year
2016
DOI
10.1007/978-3-319-62434-1_2
Venue
Advances in Computational Intelligence, MICAI 2016, Part I
Keywords
Short text clustering, Unsupervised categorization, Cross-domain clustering, Compact text representation, Silhouette coefficient
Field
Categorization, Pattern recognition, Expression (mathematics), Document clustering, Silhouette, Computer science, Curse of dimensionality, Impossibility, Artificial intelligence, Brown clustering, Vocabulary, Machine learning
DocType
Conference
Volume
10061
ISSN
0302-9743
Citations
0
PageRank
0.34
References
0
Authors
4