A Compact Representation for Cross-Domain Short Text Clustering. - Citegraph

Paper Info

Title
A Compact Representation for Cross-Domain Short Text Clustering.

Abstract
Nowadays, Twitter is a rich source of online reviews, ratings, recommendations, and other forms of opinion expression. This has created a compelling demand for innovative mechanisms to store, search, organize, and analyze all this data automatically. Unfortunately, enough labeled Twitter data is seldom available, either because of the cost of labeling or because it is impossible to obtain, given the rapid growth and change of this kind of media. To avoid such limitations, unsupervised categorization strategies are employed. In this paper we address the problem of cross-domain short text clustering through a compact representation that avoids the problems arising from the high dimensionality and sparseness of the vocabulary. Our experiments, conducted in a cross-domain scenario using very short texts, indicate that the proposed representation makes it possible to generate high-quality groups, according to the Silhouette coefficient values obtained.
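The abstract judges cluster quality by the Silhouette coefficient. As a minimal sketch of how that measure is computed (not the authors' implementation), the Python below scores a clustering from the mean intra-cluster distance a and the mean distance b to the nearest other cluster, using (b - a) / max(a, b) per point. The toy 2-D vectors, the Euclidean distance, and the function names are assumptions made for illustration; the paper's compact representation is not described in this record and is not reproduced here.

```python
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs >= 2 clusters)."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Common convention: a singleton cluster contributes 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy vectors standing in for short-text representations.
toy_points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
good = silhouette_score(toy_points, [0, 0, 1, 1])  # compact, well-separated groups
bad = silhouette_score(toy_points, [0, 1, 0, 1])   # groups that mix distant points
```

Values lie in [-1, 1]: scores near 1 indicate compact, well-separated groups, while negative scores suggest points sit closer to another cluster than to their own.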

Year	2016
DOI	10.1007/978-3-319-62434-1_2
Venue	ADVANCES IN COMPUTATIONAL INTELLIGENCE, MICAI 2016, PT I
Keywords	Short text clustering, Unsupervised categorization, Crossdomain clustering, Compact text representation, Silhouette coefficient
Field	Categorization, Pattern recognition, Expression (mathematics), Document clustering, Silhouette, Computer science, Curse of dimensionality, Impossibility, Artificial intelligence, Brown clustering, Vocabulary, Machine learning
DocType	Conference
Volume	10061
ISSN	0302-9743
Citations	0
PageRank	0.34
References	0
Authors	4

Authors (4 rows)

Name	Order	Citations	PageRank
Alba Núñez-Reyes	1	0	0.34
Esaú Villatoro-Tello	2	14	3.06
Gabriela Ramírez-De-La-Rosa	3	10	10.81
Christian Sánchez-Sánchez	4	7	5.65
