Title
Short Text Clustering with a Deep Multi-embedded Self-supervised Model
Abstract
Short text clustering is a challenging task in Natural Language Processing (NLP) because it is hard to learn discriminative representations from limited information. In this paper, fused multi-embedded features are employed to enhance the representations of short texts. A denoising autoencoder with an attention layer is then adopted to extract low-dimensional features from the multi-embeddings that are robust to the disturbance of noisy texts. Furthermore, we propose a novel distribution estimation that jointly utilizes the soft cluster assignment and the prior target distribution transition to better fine-tune the encoder. Combining the above, we propose a deep multi-embedded self-supervised model (DMESSM) for short text clustering. Head-to-head comparisons with state-of-the-art methods on benchmark datasets indicate that our method outperforms them.
Year: 2021
DOI: 10.1007/978-3-030-86383-8_12
Venue: ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2021, PT V
Keywords: Short text clustering, Autoencoder, Self-supervised clustering, Attention, Distribution estimation
DocType: Conference
Volume: 12895
ISSN: 0302-9743
Citations: 0
PageRank: 0.34
References: 0
Authors: 5
Name            Order  Citations  PageRank
Kai Zhang       1      0          0.68
Zheng Lian      2      12         8.33
Jiangmeng Li    3      0          1.69
Haichang Li     4      0          2.03
Xiaohui Hu      5      17         8.10