Abstract |
---|
Short text clustering is challenging in the field of Natural Language Processing (NLP) because it is hard to learn discriminative representations from limited information. In this paper, fused multi-embedded features are employed to enhance the representations of short texts. Then, a denoising autoencoder with an attention layer is adopted to extract low-dimensional features from the multi-embeddings while resisting the disturbance of noisy texts. Furthermore, we propose a novel distribution estimation method that jointly utilizes the soft cluster assignment and the prior target distribution transition to better fine-tune the encoder. Combining the above work, we propose a deep multi-embedded self-supervised model (DMESSM) for short text clustering. We compare DMESSM head-to-head with state-of-the-art methods on benchmark datasets, and the results indicate that our method outperforms them. |
Year | DOI | Venue |
---|---|---|
2021 | 10.1007/978-3-030-86383-8_12 | ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2021, PT V |
Keywords | DocType | Volume |
Short text clustering, Autoencoder, Self-supervised clustering, Attention, Distribution estimation | Conference | 12895 |
ISSN | Citations | PageRank |
0302-9743 | 0 | 0.34 |
References | Authors | |
0 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Kai Zhang | 1 | 0 | 0.68 |
Zheng Lian | 2 | 12 | 8.33 |
Jiangmeng Li | 3 | 0 | 1.69 |
Haichang Li | 4 | 0 | 2.03 |
Xiaohui Hu | 5 | 17 | 8.10 |