Title
Cluster Correction on Polysemy and Synonymy
Abstract
Document clustering (or text clustering) is the application of cluster analysis to textual documents. It has applications in automatic document organization, topic extraction, and fast information retrieval or filtering. Many challenges remain, however; in particular, clustering accuracy still needs to be improved. For this reason, we take the process of cluster correction as the object of analysis. In this paper, we focus on the problems of polysemy and synonymy in the clustering process. Polysemy is the ambiguity of an individual word or phrase that can be used, in different contexts, to express two or more different meanings. Synonymy, by contrast, is the semantic relation that holds between two or more words that can, in a given context, express the same meaning. Both phenomena affect clustering results. To address them, we use a bag-of-words model to distinguish the contexts of the same word and word2vec to re-cluster words with similar meanings. Cosine similarity is used to measure the similarity between two nonzero vectors in both models.
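The polysemy step described in the abstract can be illustrated with a minimal sketch: build a bag-of-words vector for each context in which an ambiguous word occurs, then compare contexts with cosine similarity. This is not the authors' implementation. The example sentences, the function names `bag_of_words` and `cosine_similarity`, and the whitespace tokenization are all assumptions made for illustration. The synonymy step would apply the same cosine measure to learned word2vec embeddings, which this sketch does not cover.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    """Map a list of tokens to a sparse term-count vector."""
    return Counter(tokens)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # cosine is undefined for zero vectors; return 0 by convention
    return dot / (norm_a * norm_b)


# Hypothetical contexts of the polysemous word "bank" (illustrative only).
river_1 = bag_of_words("the river bank was muddy after the flood".split())
river_2 = bag_of_words("we walked along the bank of the river".split())
money = bag_of_words("the bank approved the loan and opened an account".split())

same_sense = cosine_similarity(river_1, river_2)
other_sense = cosine_similarity(river_1, money)
print(same_sense > other_sense)
```

In this toy data the two river contexts score higher than the river and finance pair, which is the signal a correction step could use to separate senses. A realistic pipeline would likely remove stop words and the target word itself before comparing contexts.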
Year
2017
DOI
10.1109/WISA.2017.45
Venue
2017 14th Web Information Systems and Applications Conference (WISA)
Keywords
cluster correction, polysemy, synonymy, cosine similarity, bag of words, word2vec
Field
Bag-of-words model, Cosine similarity, Document clustering, Computer science, Phrase, Artificial intelligence, Natural language processing, Word2vec, Cluster analysis, Semantics, Polysemy
DocType
Conference
ISBN
978-1-5386-4807-0
Citations
0
PageRank
0.34
References
0
Authors
4
Name, Order, Citations, PageRank
Zemin Qin, 1, 2, 1.37
Hao Lian, 2, 9, 3.93
Tieke He, 3, 58, 15.85
Bin Luo, 4, 66, 21.04