Abstract |
---|
The rapid development of social media encourages people to share their opinions and feelings on the Internet. Every day, a large number of short text comments are generated through Twitter, microblogs, WeChat, and similar platforms, and extracting useful information from these short texts has high commercial and social value. Most existing studies focus on extracting key words from text. The LDA topic model, for example, performs well on long texts but loses effectiveness on short texts because of noise and sparsity. In this paper, we use Word2Vec and Doc2Vec to improve short-text key word extraction. We first train word vectors and paragraph vectors collaboratively and then cluster the nodes of the TextRank model. We adjust the key word weights by computing the jump probability between nodes, obtain a weighted score for each node, and finally rank the generated key words by that score. The experimental results show that the improved method performs well on the datasets. |
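
The weighted ranking step described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the clustering step is omitted, the function name `weighted_textrank` and its parameters are hypothetical, and the hand-written 2-D vectors merely stand in for embeddings that would come from trained Word2Vec/Doc2Vec models. The sketch assumes that the jump probability from one node to a neighbor is the cosine similarity of their vectors, normalized over all of the node's neighbors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def weighted_textrank(tokens, vectors, window=2, damping=0.85, iters=50):
    """Rank words with TextRank, using embedding similarity as edge weight.

    Assumption: the jump probability from v to w is
    sim(v, w) / sum of sim(v, k) over all neighbors k of v.
    """
    words = sorted(set(tokens))
    weights = {w: {} for w in words}

    # Connect words that co-occur within the sliding window.
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            v = tokens[j]
            if v == w:
                continue
            sim = max(cosine(vectors[w], vectors[v]), 0.0)
            weights[w][v] = sim
            weights[v][w] = sim

    out_sum = {w: sum(weights[w].values()) for w in words}
    scores = {w: 1.0 for w in words}

    # Fixed number of power-iteration steps over the weighted graph.
    for _ in range(iters):
        updated = {}
        for w in words:
            rank = sum(
                weights[v][w] / out_sum[v] * scores[v]
                for v in weights[w]
                if out_sum[v] > 0
            )
            updated[w] = (1 - damping) + damping * rank
        scores = updated

    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy input: hypothetical vectors, not real Word2Vec output.
tokens = ["short", "text", "keyword", "extraction", "short", "text", "topic"]
vectors = {
    "short": [0.9, 0.1],
    "text": [0.8, 0.3],
    "keyword": [0.4, 0.9],
    "extraction": [0.3, 0.8],
    "topic": [0.6, 0.6],
}
ranked = weighted_textrank(tokens, vectors)
for word, score in ranked:
    print(f"{word}\t{score:.3f}")
```

In practice the `vectors` dictionary would be filled from a trained embedding model, and the window size and damping factor would need tuning for the dataset at hand.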
Year | DOI | Venue |
---|---|---|
2019 | 10.3906/elk-1806-38 | TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES |
Keywords | DocType | Volume |
Key word extraction, short text, word2vec, doc2vec, textrank | Journal | 27 |
Issue | ISSN | Citations |
3 | 1300-0632 | 1 |
PageRank | References | Authors |
0.40 | 0 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Jun Li | 1 | 1 | 0.73 |
Guimin Huang | 2 | 6 | 9.26 |
Chunli Fan | 3 | 1 | 0.40 |
Zhenglin Sun | 4 | 1 | 0.40 |
Hongtao Zhu | 5 | 1 | 0.40 |