Topic Discovery For Short Texts Using Word Embeddings - Citegraph

Paper Info

Title
Topic Discovery For Short Texts Using Word Embeddings

Abstract
Discovering topics in short texts, such as news titles and tweets, has become an important task for many content analysis applications. However, because short texts lack rich context information, conventional topic models usually perform unsatisfactorily on them. In this paper, we propose a novel topic model for short text corpora using word embeddings. Continuous space word embeddings, which have proven effective at capturing regularities in language, are incorporated into our model to provide additional semantics. We thus model each short document as a Gaussian topic over word embeddings in the vector space. In addition, considering that background words in a short text are usually not semantically related, we introduce a discrete background mode over word types to complement the continuous Gaussian topics. We evaluate our model on news titles from data sources such as abcnews, showing that it extracts more coherent topics from short texts than the baseline methods and learns a better topic representation for each short document.
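
The idea in the abstract (one Gaussian topic per short document over word embeddings, plus a discrete background distribution over word types) can be illustrated with a minimal sketch. This is not the paper's inference procedure: the function names, the isotropic covariance, the uniform topic prior, the fixed `bg_weight`, and the toy 2-D embeddings are all assumptions made for illustration only.

```python
import numpy as np

def log_gaussian(x, mean, var):
    """Log density of an isotropic Gaussian N(mean, var * I) at each row of x."""
    d = x.shape[-1]
    diff = x - mean
    return -0.5 * (d * np.log(2 * np.pi * var) + np.sum(diff * diff, axis=-1) / var)

def topic_posterior(doc_word_ids, embeddings, topic_means, topic_var,
                    background, bg_weight=0.3):
    """Posterior over topics for one short document (uniform topic prior assumed).

    Each word is explained either by the discrete background distribution
    (probability bg_weight) or by the document's single Gaussian topic.
    """
    vecs = embeddings[doc_word_ids]                                 # (n_words, dim)
    log_bg = np.log(bg_weight) + np.log(background[doc_word_ids])   # (n_words,)
    log_post = np.empty(len(topic_means))
    for k, mean in enumerate(topic_means):
        log_topic = np.log1p(-bg_weight) + log_gaussian(vecs, mean, topic_var)
        # Per-word mixture of background and topic, summed over the document.
        log_post[k] = np.logaddexp(log_bg, log_topic).sum()
    log_post -= np.logaddexp.reduce(log_post)                       # normalize
    return np.exp(log_post)

# Toy data (hypothetical): six word types in two clusters of a 2-D embedding space.
embeddings = np.array([[1.0, 0.1], [0.9, -0.1], [1.1, 0.0],
                       [-1.0, 0.1], [-0.9, -0.1], [-1.1, 0.0]])
topic_means = np.array([[1.0, 0.0], [-1.0, 0.0]])
background = np.full(6, 1.0 / 6.0)   # uniform background over word types

# A three-word "document" drawn entirely from the first cluster.
post = topic_posterior(np.array([0, 1, 2]), embeddings, topic_means, 0.05, background)
```

In this sketch a word whose embedding lies far from the topic mean is absorbed by the background component instead of dragging down the topic's likelihood, which is the role the abstract assigns to the background mode.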

Year	DOI	Venue
2016	10.1109/ICDM.2016.33	2016 IEEE 16th International Conference on Data Mining (ICDM)
Keywords	Field	DocType
short texts, topic model, word embeddings	Content analysis, Information retrieval, Computer science, Text corpus, Gaussian, Encyclopedia, Artificial intelligence, Natural language processing, Topic model, Semantics, Electronic publishing, The Internet	Conference
ISSN	Citations	PageRank
1550-4786	2	0.35
References	Authors
0	6

Authors (6 rows)

Name	Order	Citations	PageRank
Guangxu Xun	1	109	11.89
Vishrawas Gopalakrishnan	2	32	6.81
Fenglong Ma	3	374	33.08
Yaliang Li	4	629	50.87
Jing Gao	5	2723	131.05
Aidong Zhang	6	2970	405.63