Abstract |
---|
We propose a new algorithm for topic modeling, Vec2Topic, that identifies the main topics in a corpus using semantic information captured via high-dimensional distributed word embeddings. Our technique is unsupervised and generates a list of topics ranked by importance. We find that it works better than existing topic modeling techniques such as Latent Dirichlet Allocation for identifying key topics in user-generated content, such as emails and chats, where topics are diffused across the corpus. We also find that Vec2Topic works equally well for non-user-generated content, such as papers and reports, and for small corpora such as a single document. |

Year | Venue | Field |
---|---|---|
2016 | arXiv: Computation and Language | Latent Dirichlet allocation, Ranking, Information retrieval, Computer science, Semantic information, Artificial intelligence, Natural language processing, Topic model, Machine learning |

DocType | Volume | Citations |
---|---|---|
Journal | abs/1603.04747 | 1 |

PageRank | References | Authors |
---|---|---|
0.35 | 14 | 3 |

Name | Order | Citations | PageRank |
---|---|---|---|
Ramandeep S. Randhawa | 1 | 135 | 11.65 |
Parag Jain | 2 | 9 | 4.53 |
Gagan Madan | 3 | 1 | 1.03 |