Title
Topic Modeling for Short Texts with Co-occurrence Frequency-Based Expansion
Abstract
Short texts are everywhere on the Web, including social media posts, status messages, etc., and extracting semantically meaningful topics from these collections is an important and difficult task. Topic modeling methods, such as Latent Dirichlet Allocation, were designed for this purpose. However, discovering high-quality topics in short text collections is challenging because most topic modeling methods rely on the word co-occurrence distribution of the collection to extract topics. As this information is scarce in short texts, topic modeling methods struggle in this scenario, and different strategies to tackle the problem have been proposed in the literature. In this direction, this paper introduces a method for topic modeling of short texts that creates pseudo-document representations from the original documents. The method is simple and effective, and it uses word co-occurrence to expand documents, which can then be given as input to any topic modeling algorithm. Experiments were run on four datasets and compared against state-of-the-art methods for extracting topics from short text. Results for coherence, NPMI, and clustering metrics were statistically significantly better than the baselines in the majority of cases.
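The expansion idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give its details, so the document-level co-occurrence counting, the function names `build_cooccurrence` and `expand_document`, and the parameters `top_k` and `min_count` are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(docs):
    """Count, for each word pair, how many documents contain both words.

    Assumption: co-occurrence is measured at the document level.
    """
    cooc = defaultdict(Counter)
    for tokens in docs:
        for a, b in combinations(sorted(set(tokens)), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    return cooc


def expand_document(tokens, cooc, top_k=2, min_count=2):
    """Build a pseudo-document by appending, for each original token,
    up to top_k words that co-occur with it at least min_count times.

    top_k and min_count are hypothetical knobs, not taken from the paper.
    """
    expanded = list(tokens)
    for word in tokens:
        # Rank by descending count, then alphabetically so ties are deterministic.
        ranked = sorted(cooc[word].items(), key=lambda kv: (-kv[1], kv[0]))
        expanded.extend([w for w, c in ranked if c >= min_count][:top_k])
    return expanded


# Toy corpus of tokenized short texts (illustrative data only).
docs = [
    ["topic", "model", "short", "text"],
    ["short", "text", "tweet"],
    ["topic", "model", "lda"],
    ["tweet", "short"],
]
cooc = build_cooccurrence(docs)
pseudo_docs = [expand_document(d, cooc) for d in docs]
```

Each pseudo-document keeps the original tokens and adds frequently co-occurring words, so the resulting token lists could be passed to any bag-of-words topic model in place of the original short texts.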
Year: 2016
DOI: 10.1109/BRACIS.2016.058
Venue: 2016 5th Brazilian Conference on Intelligent Systems (BRACIS)
Keywords: Topic modeling, short text, text expansion
Field: Latent Dirichlet allocation, Social media, Information retrieval, Computer science, Baseline (configuration management), Co-occurrence, Coherence (physics), Natural language processing, Artificial intelligence, Topic model, Cluster analysis
DocType: Conference
ISBN: 978-1-5090-3567-0
Citations: 2
PageRank: 0.44
References: 7
Authors: 5
Name                  Order  Citations  PageRank
Gabriel Pedrosa       1      16         1.04
Marcelo Pita          2      17         1.82
Paulo Viana Bicalho   3      19         1.44
Anísio Lacerda        4      172        16.18
Gisele L. Pappa       5      347        36.97