Title
Topical N-Grams: Phrase and Topic Discovery, with an Application to Information Retrieval
Abstract
Most topic models, such as latent Dirichlet allocation, rely on the bag-of-words assumption. However, word order and phrases are often critical to capturing the meaning of text in many text mining tasks. This paper presents topical n-grams, a topic model that discovers topics as well as topical phrases. The probabilistic model generates words in their textual order by, for each word, first sampling a topic, then sampling its status as a unigram or bigram, and then sampling the word from a topic-specific unigram or bigram distribution. Thus our model can model "white house" as a special meaning phrase in the 'politics' topic, but not in the 'real estate' topic. Successive bigrams form longer phrases. We present experiments showing meaningful phrases and more interpretable topics from the NIPS data and improved information retrieval performance on a TREC collection.
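The generative steps named in the abstract (sample a topic, then a unigram/bigram status, then a word) can be illustrated with a minimal Python sketch. Every name and number in it is an assumption made for illustration: the toy vocabulary, the probability tables UNIGRAM, BIGRAM and BIGRAM_PROB, and the choice to condition the bigram status on the current topic and the previous word are not taken from the paper, and no inference procedure is shown.

```python
import random

# Toy parameters, all invented for illustration (not from the paper).
TOPICS = ["politics", "real_estate"]

# Topic-specific unigram distributions: UNIGRAM[topic][word] -> probability.
UNIGRAM = {
    "politics": {"white": 0.4, "house": 0.2, "senate": 0.4},
    "real_estate": {"white": 0.2, "house": 0.5, "mortgage": 0.3},
}

# Topic-specific bigram distributions: BIGRAM[topic][previous_word][word] -> probability.
BIGRAM = {
    "politics": {"white": {"house": 1.0}},
    "real_estate": {},
}

# Assumed probability that the current word forms a bigram with the previous
# word, keyed by (topic, previous_word); missing keys mean probability 0.
BIGRAM_PROB = {("politics", "white"): 0.9}


def draw(dist, rng):
    """Sample one key from a {word: probability} dictionary."""
    words = list(dist)
    return rng.choices(words, weights=[dist[w] for w in words], k=1)[0]


def generate(n_words, topic_weights, rng):
    """Generate (word, topic, is_bigram) triples in textual order."""
    sequence = []
    prev = None
    for _ in range(n_words):
        # Step 1: sample a topic for this word position.
        topic = rng.choices(TOPICS, weights=topic_weights, k=1)[0]
        # Step 2: sample the unigram/bigram status.
        p_bigram = BIGRAM_PROB.get((topic, prev), 0.0)
        is_bigram = prev in BIGRAM[topic] and rng.random() < p_bigram
        # Step 3: sample the word from the matching topic-specific distribution.
        if is_bigram:
            word = draw(BIGRAM[topic][prev], rng)
        else:
            word = draw(UNIGRAM[topic], rng)
        sequence.append((word, topic, is_bigram))
        prev = word
    return sequence


if __name__ == "__main__":
    for word, topic, is_bigram in generate(10, [0.5, 0.5], random.Random(0)):
        print(word, topic, "bigram" if is_bigram else "unigram")
```

With these toy tables, "house" can be flagged as the second half of a bigram only when it follows "white" under the 'politics' topic, which mirrors the "white house" example in the abstract; under 'real_estate' the same two words are always generated as unigrams.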
Year
2007
DOI
10.1109/ICDM.2007.86
Venue
ICDM
Keywords
topical n-grams, special meaning phrase, topic discovery, word sampling, interpretable topic, textual order, information retrieval, topic-specific bigram distribution, bigram distribution, phrase/topic discovery, text mining task, topic-specific unigram, data mining, topic-specific unigram distribution, sampling methods, topic model, text analysis, text mining, probability, probabilistic model, word order, latent Dirichlet allocation, bag of words
DocType
Conference
ISSN
1550-4786
ISBN
978-0-7695-3018-5
Citations
197
PageRank
6.47
References
19
Authors
3
Name                      Order  Citations  PageRank
Xuerui Wang               1      1735       123.38
Andrew Kachites McCallum  2      19203      1588.22
Xing Wei                  3      1141       60.87