Title
A Word Position-Related LDA Model
Abstract
LDA (Latent Dirichlet Allocation), proposed by Blei et al., is a generative probabilistic model of a corpus in which documents are represented as random mixtures over latent topics and each topic is characterized by a distribution over words. It does not, however, model the positions at which words occur in each document of the corpus. This paper proposes a Word Position-Related LDA Model that takes the word-position attributes of every document into account, characterizing each word by a distribution over word positions. Because the same word can carry a different word degree at different positions, the model also integrates the word-position distribution with an appropriate word degree, which improves the precision of topic-word interpretability. Finally, a new size-aware word intrusion method is proposed to better capture the interpretability of topic words. Experimental results on the NIPS corpus show that the Word Position-Related LDA Model improves the precision of topic-word interpretability, with an average improvement of about 9.67%. A comparison across the experimental data also shows that the size-aware word intrusion method captures the semantic information of topic words more comprehensively and more effectively.
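The abstract builds on two standard ingredients: the LDA generative process and word-intrusion evaluation of topics. The Python sketch below illustrates only those two ingredients, under stated assumptions. It samples a toy corpus from plain LDA, without the word-position distribution or word degree, whose exact form the abstract does not specify. It also computes the ordinary word-intrusion precision for one topic (the fraction of judges who pick the true intruder word, as in Chang et al., 2009), not the paper's size-aware variant. The function names (`sample_dirichlet`, `generate_corpus`, `intrusion_precision`) and the toy parameters are hypothetical.

```python
import random

# Hypothetical helpers: a sketch of plain LDA and plain word intrusion,
# NOT the Word Position-Related LDA Model or the size-aware method.

def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalising Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_corpus(num_docs, doc_len, num_topics, vocab, alpha, eta, seed=0):
    """Standard LDA generative process (no word-position information).

    alpha: symmetric Dirichlet prior on per-document topic proportions theta.
    eta:   symmetric Dirichlet prior on per-topic word distributions beta.
    Document length is fixed for simplicity (LDA usually draws N ~ Poisson).
    Returns (docs, topic_assignments, beta).
    """
    rng = random.Random(seed)
    # One word distribution per topic: beta_k ~ Dirichlet(eta).
    beta = [sample_dirichlet([eta] * len(vocab), rng) for _ in range(num_topics)]
    docs, assignments = [], []
    for _ in range(num_docs):
        # Per-document topic mixture: theta_d ~ Dirichlet(alpha).
        theta = sample_dirichlet([alpha] * num_topics, rng)
        words, zs = [], []
        for _ in range(doc_len):
            z = rng.choices(range(num_topics), weights=theta)[0]  # topic z ~ Mult(theta)
            w = rng.choices(vocab, weights=beta[z])[0]            # word w ~ Mult(beta_z)
            words.append(w)
            zs.append(z)
        docs.append(words)
        assignments.append(zs)
    return docs, assignments, beta

def intrusion_precision(intruder, picks):
    """Plain word-intrusion precision for one topic: the fraction of
    judges whose picked word equals the true intruder."""
    return sum(p == intruder for p in picks) / len(picks)

if __name__ == "__main__":
    vocab = ["neuron", "spike", "kernel", "margin", "graph", "node"]
    docs, _, _ = generate_corpus(num_docs=2, doc_len=8, num_topics=3,
                                 vocab=vocab, alpha=0.5, eta=0.1)
    print(docs[0])
    # 2 of 3 hypothetical judges spotted the intruder "graph".
    print(intrusion_precision("graph", ["graph", "node", "graph"]))  # prints 0.6666666666666666
```

Each topic-word distribution is drawn once per corpus and each topic mixture once per document; only the per-word topic and word draws repeat inside the inner loop, which is the structure the position-aware model extends.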
Year: 2011
DOI: 10.1142/S0218001411008890
Venue: International Journal of Pattern Recognition and Artificial Intelligence
Keywords: LDA, probabilistic topic models, word position, word degree, word intrusion
Field: Latent Dirichlet allocation, Experimental data, Computer science, Artificial intelligence, Natural language processing, Interpretability, Intrusion, tf–idf, Pattern recognition, Word error rate, Statistical model, Generative grammar, Machine learning
DocType: Journal
Volume: 25
Issue: 6
ISSN: 0218-0014
Citations: 3
PageRank: 0.36
References: 7
Authors: 4
Name, Order, Citations, PageRank
Lidong Zhai, 1, 23, 5.97
Zhaoyun Ding, 2, 29, 5.90
Yan Jia, 3, 56, 10.52
Bin Zhou, 4, 341, 30.99