Title
A probabilistic model for text categorization: based on a single random variable with multiple values
Abstract
Text categorization is the classification of documents with respect to a set of predefined categories. In this paper, we propose a new probabilistic model for text categorization that is based on a Single random Variable with Multiple Values (SVMV). Compared to previous probabilistic models, our model has the following advantages: 1) it considers within-document term frequencies, 2) it considers term weighting for target documents, and 3) it is less affected by insufficient training cases. We verify our model's superiority over the others in the task of categorizing news articles from the "Wall Street Journal".
Year: 1994
DOI: 10.3115/974358.974395
Venue: ANLP
Keywords: wall street journal, previous probabilistic model, multiple values, single random variable, new probabilistic model, categorizing news article, within-document term frequency, following advantage, text categorization, multiple value, term weighting, random variable, probabilistic model
Field: Random variable, Weighting, Computer science, Statistical model, Artificial intelligence, Probabilistic logic, Probabilistic relevance model, Text categorization, Machine learning
DocType: Conference
Citations: 17
PageRank: 3.74
References: 9
Authors: 2
Name, Order, Citations, PageRank
Makoto Iwayama, 1, 436, 87.03
Takenobu Tokunaga, 2, 560, 82.66