Abstract | ||
---|---|---|
A practical text classification system should be able to use information from both expensive labelled documents and large volumes of cheap unlabelled documents. It should also handle newly input samples easily. In this paper, we propose a random walks method for text classification, in which the classification problem is formulated as solving for the absorption probabilities of Markov random walks on a weighted graph. The Laplacian operator for asymmetric graphs is then derived and applied to the asymmetric transition matrix. We also develop an induction algorithm for newly input documents based on the random walks method. To make full use of the text information, we propose a difference measure for text data based on language models and KL-divergence, together with a new smoothing technique for it. Finally, an algorithm for eliminating ambiguous states is proposed to address the problem of noisy data. Experiments on two well-known data sets, WebKB and 20Newsgroup, demonstrate the effectiveness of the proposed random walks method. |
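
The absorption-probability formulation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses the textbook absorbing Markov chain result, treating labelled documents as absorbing states and unlabelled documents as transient states, and it omits the paper's asymmetric Laplacian, induction step, KL-based difference measure, and ambiguous-state elimination. The function names `absorption_probabilities` and `classify`, and the assumption that every row of the weight matrix `W` has a positive sum, are hypothetical choices made for illustration.

```python
import numpy as np


def absorption_probabilities(W, labeled_idx):
    """Absorption probabilities of a random walk on a weighted graph.

    W is an (n, n) non-negative weight matrix (it may be asymmetric);
    every row is assumed to have a positive sum. Labelled nodes are
    treated as absorbing states, all other nodes as transient states.
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    labeled = np.asarray(labeled_idx)
    unlabeled = np.setdiff1d(np.arange(n), labeled)

    # Row-normalise the weights into a transition matrix P = D^{-1} W.
    P = W / W.sum(axis=1, keepdims=True)

    # Blocks of P: transient -> transient and transient -> absorbing.
    P_uu = P[np.ix_(unlabeled, unlabeled)]
    P_ul = P[np.ix_(unlabeled, labeled)]

    # Solve (I - P_uu) B = P_ul; B[i, j] is the probability that a walk
    # started at unlabelled node i is absorbed at labelled node j.
    B = np.linalg.solve(np.eye(len(unlabeled)) - P_uu, P_ul)
    return unlabeled, B


def classify(W, labeled_idx, labels, n_classes):
    """Assign each unlabelled node the class with the largest total
    absorption probability over the labelled nodes of that class."""
    unlabeled, B = absorption_probabilities(W, labeled_idx)
    Y = np.eye(n_classes)[np.asarray(labels)]  # one-hot labels
    scores = B @ Y
    return unlabeled, scores.argmax(axis=1), scores


if __name__ == "__main__":
    # Toy path graph 0 - 1 - 2 - 3 with the two end nodes labelled.
    W = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    nodes, pred, scores = classify(W, labeled_idx=[0, 3], labels=[0, 1], n_classes=2)
    print(nodes, pred)
    print(scores)
```

On the toy path graph, each unlabelled node is assigned the class of the nearer labelled end node, since a walk starting there is more likely to be absorbed at that end.
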
Year | DOI | Venue |
---|---|---|
2006 | null | SIAM Proceedings Series |
Keywords | Field | DocType |
null | Random graph,Pattern recognition,Random walk,Computer science,Artificial intelligence,Machine learning | Conference |
Volume | Issue | Citations |
2006 | null | 10 |
PageRank | References | Authors |
1.03 | 12 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Yunpeng Xu | 1 | 21 | 4.44 |
Xing Yi | 2 | 293 | 20.64 |
Changshui Zhang | 3 | 5506 | 323.40 |