Title
Chinese categorization and novelty mining
Abstract
The categorization and novelty mining of chronologically ordered documents is an important data mining problem. This paper covers the entire process of Chinese novelty mining, from preprocessing and categorization to the actual detection of novel information, a process that has rarely been studied. First, preprocessing techniques for detecting novel Chinese text are discussed and compared. Next, we compare categorization and novelty mining performance on English and Chinese sentences, and we discuss how novelty mining performance depends on the retrieval results. Moreover, we propose new evaluation measures for novelty mining: Novelty-Precision, Novelty-Recall, Novelty-F Score, and Sensitivity. Sensitivity measures how strongly the novelty mining system is affected by incorrectly classified sentences. The results indicate that Chinese novelty mining at the sentence level performs similarly to English novelty mining when the sentences are perfectly categorized. Using the new measures, we can more fairly assess how the retrieval results influence novelty mining performance.
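The abstract does not spell out the detection step. As an illustration only, the sketch below uses a common baseline for sentence-level novelty mining: a sentence is flagged as novel when its maximum cosine similarity to all earlier sentences falls below a threshold. It assumes Chinese sentences are indexed by overlapping character bigrams, which is just one possible preprocessing choice and not necessarily the one the paper selects. It then scores the predictions with standard precision, recall, and F score. The paper's Novelty-Precision, Novelty-Recall, Novelty-F Score, and Sensitivity, which take the retrieval results into account, are not reproduced here. The function names and the 0.5 threshold are hypothetical.

```python
import math
from collections import Counter


def char_bigrams(sentence):
    """Index a sentence by overlapping character bigrams.

    Assumption: bigrams are one common way to handle unsegmented Chinese
    text; they are used here for illustration, not as the paper's method.
    """
    chars = [c for c in sentence if not c.isspace()]
    if len(chars) < 2:
        return Counter(chars)
    return Counter(a + b for a, b in zip(chars, chars[1:]))


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def detect_novel(sentences, threshold=0.5):
    """Flag each sentence as novel if its maximum cosine similarity to every
    earlier sentence is below `threshold` (hypothetical default)."""
    history, flags = [], []
    for s in sentences:
        vec = char_bigrams(s)
        max_sim = max((cosine(vec, h) for h in history), default=0.0)
        flags.append(max_sim < threshold)
        history.append(vec)
    return flags


def precision_recall_f(predicted, actual):
    """Standard precision, recall, and F1 over boolean novelty labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    n_pred, n_true = sum(predicted), sum(actual)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_true if n_true else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


if __name__ == "__main__":
    stream = ["股市今天上涨", "股市今天上涨", "央行宣布降息"]
    flags = detect_novel(stream)
    print(flags)  # [True, False, True]
    print(precision_recall_f(flags, [True, False, True]))  # (1.0, 1.0, 1.0)
```

The first sentence is novel by default because there is no history yet. The verbatim repeat is redundant, and the third sentence shares no bigrams with the earlier ones. The threshold controls the trade-off between flagging too many sentences as novel (low precision) and missing novel ones (low recall).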
Year: 2011
DOI: 10.1007/978-3-642-20847-8_24
Venue: PAKDD (2)
Keywords: novelty mining performance, novelty mining system, novel chinese text, important data mining problem, chinese sentence, novelty mining, retrieval result, chinese novelty mining, chinese categorization, new novelty mining evaluation, novelty-f score
Field: Data mining, Categorization, Novelty detection, Text mining, Computer science, Preprocessor, Artificial intelligence, Novelty, Sentence, Machine learning
DocType: Conference
Volume: 6635
ISSN: 0302-9743
Citations: 1
PageRank: 0.36
References: 14
Authors: 2
Name | Order | Citations | PageRank
Flora S. Tsai | 1 | 352 | 23.96
Yi Zhang | 2 | 48 | 3.51