Abstract |
---|
Categorizing chronologically ordered documents and mining them for novel information is an important data mining problem. This paper covers the entire process of Chinese novelty mining, which has rarely been studied, from preprocessing and categorization to the actual detection of novel information. First, we discuss and compare preprocessing techniques for detecting novel Chinese text. Next, we compare categorization and novelty mining performance on English and Chinese sentences, and discuss how novelty mining performance depends on the retrieval results. We also propose new novelty mining evaluation measures: Novelty-Precision, Novelty-Recall, Novelty-F Score, and Sensitivity, the last of which measures how sensitive a novelty mining system is to incorrectly classified sentences. The results indicate that, when sentences are perfectly categorized, sentence-level novelty mining performs similarly on Chinese and English. With these new measures, we can assess more fairly how the retrieval results influence novelty mining performance. |

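
For orientation, the sketch below computes the conventional set-based precision, recall, and F1 score for novel-sentence detection, the baseline that the proposed Novelty-* measures build on. It is an assumption-labeled illustration only: the abstract does not define Novelty-Precision, Novelty-Recall, Novelty-F Score, or Sensitivity, so this is not the authors' formulation. The function name `novelty_prf` and the sentence IDs are hypothetical.

```python
# Conventional set-based precision, recall and F1 for novel-sentence detection.
# NOTE: this is NOT the paper's Novelty-Precision / Novelty-Recall /
# Novelty-F Score; their exact definitions are not given in the abstract.
# novelty_prf is a hypothetical helper name used for illustration.


def novelty_prf(predicted_novel, true_novel):
    """Return (precision, recall, f1) for collections of sentence IDs."""
    predicted = set(predicted_novel)
    truth = set(true_novel)
    hits = len(predicted & truth)  # sentences correctly flagged as novel
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(truth) if truth else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical sentence IDs, for illustration only.
    predicted = {"s1", "s2", "s4", "s7"}
    truth = {"s1", "s4", "s5"}
    p, r, f = novelty_prf(predicted, truth)
    print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.500 R=0.667 F=0.571
```

Measures of this kind are computed only over the sentences the system actually receives, which is why retrieval or categorization errors upstream can distort them; the abstract motivates the proposed measures by the need to account fairly for that effect.
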

Attribute | Value |
---|---|
Year | 2011 |
DOI | 10.1007/978-3-642-20847-8_24 |
Venue | PAKDD (2) |
Keywords | novelty mining performance, novelty mining system, novel chinese text, important data mining problem, chinese sentence, novelty mining, retrieval result, chinese novelty mining, chinese categorization, new novelty mining evaluation, novelty-f score |
Field | Data mining, Categorization, Novelty detection, Text mining, Computer science, Preprocessor, Artificial intelligence, Novelty, Sentence, Machine learning |
DocType | Conference |
Volume | 6635 |
ISSN | 0302-9743 |
Citations | 1 |
PageRank | 0.36 |
References | 14 |
Authors | 2 |

Name | Order | Citations | PageRank |
---|---|---|---|
Flora S. Tsai | 1 | 352 | 23.96 |
Yi Zhang | 2 | 48 | 3.51 |