Abstract | ||
---|---|---|
Novelty detection aims to identify novel information in an incoming stream of documents. In this paper, we propose a new framework for document-level novelty detection using document-to-sentence (D2S) annotations and discuss the applicability of this method. D2S first segments a document into sentences, determines the novelty of each sentence, then computes the document-level novelty score based on a fixed threshold. Experimental results on APWSJ data show that D2S outperforms standard document-level novelty detection in terms of redundancy-precision (RP) and redundancy-recall (RR). We applied D2S to the document-level data from the TREC 2004 and TREC 2003 Novelty Tracks and found that D2S is useful for detecting novel information in data with a high percentage of novel documents. In contrast, D2S shows a strong capability to detect redundant information regardless of the percentage of novel documents. D2S has been successfully integrated into a real-world novelty detection system. |
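
The three D2S steps named in the abstract (segment, score each sentence, aggregate with a fixed threshold) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the regex sentence splitter, the bag-of-words cosine similarity used as the sentence-level novelty test, the fraction-of-novel-sentences aggregation, and the names `d2s_is_novel`, `sim_threshold` and `novel_fraction_threshold` along with their default values.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive segmentation (assumption): split after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bag_of_words(sentence):
    # Lowercased term-frequency vector for one sentence.
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def sentence_is_novel(vec, history, sim_threshold):
    # Assumption: a sentence is novel if it is not too similar to any seen sentence.
    return all(cosine(vec, seen) < sim_threshold for seen in history)


def d2s_is_novel(document, history, sim_threshold=0.8, novel_fraction_threshold=0.5):
    """Return (is_novel, score) for a document and add its sentences to history.

    score is the fraction of sentences judged novel (hypothetical aggregation);
    the document is called novel when score reaches the fixed threshold.
    """
    sentences = split_sentences(document)
    if not sentences:
        return False, 0.0
    vectors = [bag_of_words(s) for s in sentences]
    novel_count = sum(sentence_is_novel(v, history, sim_threshold) for v in vectors)
    score = novel_count / len(sentences)
    history.extend(vectors)  # later documents are compared against these sentences
    return score >= novel_fraction_threshold, score
```

In this sketch `history` is a plain list that the caller keeps across the document stream, so feeding the same document twice yields a novel decision the first time and a redundant one the second.
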
Year | DOI | Venue |
---|---|---|
2011 | 10.1007/s10115-010-0372-2 | Knowl. Inf. Syst. |
Keywords | Field | DocType |
novelty detection,document-level novelty score,redundant information,document-level data,standard document-level novelty detection,real-world novelty detection system,APWSJ data,document-level novelty detection,redundancy,sentence segmentation,document novelty,novelty dataset,text mining,document-to-sentence framework,novel document,novel information | Data mining,Novelty detection,Text mining,Sentence segmentation,Pattern recognition,Computer science,Redundancy (engineering),Artificial intelligence,Novelty,Sentence | Journal |
Volume | Issue | ISSN |
29 | 2 | 0219-3116 |
Citations | PageRank | References |
15 | 0.68 | 20 |
Authors | ||
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Flora S. Tsai | 1 | 352 | 23.96 |
Yi Zhang | 2 | 48 | 3.51 |