Title
Multilingual novelty detection
Abstract
Novelty detection aims to reduce redundant information in a chronologically ordered list of documents or sentences. Previous studies of novelty detection have been conducted on the English language, but few papers have addressed the problem of multilingual novelty detection. Likewise, research in multilingual information retrieval has rarely been applied to novelty detection. This paper attempts to bridge the two disciplines by first describing the preprocessing steps for English, Malay and Chinese, then applying document-level and sentence-level novelty detection to the three languages on APWSJ and TREC 2004 Novelty Track data. Experiments on sentence-level novelty detection show similar results for all three languages, which indicates that our algorithm is suitable for multilingual novelty detection at the sentence level. However, results for document-level novelty detection show a disparity across the languages, with English and Malay outperforming Chinese. After applying sentence-level novelty detection to detect novel documents, we observe substantial improvements for all three languages. This demonstrates that segmenting documents into sentences improves document-level novelty detection in multiple languages, and has practical benefits for a real-time multilingual novelty detection system.
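The idea of using sentence-level novelty to decide whether a whole document is novel can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not specify the similarity measure, thresholds, or aggregation rule, so the cosine similarity over term counts, the `threshold` and `min_novel_ratio` values, and all function names below are assumptions made for illustration only. The tokenizer is a crude stand-in for the language-specific preprocessing the abstract mentions, and the sentence splitter only recognises Western punctuation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and extract word tokens (hypothetical stand-in for real
    per-language preprocessing such as stemming or word segmentation)."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def novel_sentences(sentences, history, threshold=0.6):
    """Return sentences whose maximum similarity to every earlier sentence
    is below `threshold` (assumed value). `history` is updated in place."""
    novel = []
    for sentence in sentences:
        vec = Counter(tokenize(sentence))
        max_sim = max((cosine(vec, h) for h in history), default=0.0)
        if max_sim < threshold:
            novel.append(sentence)
        history.append(vec)
    return novel


def is_novel_document(doc, history, min_novel_ratio=0.5):
    """Segment a document into sentences and call it novel when at least
    `min_novel_ratio` (assumed value) of its sentences are novel."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", doc) if s.strip()]
    if not sentences:
        return False
    novel = novel_sentences(sentences, history)
    return len(novel) / len(sentences) >= min_novel_ratio


# Documents are processed in chronological order against a shared history.
history = []
first = is_novel_document("The cat sat on the mat. Dogs bark loudly.", history)
repeat = is_novel_document("The cat sat on the mat. Dogs bark loudly.", history)
other = is_novel_document("Stock markets fell sharply today.", history)
```

In this sketch the first document is flagged as novel, an exact repeat of it is not, and an unrelated later document is novel again. A real multilingual system would replace `tokenize` and the sentence splitter with components suited to each language.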
Year: 2011
DOI: 10.1016/j.eswa.2010.07.016
Venue: Expert Syst. Appl.
Keywords: stemming, multilingual novelty detection, novelty detection, redundant information, document-level novelty detection, document-level novelty, english language, chinese, real-time multilingual novelty detection, sentence-level novelty detection, multilingual information retrieval, pos tagging, malay, multilingual, sentence-level novelty, real time
Field: Novelty detection, English language, Computer science, Malay, Preprocessor, Natural language processing, Artificial intelligence, Novelty, Sentence
DocType: Journal
Volume: 38
Issue: 1
ISSN: Expert Systems With Applications
Citations: 3
PageRank: 0.40
References: 20
Authors
4
Name             Order   Citations   PageRank
Flora S. Tsai    1       352         23.96
Yi Zhang         2       48          3.51
Agus T. Kwee     3       42          3.12
Wenyin Tang      4       90          7.19