Title
Sentence-Level Novelty Detection in English and Malay
Abstract
Novelty detection (ND) is the process of identifying novel information in an incoming stream of documents. Although there are many studies of ND on English-language documents, to the best of our knowledge none has been reported on Malay documents. This gap matters because many documents mix English and Malay. This paper examines multilingual sentence-level ND in English and Malay documents using TREC 2003 and TREC 2004 Novelty Track data. We describe the text processing for multilingual ND, which consists of language translation, stop word removal, automatic stemming, and novel sentence detection. We compare the results for sentence-level ND on English and Malay documents and find that they are fairly similar. Therefore, once Malay documents have been preprocessed, our ND algorithm appears to be robust in detecting novel sentences, and it can possibly be extended to other alphabet-based languages.
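The processing steps named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state which novelty metric or threshold was used, so the cosine-similarity test, the `threshold` value, the tiny `STOP_WORDS` set, and the crude `stem` function are all assumptions made for this example. The language translation step is left out entirely.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list (assumption); a real
# system would load full English and Malay lists.
STOP_WORDS = {"the", "a", "an", "of", "in", "is", "was", "and", "to", "on"}


def stem(token):
    """Crude suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(sentence):
    """Lowercase, tokenize, drop stop words, stem; return term counts."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return Counter(stem(t) for t in tokens if t not in STOP_WORDS)


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def detect_novel(sentences, threshold=0.6):
    """Return sentences whose highest similarity to any earlier sentence
    is below the (assumed) threshold."""
    history = []
    novel = []
    for sentence in sentences:
        vec = preprocess(sentence)
        max_sim = max((cosine(vec, h) for h in history), default=0.0)
        if max_sim < threshold:
            novel.append(sentence)
        history.append(vec)
    return novel
```

Given the stream "The flood damaged the bridge.", the same sentence repeated, and "Rescue teams arrived on Monday.", `detect_novel` keeps the first and third sentences and drops the repeat. Comparing each sentence against every earlier one keeps the sketch simple; it is not meant to reflect how the paper scores novelty.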
Year
2009
DOI
10.1007/978-3-642-01307-2_7
Venue
PAKDD
Keywords
multilingual sentence-level nd, novel sentence, sentence-level novelty detection, language translation, alphabet-based language, sentence-level nd, nd algorithm, malay document, english language document, malay language, multilingual nd, english language
Field
Novelty detection, Language translation, Computer science, Malay, Preprocessor, Natural language processing, Artificial intelligence, Novelty, Sentence, Stop words, Text processing
DocType
Conference
Volume
5476
ISSN
0302-9743
Citations
18
PageRank
0.91
References
9
Authors
3
Name             Order   Citations   PageRank
Agus T. Kwee     1       42          3.12
Flora S. Tsai    2       352         23.96
Wenyin Tang      3       90          7.19