Title
Multilingual sentence categorization and novelty mining
Abstract
A challenge for sentence categorization and novelty mining is to detect not only when text is relevant to the user's information need, but also when it contains something new which the user has not seen before. It involves two tasks that need to be solved. The first is identifying relevant sentences (categorization) and the second is identifying new information from those relevant sentences (novelty mining). Many previous studies of relevant sentence retrieval and novelty mining have been conducted on the English language, but few papers have addressed the problem of multilingual sentence categorization and novelty mining. This is an important issue in global business environments, where mining knowledge from text in a single language is not sufficient. In this paper, we perform the first task by categorizing Malay and Chinese sentences, then comparing their performances with that of English. Thereafter, we conduct novelty mining to identify the sentences with new information. Experimental results on TREC 2004 Novelty Track data show similar categorization performance on Malay and English sentences, which greatly outperform Chinese. In the second task, it is observed that we can achieve similar novelty mining results for all three languages, which indicates that our algorithm is suitable for novelty mining of multilingual sentences. In addition, after benchmarking our results with novelty mining without categorization, it is learnt that categorization is necessary for the successful performance of novelty mining.
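The two-stage pipeline described in the abstract (categorize relevant sentences first, then mine novelty among them) can be illustrated with a minimal sketch. The abstract does not state the similarity measure, tokenization, or thresholds, so everything here is an assumption made for illustration: bag-of-words cosine similarity, whitespace tokenization (which would not segment Chinese text), and the hypothetical names `categorize`, `mine_novel`, `rel_threshold`, and `nov_threshold`. It is not the authors' implementation.

```python
import math
from collections import Counter


def bow(text):
    """Term-frequency vector from lowercased whitespace tokens (assumed tokenization)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def categorize(sentences, topic, rel_threshold=0.2):
    """Stage 1: keep sentences whose similarity to the topic meets an assumed threshold."""
    query = bow(topic)
    return [s for s in sentences if cosine(bow(s), query) >= rel_threshold]


def mine_novel(relevant, nov_threshold=0.6):
    """Stage 2: a sentence is novel if it is not too similar to any earlier relevant one."""
    history, novel = [], []
    for s in relevant:
        vec = bow(s)
        if all(cosine(vec, seen) < nov_threshold for seen in history):
            novel.append(s)
        history.append(vec)  # every relevant sentence joins the history, novel or not
    return novel


sentences = [
    "the bank raised interest rates",
    "the bank raised interest rates today",
    "football match ended in a draw",
    "interest rates affect mortgage costs",
]
relevant = categorize(sentences, "bank interest rates")
novel = mine_novel(relevant)
```

In this toy run the football sentence is dropped in stage 1, and the near-duplicate second sentence is dropped in stage 2. Skipping stage 1 would let off-topic sentences pass as "novel" simply because they resemble nothing seen before, which is consistent with the abstract's finding that categorization is needed before novelty mining.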
Year: 2011
DOI: 10.1016/j.ipm.2010.02.003
Venue: Inf. Process. Manage.
Keywords: mining knowledge, sentence retrieval, new information, chinese, sentence categorization, similar novelty mining result, relevant sentence retrieval, multilingual sentence categorization, novelty mining, chinese sentence, malay, multilingual categorization, similar categorization performance, relevant sentence, information need, english language
Field: Categorization, Text mining, Information needs, English language, Information retrieval, Malay, Computer science, Artificial intelligence, Natural language processing, Novelty, Sentence, Benchmarking
DocType: Journal
Volume: 47
Issue: 5
Journal title: Information Processing and Management
Citations: 14
PageRank: 0.71
References: 22
Authors (3)
Yi Zhang: order 1, citations 48, PageRank 3.51
Flora S. Tsai: order 2, citations 352, PageRank 23.96
Agus Trisnajaya Kwee: order 3, citations 30, PageRank 3.47