Title
Advanced learning algorithms for cross-language patent retrieval and classification
Abstract
We study several machine learning algorithms for cross-language patent retrieval and classification. In contrast to most other studies involving machine learning for cross-language information retrieval, which used learning techniques mainly for monolingual sub-tasks, our learning algorithms exploit bilingual training documents and learn a semantic representation from them. We study Japanese-English cross-language patent retrieval using Kernel Canonical Correlation Analysis (KCCA), a method of correlating linear relationships between two variables in kernel-defined feature spaces. The results are quite encouraging and are significantly better than those obtained by other state-of-the-art methods. We also investigate learning algorithms for cross-language document classification. These learning algorithms are based on KCCA and Support Vector Machines (SVM). In particular, we study two ways of combining KCCA and SVM and find that one particular combination, called SVM_2k, achieves better results than the other learning algorithms for either bilingual or monolingual test documents.
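To make the retrieval idea in the abstract concrete, here is a minimal sketch of regularised KCCA used for cross-language mate retrieval. It is an illustration under stated assumptions, not the authors' implementation: the regularised objective (maximise a' Kx Ky b subject to a' (Kx + kappa I)^2 a = 1 and b' (Ky + kappa I)^2 b = 1), the linear kernels, the synthetic paired data standing in for Japanese and English document vectors, and the names fit_kcca, rank_targets, and kappa are all hypothetical choices made for this example.

```python
import numpy as np


def fit_kcca(Kx, Ky, kappa=1.0, n_components=5):
    """Regularised KCCA on two n x n kernel matrices of paired documents.

    Assumed objective: maximise a' Kx Ky b subject to
    a' (Kx + kappa I)^2 a = 1 and b' (Ky + kappa I)^2 b = 1.
    Substituting u = (Kx + kappa I) a and v = (Ky + kappa I) b turns this
    into an SVD of M = (Kx + kappa I)^-1 Kx Ky (Ky + kappa I)^-1.
    """
    n = Kx.shape[0]
    Bx = Kx + kappa * np.eye(n)
    By = Ky + kappa * np.eye(n)
    # Ky By^-1 equals (By^-1 Ky)^T because both matrices are symmetric.
    M = np.linalg.solve(Bx, Kx) @ np.linalg.solve(By, Ky).T
    U, s, Vt = np.linalg.svd(M)
    alpha = np.linalg.solve(Bx, U[:, :n_components])
    beta = np.linalg.solve(By, Vt[:n_components].T)
    return alpha, beta, s[:n_components]


def rank_targets(query_proj, doc_proj):
    """Rank target-language documents for each query by cosine similarity."""
    q = query_proj / np.linalg.norm(query_proj, axis=1, keepdims=True)
    d = doc_proj / np.linalg.norm(doc_proj, axis=1, keepdims=True)
    return np.argsort(-(q @ d.T), axis=1)


# Synthetic stand-in for a bilingual corpus: both "languages" are noisy
# linear views of the same latent semantic vectors Z (an assumption).
rng = np.random.default_rng(0)
Z = rng.normal(size=(80, 5))
X = Z @ rng.normal(size=(5, 20)) + 0.05 * rng.normal(size=(80, 20))
Y = Z @ rng.normal(size=(5, 30)) + 0.05 * rng.normal(size=(80, 30))
X_train, X_test = X[:60], X[60:]
Y_train, Y_test = Y[:60], Y[60:]

# Linear kernels between paired training documents.
alpha, beta, corrs = fit_kcca(X_train @ X_train.T, Y_train @ Y_train.T)

# Project held-out queries (view X) and candidate documents (view Y)
# into the shared space using their kernels against the training set.
queries = (X_test @ X_train.T) @ alpha
targets = (Y_test @ Y_train.T) @ beta
ranking = rank_targets(queries, targets)
top1_accuracy = np.mean(ranking[:, 0] == np.arange(len(X_test)))
```

In this sketch a query in one view is matched against documents in the other view only through the learned projections, which is the sense in which bilingual training pairs supply the semantic representation. With real text, the kernel matrices would be built from term vectors in each language rather than from synthetic data.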
Year: 2007
DOI: 10.1016/j.ipm.2006.11.005
Venue: Inf. Process. Manage.
Keywords: cross-language information retrieval, bilingual training document, cross-language patent retrieval, advanced learning algorithm, learning algorithm, particular combination, monolingual test document, better result, machine learning, cross-language document classification, Japanese-English cross-language patent retrieval, monolingual sub-tasks, feature space, canonical correlation, support vector machine
Field: Instance-based learning, Semi-supervised learning, Active learning (machine learning), Computer science, Artificial intelligence, Natural language processing, Computational learning theory, Document classification, Kernel (linear algebra), Online machine learning, Information retrieval, Support vector machine, Algorithm, Machine learning
DocType: Journal
Volume: 43
Issue: 5
ISSN: Information Processing and Management
Citations: 31
PageRank: 1.11
References: 16
Authors: 2
Name, Order, Citations, PageRank
Yaoyong Li, 1, 393, 26.55
John Shawe-Taylor, 2, 11879, 1518.73