Cross language text categorization by acquiring multilingual domain models from comparable corpora - Citegraph

Paper Info

Title
Cross language text categorization by acquiring multilingual domain models from comparable corpora

Abstract
In a multilingual scenario, the classical monolingual text categorization (TC) problem can be reformulated as a cross language TC task, in which we have to cope with two or more languages (e.g. English and Italian). In this setting, the system is trained using labeled examples in a source language (e.g. English), and it classifies documents in a different target language (e.g. Italian). In this paper we propose a novel approach to solve the cross language text categorization problem based on acquiring Multilingual Domain Models from comparable corpora in a totally unsupervised way and without using any external knowledge source (e.g. bilingual dictionaries). These Multilingual Domain Models are exploited to define a generalized similarity function (i.e. a kernel function) among documents in different languages, which is used inside a Support Vector Machines classification framework. The results show that our approach is a feasible and cheap solution that largely outperforms a baseline.

Year	Venue	Keywords
2005	ParallelText@ACL	classical monolingual text categorization,cross language tc task,comparable corpus,different target language,cross language text categorization,source language,generalized similarity function,multilingual domain models,multilingual domain model,kernel function,external knowledge source,different language,support vector machine,domain model
Field	DocType	Volume
Computer science,Support vector machine,Speech recognition,Artificial intelligence,Language identification,Natural language processing,Text categorization,Domain model,Kernel (statistics)	Conference	W05-08
Citations	PageRank	References
28	1.35	6
Authors
2

Authors (2 rows)

Name	Order	Citations	PageRank
Alfio Gliozzo	1	257	24.97
Carlo Strapparava	2	2564	230.59
