Using information from the target language to improve crosslingual text classification - Citegraph

Paper Info

Title
Using information from the target language to improve crosslingual text classification

Abstract
Crosslingual text classification consists of exploiting labeled documents in a source language to classify documents in a different target language. In addition to the evident translation problem, this task also faces difficulties caused by cultural discrepancies between the two languages, which manifest as different topic distributions. Such discrepancies make the classifier unreliable for the categorization task. To tackle this problem, we propose to improve classification performance by using information embedded in the target dataset itself. The central idea of the proposed approach is that similar documents must belong to the same category. Therefore, it classifies documents by considering not only their own content but also the categories assigned to other similar documents from the same target dataset. Experimental results on three different languages demonstrate the appropriateness of the proposed approach.
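
The central idea in the abstract (a document's category should agree with the categories of similar documents in the same target dataset) can be illustrated with a minimal sketch. This is not the authors' algorithm: the bag-of-words cosine similarity, the function name `reclassify`, and the parameters `k` and `weight` are all assumptions made for illustration. It assumes a classifier trained on the source language has already produced per-category scores for each target document.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def reclassify(docs, scores, k=2, weight=0.5):
    """Blend each document's own scores with those of its nearest neighbors.

    docs   : list of token lists (target-language documents)
    scores : list of {category: score} dicts from a source-trained classifier
    k      : number of similar documents to consult (hypothetical parameter)
    weight : share given to the document's own scores (hypothetical parameter)
    """
    bags = [Counter(d) for d in docs]
    labels = []
    for i, own in enumerate(scores):
        # Rank the other target documents by similarity, dropping unrelated ones.
        sims = sorted(
            ((cosine(bags[i], bags[j]), j) for j in range(len(docs)) if j != i),
            reverse=True,
        )
        neighbors = [j for sim, j in sims[:k] if sim > 0]
        blended = {}
        for cat, own_score in own.items():
            if neighbors:
                neigh_avg = sum(scores[j][cat] for j in neighbors) / len(neighbors)
            else:
                neigh_avg = own_score  # no similar document: keep the own score
            blended[cat] = weight * own_score + (1 - weight) * neigh_avg
        labels.append(max(blended, key=blended.get))
    return labels
```

In this sketch, a document whose own scores are borderline is pulled toward the category of its most similar neighbors, which is the intuition the abstract describes. The actual similarity measure and combination rule used in the paper may differ.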

Year	DOI	Venue
2010	10.1007/978-3-642-14770-8_34	IceTAL
Keywords	Field	DocType
different target language,assigned category,crosslingual text classification,different topic distribution,similar document,categorization task,own target dataset,target dataset,different languages evidence	Categorization,Information retrieval,Computer science,Support vector machine,Classifier (linguistics),Text categorization	Conference
Volume	ISSN	ISBN
6233	0302-9743	3-642-14769-0
Citations	PageRank	References
3	0.41	12
Authors
5

Authors (5 rows)

Name	Order	Citations	PageRank
Gabriela Ramírez-De-La-Rosa	1	10	10.81
Manuel Montes-Y-Gómez	2	638	83.97
Luis Villaseñor-Pineda	3	403	53.74
David Pinto-Avendaño	4	8	1.16
Thamar Solorio	5	432	55.65