Interlingual Indexing across Different Languages - Citegraph

Paper Info

Title
Interlingual Indexing across Different Languages

Abstract
We present two methods for automatic indexing, both based on an interlingual layer of content description. In the first approach, we acquire indexing patterns from English documents by statistically relating interlingual representations of these documents (based on text token bigrams) to their associated index terms. Given such indexing patterns, we then induce the associated index terms when the same interlingual representations turn up in documents written in other natural languages (viz. German and Portuguese). Hence, we 'learn' from past English indexing experience and transfer it in an unsupervised way to non-English languages, without ever having seen any concrete indexing data for languages other than English. In the second approach, documents from the three languages are heuristically matched against a sophisticated medical thesaurus (the English MeSH) after both the documents and the thesaurus have been transformed into the interlingua. The combination of the statistical and the heuristic method in a fully automated indexing system achieves 56% to 68% of human indexing performance for each of the three languages.
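
The first approach in the abstract can be illustrated with a small sketch. The abstract does not specify the statistical model, so the scoring below (the relative frequency of an index term given a bigram, summed over the bigrams of a document) is an assumption made purely for illustration, as are the function names and the made-up interlingual tokens such as `#heart`. It is not the authors' implementation.

```python
from collections import Counter, defaultdict


def bigrams(tokens):
    """Return the adjacent token pairs of an interlingual token sequence."""
    return list(zip(tokens, tokens[1:]))


def learn_patterns(training_docs):
    """Count co-occurrences of interlingual bigrams and index terms.

    training_docs: iterable of (interlingual_tokens, index_terms) pairs,
    here standing in for manually indexed English documents.
    """
    cooc = defaultdict(Counter)  # bigram -> Counter of co-occurring index terms
    bigram_freq = Counter()      # bigram -> number of documents containing it
    for tokens, terms in training_docs:
        for bg in set(bigrams(tokens)):
            bigram_freq[bg] += 1
            for term in terms:
                cooc[bg][term] += 1
    return cooc, bigram_freq


def suggest_terms(tokens, cooc, bigram_freq, top_k=3):
    """Rank index terms for a document given in interlingual form.

    Assumed scoring: sum over the document's known bigrams of
    count(bigram, term) / count(bigram).
    """
    scores = Counter()
    for bg in set(bigrams(tokens)):
        if bg not in bigram_freq:
            continue  # bigram never seen in the training documents
        for term, n in cooc[bg].items():
            scores[term] += n / bigram_freq[bg]
    # Sort by descending score, then alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_k]]


# Hypothetical training data: English documents already mapped to the interlingua.
english_docs = [
    (["#heart", "#inflam", "#acute"], ["Myocarditis"]),
    (["#liver", "#inflam", "#chronic"], ["Hepatitis"]),
]
cooc, bigram_freq = learn_patterns(english_docs)

# A hypothetical German document after mapping to the same interlingua.
german_doc = ["#acute", "#heart", "#inflam"]
suggested = suggest_terms(german_doc, cooc, bigram_freq)
```

Because both the training documents and the new document are expressed in the same interlingual tokens, no German or Portuguese indexing data is needed at training time, which is the transfer idea the abstract describes.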

Year	2004
Venue	RIAO
Keywords	indexation, natural language, indexing terms, english language
Field	Information retrieval, Computer science, Interlingua, Portuguese, Search engine indexing, Natural language, Natural language processing, Bigram, Artificial intelligence, Automatic indexing, Security token, German
DocType	Conference
Citations	7
PageRank	0.62
References	16
Authors	5

Authors (5 rows)

Name	Order	Citations	PageRank
Kornél Markó	1	103	10.17
Udo Hahn	2	32	4.80
Stefan Schulz	3	1092	127.03
Philipp Daumke	4	34	7.34
Percy Nohama	5	56	13.12