Title
Interlingual Indexing across Different Languages
Abstract
We present two methods for automatic indexing, both based on an interlingual layer of content description. In the first approach, we acquire indexing patterns from English documents by statistically relating interlingual representations of English documents (based on text token bigrams) to their associated index terms. Given such indexing patterns, we then induce the associated index terms when the same interlingual representations turn up in documents in other natural languages (viz. German and Portuguese). Hence, we 'learn' from past English indexing experience and transfer it in an unsupervised way to non-English languages, without ever having seen any concrete indexing data for languages other than English. In the second approach, documents from the three different languages are heuristically matched with a sophisticated medical thesaurus (the English MESH) after both the documents and the thesaurus have been transformed into the interlingua. The combination of the statistical and heuristic methods in a fully automated indexing system achieves 56% to 68% of the human indexing performance for each of the three languages.
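The first, statistical approach can be illustrated with a minimal sketch. This is not the authors' implementation: the toy lexicon, the interlingual identifiers (such as `#cardi`), the function names, and the scoring rule (summing relative co-occurrence frequencies) are all assumptions made for illustration. The word-level lexicon also ignores compound splitting, which a real system for German would need. The sketch only shows the general idea of learning bigram-to-index-term associations from English and applying them to a document in another language once both are mapped to the same interlingua.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon: surface token -> interlingual identifier.
# Both the identifiers and the entries are invented for this example.
LEXICON = {
    "en": {"heart": "#cardi", "attack": "#attack",
           "kidney": "#nephr", "failure": "#fail"},
    "de": {"herz": "#cardi", "anfall": "#attack",
           "nieren": "#nephr", "versagen": "#fail"},
}


def to_interlingua(text, lang):
    """Map known tokens to interlingual identifiers, dropping unknown ones."""
    table = LEXICON[lang]
    return [table[tok] for tok in text.lower().split() if tok in table]


def bigrams(ids):
    """Return adjacent pairs of interlingual identifiers."""
    return list(zip(ids, ids[1:]))


def train(english_docs):
    """Count how often each interlingual bigram co-occurs with each index term."""
    pair_counts = defaultdict(Counter)
    bigram_counts = Counter()
    for text, terms in english_docs:
        for bg in set(bigrams(to_interlingua(text, "en"))):
            bigram_counts[bg] += 1
            for term in terms:
                pair_counts[bg][term] += 1
    return pair_counts, bigram_counts


def suggest(text, lang, model, top_k=3):
    """Rank index terms by summed relative co-occurrence with the doc's bigrams."""
    pair_counts, bigram_counts = model
    scores = Counter()
    for bg in set(bigrams(to_interlingua(text, lang))):
        for term, n in pair_counts.get(bg, {}).items():
            scores[term] += n / bigram_counts[bg]
    return [term for term, _ in scores.most_common(top_k)]


# Train on English only, then index a German document.
model = train([
    ("heart attack", ["Myocardial Infarction"]),
    ("kidney failure", ["Renal Insufficiency"]),
])
print(suggest("Herz Anfall", "de", model))
```

No German indexing data is used in training; the German input receives an index term only because its tokens map to the same interlingual bigram that was seen in the English training data.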
Year
2004
Venue
RIAO
Keywords
indexation, natural language, indexing terms, english language
Field
Information retrieval, Computer science, Interlingua, Portuguese, Search engine indexing, Natural language, Natural language processing, Bigram, Artificial intelligence, Automatic indexing, Security token, German
DocType
Conference
Citations
7
PageRank
0.62
References
16
Authors
5
Name             Order   Citations   PageRank
Kornél Markó     1       103         10.17
Udo Hahn         2       32          4.80
Stefan Schulz    3       1092        127.03
Philipp Daumke   4       34          7.34
Percy Nohama     5       56          13.12