Paper Info

Title
Learning indexing patterns from one language for the benefit of others

Abstract
Using language technology for text analysis and light-weight ontologies as a content-mediating level, we acquire indexing patterns from vast amounts of indexing data for English-language medical documents. This is achieved by statistically relating interlingual representations of these documents (based on text token bigrams) to their associated index terms. From these 'English' indexing patterns, we then induce the associated index terms for German and Portuguese documents when their interlingual representations match those of English documents. Thus, we learn from past English indexing experience and transfer it in an unsupervised way to non-English texts, without ever having seen concrete indexing data for languages other than English.
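
The pipeline the abstract describes can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and is not taken from the paper: the toy `LEXICON` that maps words to hypothetical interlingual identifiers, the sample index terms, the function names, and the scoring rule (the relative frequency of an index term given a bigram, summed over the matching bigrams). The authors' actual statistics and lexical resources may differ.

```python
from collections import Counter, defaultdict

# Hypothetical lexicon: words from several languages mapped to shared
# interlingual identifiers (an assumption made for this sketch).
LEXICON = {
    "acute": "#acute", "akute": "#acute",
    "inflammation": "#inflam", "entzündung": "#inflam",
    "liver": "#hepat", "leber": "#hepat",
    "chronic": "#chron", "heart": "#cardi", "failure": "#insuff",
}

def to_interlingua(text, lexicon=LEXICON):
    """Map a text to its sequence of interlingual identifiers."""
    return [lexicon[w] for w in text.lower().split() if w in lexicon]

def bigrams(tokens):
    """Return adjacent token pairs."""
    return list(zip(tokens, tokens[1:]))

def learn_patterns(indexed_docs):
    """Estimate P(index term | bigram) from (tokens, index_terms) pairs."""
    bigram_count = Counter()
    cooc = defaultdict(Counter)
    for tokens, terms in indexed_docs:
        for bg in set(bigrams(tokens)):
            bigram_count[bg] += 1
            for term in set(terms):
                cooc[bg][term] += 1
    return {
        bg: {term: c / bigram_count[bg] for term, c in terms.items()}
        for bg, terms in cooc.items()
    }

def suggest_terms(tokens, patterns, top_k=3):
    """Score index terms by summing P(term | bigram) over matching bigrams."""
    scores = Counter()
    for bg in set(bigrams(tokens)):
        for term, p in patterns.get(bg, {}).items():
            scores[term] += p
    return [term for term, _ in scores.most_common(top_k)]

# English documents with index terms (invented toy data).
english_docs = [
    (to_interlingua("acute inflammation liver"), ["Hepatitis"]),
    (to_interlingua("chronic heart failure"), ["Heart Failure"]),
]
patterns = learn_patterns(english_docs)

# A German text never seen with index terms; it matches only through
# its interlingual bigrams.
german_tokens = to_interlingua("akute entzündung leber")
print(suggest_terms(german_tokens, patterns))
```

Because the bigrams are ordered, a text only inherits index terms when its interlingual identifiers appear in the same sequence as in some English document. This is a limitation of the toy sketch, not a claim about the published system.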

Year	Venue	Keywords
2004	AAAI	associated index term,text analysis,concrete indexing data,past english indexing experience,english document,interlingual representation,text token bigrams,indexing pattern,english-language medical document,indexing data,indexation,english language,language technology,indexing terms
Field	DocType	ISBN
Ontology (information science),Medical documents,Computer science,Portuguese,Search engine indexing,Natural language processing,Bigram,Artificial intelligence,Security token,Language technology,German	Conference	0-262-51183-5
Citations	PageRank	References
1	0.36	6
Authors
3

Authors (3 rows)

Name	Order	Citations	PageRank
Udo Hahn	1	88	11.14
Kornél Markó	2	38	3.58
Stefan Schulz	3	29	5.15
