Building a Cross-lingual Relatedness Thesaurus using a Graph Similarity Measure - Citegraph

Paper Info

Title
Building a Cross-lingual Relatedness Thesaurus using a Graph Similarity Measure

Abstract
The Internet is an ever-growing source of information stored in documents in different languages. Hence, cross-lingual resources are needed for more and more NLP applications. This paper presents (i) a graph-based method for creating one such resource and (ii) a resource created using the method, a cross-lingual relatedness thesaurus. Given a word in one language, the thesaurus suggests words in a second language that are semantically related. The method requires two monolingual corpora and a basic bilingual dictionary. Our general approach is to build two monolingual word graphs, with nodes representing words and edges representing linguistic relations between words. A bilingual dictionary containing basic vocabulary provides seed translations relating nodes from both graphs. We then use an inter-graph node-similarity algorithm to discover related words. Evaluation with three human judges revealed that 49% of the English and 57% of the German words discovered by our method are semantically related to the target words. We publish two resources in conjunction with this paper: first, noun coordinations extracted from the German and English Wikipedias; second, the cross-lingual relatedness thesaurus, which can be used in experiments involving interactive cross-lingual query expansion.
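The abstract names only an "inter-graph node-similarity algorithm" without giving its form. The following is a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes edges come from noun coordinations (e.g. "apples and pears") and uses a SimRank-style recurrence (Jeh and Widom's SimRank, adapted here to two graphs) as a stand-in for the paper's measure. Seed dictionary pairs are held at similarity 1.0, and each other pair (a, b) scores `decay * sum(sim(x, y)) / (|N(a)| * |N(b)|)` over neighbour pairs. The function names `build_graph`, `cross_graph_similarity` and `related_words`, the `decay` and `iterations` parameters, and the toy word lists are all hypothetical.

```python
from collections import defaultdict
from itertools import product


def build_graph(pairs):
    """Build an undirected word graph with one edge per coordinated word pair."""
    graph = defaultdict(set)
    for w1, w2 in pairs:
        if w1 != w2:  # skip self-coordinations such as "A and A"
            graph[w1].add(w2)
            graph[w2].add(w1)
    return dict(graph)


def cross_graph_similarity(g_src, g_tgt, seeds, decay=0.8, iterations=5):
    """SimRank-style similarity between nodes of two separate graphs (assumed form).

    Seed translations stay fixed at 1.0; every other pair (a, b) gets
    decay * (sum of neighbour-pair similarities) / (|N(a)| * |N(b)|).
    """
    seeds = set(seeds)
    sim = {pair: 1.0 for pair in seeds}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(g_src, g_tgt):
            if (a, b) in seeds:
                new_sim[(a, b)] = 1.0
                continue
            na, nb = g_src[a], g_tgt[b]
            total = sum(sim.get((x, y), 0.0) for x in na for y in nb)
            if total > 0.0:  # store only non-zero scores to keep the dict sparse
                new_sim[(a, b)] = decay * total / (len(na) * len(nb))
        sim = new_sim
    return sim


def related_words(sim, word, k=3):
    """Return the k highest-scoring target-language words for a source word."""
    scored = [(b, s) for (a, b), s in sim.items() if a == word]
    scored.sort(key=lambda t: (-t[1], t[0]))
    return scored[:k]


# Hypothetical toy data: noun coordinations such as "apples and pears".
english = build_graph([("apple", "pear"), ("apple", "banana"),
                       ("car", "bus"), ("bus", "train")])
german = build_graph([("Apfel", "Birne"), ("Apfel", "Banane"),
                      ("Auto", "Bus"), ("Bus", "Zug")])
# Hypothetical basic-vocabulary dictionary entries used as seed translations.
seed_pairs = [("pear", "Birne"), ("banana", "Banane"),
              ("car", "Auto"), ("train", "Zug")]

sim = cross_graph_similarity(english, german, seed_pairs)
print(related_words(sim, "apple"))
print(related_words(sim, "bus"))
```

In this toy run, neither "apple"/"Apfel" nor "bus"/"Bus" is a seed. Each pair still comes out as the top match because the two words share translated neighbours. Scoring every node pair costs O(|V_src| x |V_tgt|) per iteration. That is fine for a toy example, but graphs built from full Wikipedia corpora would presumably need pruning.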

Year: 2010
Venue: LREC 2010 - Seventh International Conference on Language Resources and Evaluation
Keywords: query expansion, noun
Field: Semantic similarity, Publication, Bilingual dictionary, Query expansion, Information retrieval, Computer science, Noun, Artificial intelligence, Natural language processing, Vocabulary, German, The Internet
DocType: Conference
Citations: 4
PageRank: 0.47
References: 10
Authors: 5

Authors

Name	Order	Citations	PageRank
Lukas Michelbacher	1	30	2.28
Florian Laws	2	145	8.06
Beate Dorow	3	186	11.94
Ulrich Heid	4	190	40.48
Hinrich Schütze	5	2113	362.21