Enlarging the Croatian WordNet with WN-Toolkit and Cro-Deriv. - Citegraph

Paper Info

Title
Enlarging the Croatian WordNet with WN-Toolkit and Cro-Deriv.

Abstract
Wordnet is a standard semantic resource for several Natural Language Processing tasks, and it is available for an increasing number of languages. The Croatian Wordnet (CroWN) was a relatively small resource, with 10,026 synsets and 31,367 synset-variant pairs covering only 45.91% of the so-called Core WordNet. Compared with the Princeton WordNet for English version 3.0, which has 117,659 synsets and 206,975 synset-variant pairs, it is clear that CroWN should be expanded. The first expansion experiments were performed using the WN-Toolkit, a set of Python programs for wordnet creation and expansion using dictionary-, BabelNet- and parallel-corpus-based strategies. The WN-Toolkit had previously been applied successfully to other languages, such as Spanish, Catalan and Galician. After this first expansion, CroWN covered 70.63% of the Core WordNet. In the second step, we used CroDeriv, a derivational database for Croatian, together with the manual creation of 1,457 synset-variant pairs, until reaching 100% of the Core WordNet. After the second step was completed, CroWN reached 23,137 synsets and 47,931 synset-lemma pairs.
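
The Core WordNet coverage figures in the abstract (45.91%, 70.63%, 100%) can plausibly be read as the share of Core WordNet synsets that have at least one Croatian variant. The sketch below illustrates that calculation under this assumption. The function name `core_coverage`, the synset IDs and the variant strings are hypothetical placeholders; none of them come from the WN-Toolkit, CroWN or the Core WordNet list.

```python
def core_coverage(core_synsets, wordnet_synsets):
    """Percentage of core synset IDs that are present in the target wordnet.

    Assumption: a synset counts as covered when the target wordnet
    contains it with at least one variant (the paper may define it differently).
    """
    core = set(core_synsets)
    if not core:
        raise ValueError("core synset list is empty")
    covered = core & set(wordnet_synsets)
    return 100.0 * len(covered) / len(core)


# Toy data only: hypothetical synset IDs and placeholder variants.
core = ["syn-001", "syn-002", "syn-003", "syn-004"]
crown = {
    "syn-001": ["variant_a"],
    "syn-004": ["variant_b", "variant_c"],
    "syn-900": ["variant_d"],  # present in the wordnet but not a core synset
}

coverage = core_coverage(core, crown)
print(f"{coverage:.2f}%")  # 2 of the 4 core synsets are covered
```

Synsets outside the core list (here `syn-900`) do not affect the figure, which is why a wordnet can grow in total size without its core coverage changing.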

Year	Venue	Field
2015	RANLP	Catalan, Information retrieval, Computer science, Natural language processing, Artificial intelligence, Croatian, WordNet, Python (programming language)
DocType	Citations	PageRank
Conference	0	0.34
References	Authors
4	3

Authors (3 rows)

Name	Order	Citations	PageRank
Antoni Oliver	1	111	17.28
Kresimir Sojat	2	10	6.35
Matea Srebacic	3	9	3.24
