Making Monolingual Corpora Comparable: a Case Study of Bulgarian and Croatian. - Citegraph

Paper Info

Title
Making Monolingual Corpora Comparable: a Case Study of Bulgarian and Croatian.

Abstract
This paper describes the first steps towards the creation of a Bulgarian-Croatian comparable corpus. The corpus is based on two newspaper sub-corpora drawn from larger reference corpora of Bulgarian and Croatian. Initially, we rely on more extralinguistically oriented, but methodologically cleaner, similarity parameters, such as specific topics, a predefined time span, and data size. The notions of 'light' and 'hard' comparable corpora are introduced; at this stage we aim to produce a 'light' bilingual comparable corpus. The algorithm for identifying lexical similarity and aligning linguistic units is presented, and the initial experiments are outlined.
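The abstract names two technical steps. The first is selecting comparable material by topic, time span and data size. The second is measuring lexical similarity across the two languages. The sketch below illustrates both under stated assumptions and is not the authors' implementation. The `Article` record, the per-topic size balancing, the simplified Cyrillic-to-Latin table `BG_TO_LATIN` and the 0.8 threshold are all hypothetical. The abstract does not describe the paper's similarity algorithm, so normalized edit distance between transliterated word forms is swapped in here as a generic cognate-detection baseline for two closely related Slavic languages.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical article record; the real corpora's metadata schema is not given.
@dataclass(frozen=True)
class Article:
    lang: str          # "bg" or "hr"
    published: date
    topic: str
    text: str

    @property
    def size(self) -> int:
        return len(self.text.split())  # data size measured in whitespace tokens


def in_scope(art: Article, topic: str, start: date, end: date) -> bool:
    return art.topic == topic and start <= art.published <= end


def take_until(articles: list[Article], budget: int) -> list[Article]:
    """Take articles in chronological order until the token budget is reached."""
    chosen, total = [], 0
    for art in sorted(articles, key=lambda a: a.published):
        if total >= budget:
            break
        chosen.append(art)
        total += art.size
    return chosen


def select_light_comparable(bg, hr, topics, start, end):
    """Filter both sides by topic and time span, then balance size per topic."""
    out_bg, out_hr = [], []
    for topic in topics:
        bg_t = [a for a in bg if in_scope(a, topic, start, end)]
        hr_t = [a for a in hr if in_scope(a, topic, start, end)]
        # Budget = the smaller side's token count, so neither side dominates.
        budget = min(sum(a.size for a in bg_t), sum(a.size for a in hr_t))
        out_bg += take_until(bg_t, budget)
        out_hr += take_until(hr_t, budget)
    return out_bg, out_hr


# Simplified, hypothetical Bulgarian Cyrillic -> Croatian-style Latin mapping.
BG_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ж": "ž",
    "з": "z", "и": "i", "й": "j", "к": "k", "л": "l", "м": "m", "н": "n",
    "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u", "ф": "f",
    "х": "h", "ц": "c", "ч": "č", "ш": "š", "щ": "št", "ъ": "a", "ь": "j",
    "ю": "ju", "я": "ja",
}


def transliterate(word: str) -> str:
    return "".join(BG_TO_LATIN.get(ch, ch) for ch in word.lower())


def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def lexical_similarity(bg_word: str, hr_word: str) -> float:
    """1.0 for identical forms after transliteration, lower as edits grow."""
    a, b = transliterate(bg_word), hr_word.lower()
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def cognate_candidates(bg_words, hr_words, threshold=0.8):
    """Pair every Bulgarian word with every Croatian word above the threshold."""
    pairs = []
    for bw in bg_words:
        for hw in hr_words:
            score = lexical_similarity(bw, hw)
            if score >= threshold:
                pairs.append((bw, hw, round(score, 3)))
    return pairs
```

For example, `cognate_candidates(["вода", "книга"], ["voda", "knjiga", "kuća"])` pairs вода with voda (score 1.0) and книга with knjiga (score 0.833) and rejects the other combinations. Capping each topic at the smaller side's word count is one simple reading of "data size" as a similarity parameter.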

Year	Venue	Field
2004	LREC	Lexical similarity, Bulgarian, Computer science, Speech recognition, Newspaper, Natural language processing, Artificial intelligence, Corpus linguistics, Croatian, Linguistics
DocType	Citations	PageRank
Conference	5	0.49
References	Authors
2	4

Authors

Name	Author order	Citations	PageRank
Bozo Bekavac	1	13	4.26
Petya Osenova	2	360	46.00
Kiril Simov	3	139	29.75
Marko Tadić	4	80	15.61
