Abstract | ||
---|---|---|
This paper describes the first steps towards the creation of a Bulgarian-Croatian comparable corpus. It is based on two newspaper subcorpora drawn from larger reference corpora of Bulgarian and Croatian. Initially we rely on more extralinguistically oriented, but methodologically cleaner, parameters of similarity, such as specific topics, a pre-defined time span, and data size. The idea of 'light' and 'hard' comparable corpora is introduced. At this stage we aim at producing a 'light' bilingual comparable corpus. The algorithm for identifying lexical similarity and aligning linguistic units is presented, and the initial experiments are outlined. |
Year | Venue | Field |
---|---|---|
2004 | LREC | Lexical similarity, Bulgarian, Computer science, Speech recognition, Newspaper, Natural language processing, Artificial intelligence, Corpus linguistics, Croatian, Linguistics |
DocType | Citations | PageRank |
Conference | 5 | 0.49 |
References | Authors | |
2 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Bozo Bekavac | 1 | 13 | 4.26 |
Petya Osenova | 2 | 360 | 46.00 |
Kiril Simov | 3 | 139 | 29.75 |
Marko Tadić | 4 | 80 | 15.61 |