Collecting Language Resources for the Latvian e-Government Machine Translation Platform. - Citegraph

Paper Info

Title
Collecting Language Resources for the Latvian e-Government Machine Translation Platform.

Abstract
This paper describes the corpora collection activity for building large machine translation systems for the Latvian e-Government platform. We describe the requirements for corpora, the selection and assessment of data sources, the collection of public corpora, and the creation of new corpora from miscellaneous sources. Methodology, tools and assessment methods are also presented, along with the results achieved, challenges faced and conclusions drawn. Several approaches to addressing data scarcity are discussed. We summarize the volume of the corpora obtained and provide quality metrics for MT systems trained on this data. The resulting MT systems for English-Latvian, Latvian-English and Latvian-Russian are integrated into the Latvian e-service portal and are freely available on the website HUGO.LV. This paper can serve as guidance for similar activities initiated in other countries, particularly in the context of the European Language Resource Coordination action.

Year	Venue	Keywords
2016	LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION	corpus, parallel texts, machine translation, web crawling, e-Government, public sector information
Field	DocType	Citations
E-Government, Computer science, Machine translation, Speech recognition, Natural language processing, Artificial intelligence, Latvian	Conference	0
PageRank	References	Authors
0.34	0	3

Authors (3 rows)

Name	Order	Citations	PageRank
Roberts Rozis	1	17	5.90
Andrejs Vasiljevs	2	23	11.74
Raivis Skadins	3	47	10.93
