Title
Compilation, transcription and usage of a reference speech corpus: the case of the Slovene corpus GOS
Abstract
In recent years, building reference speech corpora has been an important part of the work of providing the necessary linguistic infrastructure in many European countries, both for languages with many speakers (e.g., French, German, Spanish, Italian) and for those with fewer speakers (e.g., Swedish, Dutch, Czech, Slovak). This paper describes how a reference speech corpus was created and distributed to potential users, taking the Slovene corpus GOS as the case. It describes the corpus structure, fieldwork experience with recording, the labelling system, and the two levels of transcription (pronunciation-based and standardized), as well as the main characteristics of the corpus interface (a web concordancer) and the availability of the original corpus files.
Year: 2013
DOI: 10.1007/s10579-013-9216-5
Venue: Language Resources and Evaluation
Keywords: Spoken language, Discourse, Recordings, Transcription conventions, Web concordancer
Field: Speech corpus, Pronunciation, Czech, Computer science, Natural language processing, Corpus linguistics, Artificial intelligence, Slovak, Concordancer, Text corpus, Speech recognition, Linguistics, German
DocType: Journal
Volume: 47
Issue: 4
ISSN: 1574-020X
Citations: 1
PageRank: 0.36
References: 3
Authors: 5
Name                 Order  Citations  PageRank
Darinka Verdonik     1      16         4.76
Iztok Kosem          2      1          1.04
Ana Zwitter Vitez    3      1          0.70
Simon Krek           4      24         4.18
Marko Stabej         5      4          0.78