Title
Annotations and tools for an activity based Spoken Language Corpus
Abstract
The paper describes the Spoken Language Corpus of Swedish at the Department of Linguistics, Göteborg University (GSLC), and summarizes the types of analysis and the tools that have been developed for work on this corpus. Work on the corpus started in the late 1970s. It is growing incrementally and presently consists of 1.3 million words from about 25 different social activities. The corpus was initiated to meet a growing interest in naturalistic spoken language data. It is based on the fact that spoken language varies considerably across social activities with regard to pronunciation, vocabulary, grammar and communicative functions. The goal of the corpus is to include spoken language from as many social activities as possible, in order to get a more complete understanding of the role of language and communication in human social life. This type of spoken language corpus is still fairly unusual even for English, since many spoken language corpora (certainly for Swedish) have been collected for special purposes, like speech recognition, phonetics, dialectal variation or interaction with a computerized dialog system in a very narrow domain, e.g. MapTask (Isard and Carletta 1995), TRAINS (Heeman and Allen 1994), Waxholm (Blomberg et al. 1993). In Table 1.1, we compare GSLC to some other corpora with regard to language, activity types, dialects, type of interaction, total duration, number of recordings, number of transcribed words, the purpose of the corpus, chosen transcription format, age of the participants, medium (audio or video) and some other features.
Table 1.1 Comparison of spoken language corpora
Year
2001
DOI
10.3115/1118078.1118079
Venue
SIGDIAL Workshop
Keywords
Göteborg corpus, Lund corpus, different social activity, spoken language corpus, human social life, language data, social activity, spoken New Zealand English, English corpus, Danish BySoc corpus, language corpus
DocType
Conference
Citations
3
PageRank
0.75
References
1
Authors
4
Name               Order  Citations  PageRank
Jens Allwood       1      284        26.37
Leif Grönqvist     2      17         1.90
Elisabeth Ahlsén   3      108        11.30
Magnus Gunnarsson  4      41         2.25