Title
Linguistic resources for meeting speech recognition
Abstract
This paper describes efforts by the University of Pennsylvania's Linguistic Data Consortium (LDC) to create and distribute shared linguistic resources – including data, annotations, tools and infrastructure – to support the Rich Transcription 2005 Spring Meeting Recognition Evaluation (RT-05S). In addition to distributing large volumes of training data, LDC produced reference transcripts for the RT-05S conference room evaluation corpus, which represents a variety of subjects, scenarios and recording conditions. Careful verbatim reference transcripts, including rich markup, were created for the full two hours of data. One hour was also selected for a contrastive study using a quick transcription methodology. We review the two methodologies and discuss qualitative differences in the resulting transcripts. Finally, we describe infrastructure development, including transcription tools, that supported our efforts.
Year
2005
DOI
10.1007/11677482_33
Venue
international conference on machine learning
Keywords
RT-05S conference room evaluation, quick transcription methodology, training data, transcription tool, Linguistic Data Consortium, careful verbatim reference transcript, infrastructure development, Linguistic resource, reference transcript, meeting speech recognition, Recognition Evaluation, Rich Transcription
Field
Training set, Software tool, Linguistic Data Consortium, Annotation, Shared memory, Computer science, Conference room, Speech recognition, Shared resource, Linguistics, Markup language
DocType
Conference
Volume
3869
ISSN
0302-9743
ISBN
3-540-32549-2
Citations
1
PageRank
0.61
References
3
Authors
2
Name
Order
Citations
PageRank
Meghan Lammie Glenn 1174.77
Stephanie Strassel 251258.41