Abstract |
---|
This paper describes efforts by the University of Pennsylvania's Linguistic Data Consortium (LDC) to create and distribute shared linguistic resources – including data, annotations, tools and infrastructure – to support the Rich Transcription 2005 Spring Meeting Recognition Evaluation. In addition to distributing large volumes of training data, LDC produced reference transcripts for the RT-05S conference room evaluation corpus, which represents a variety of subjects, scenarios and recording conditions. Careful verbatim reference transcripts including rich markup were created for all two hours of data. One hour was also selected for a contrastive study using a quick transcription methodology. We review the two methodologies and discuss qualitative differences in the resulting transcripts. Finally, we describe infrastructure development, including transcription tools, to support our efforts. |
Year | DOI | Venue |
---|---|---|
2005 | 10.1007/11677482_33 | Machine Learning for Multimodal Interaction (MLMI 2005) |
Keywords | Field | DocType |
---|---|---|
RT-05S conference room evaluation, quick transcription methodology, training data, transcription tool, Linguistic Data Consortium, careful verbatim reference transcript, infrastructure development, Linguistic resource, reference transcript, meeting speech recognition, Recognition Evaluation, Rich Transcription | Training set, Software tool, Linguistic Data Consortium, Annotation, Shared memory, Computer science, Conference room, Speech recognition, Shared resource, Linguistics, Markup language | Conference |
Volume | ISSN | ISBN |
---|---|---|
3869 | 0302-9743 | 3-540-32549-2 |
Citations | PageRank | References |
---|---|---|
1 | 0.61 | 3 |
Authors |
---|
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Meghan Lammie Glenn | 1 | 17 | 4.77 |
Stephanie Strassel | 2 | 512 | 58.41 |