Title
Linguistic Resources for Speech Parsing
Abstract
We report on the success of a two-pass approach to annotating metadata, speech effects and syntactic structure in English conversational speech: separately annotating transcribed speech for structural metadata, or structural events (fillers, speech repairs (or edit dysfluencies), and SUs, or syntactic/semantic units), and for syntactic structure (treebanking constituent structure and shallow argument structure). The two annotations were then combined into a single representation. Certain alignment issues between the two types of annotation led to the discovery and correction of annotation errors in each, resulting in a more accurate and useful resource. The development of this corpus was motivated by the need to have both metadata and syntactic structure annotated in order to support synergistic work on speech parsing and structural event detection. Automatic detection of these speech phenomena would simultaneously improve parsing accuracy and provide a mechanism for cleaning up transcriptions for downstream text processing. Similarly, constraints imposed by text processing systems such as parsers can be used to help improve identification of disfluencies and sentence boundaries. This paper reports on our efforts to develop a linguistic resource providing both spoken metadata and syntactic structure information, and describes the resulting corpus of English conversational speech.
1. Motivation for the Creation of this Corpus
In order to apply to speech the language processing techniques that have traditionally been applied to text, it is important to address the inherent differences between these two types of input. Textual input typically involves words that are broken into sentences and clauses using punctuation, and that are further organized into chunks such as paragraphs, sections, chapters, articles, books, and so on.
Although speech is similar in many ways to text (e.g., it is comprised of words that have the same meaning as in text), it also has many differences, some stemming from the fact that people use different modalities/cognitive processes when processing/producing these inputs/outputs, and others
Year
2006
Venue
LREC
Keywords
english language, natural language, metadata, computational linguistics, parsers, speech
Field
Speech corpus, Computer science, Artificial intelligence, Natural language processing, Syntax, Text processing, Metadata, Annotation, Computational linguistics, Speech recognition, Natural language, Parsing, Linguistics
DocType
Conference
Citations
3
PageRank
0.45
References
10
Authors
8
Name                Order  Citations  PageRank
Ann Bies            1      136        20.02
Stephanie Strassel  2      512        58.41
Haejoong Lee        3      105        23.68
Kazuaki Maeda       4      138        34.69
Seth Kulick         5      221        29.66
Yang Liu            6      945        70.67
Mary Harper         7      258        20.54
Matthew Lease       8      1326       84.06