Linguistic Resources for Speech Parsing - Citegraph

Paper Info

Title
Linguistic Resources for Speech Parsing

Abstract
We report on the success of a two-pass approach to annotating metadata, speech effects, and syntactic structure in English conversational speech. Transcribed speech was annotated separately for two things. The first was structural metadata, or structural events: fillers, speech repairs (or edit dysfluencies), and SUs (syntactic/semantic units). The second was syntactic structure: treebanking of constituent structure and shallow argument structure. The two annotations were then combined into a single representation. Certain alignment issues between the two types of annotation led to the discovery and correction of annotation errors in each, resulting in a more accurate and useful resource. The development of this corpus was motivated by the need to have both metadata and syntactic structure annotated in order to support synergistic work on speech parsing and structural event detection. Automatic detection of these speech phenomena would simultaneously improve parsing accuracy and provide a mechanism for cleaning up transcriptions for downstream text processing. Similarly, constraints imposed by text processing systems such as parsers can be used to help improve the identification of disfluencies and sentence boundaries. This paper reports on our efforts to develop a linguistic resource providing both spoken metadata and syntactic structure information, and describes the resulting corpus of English conversational speech.

1. Motivation for the Creation of this Corpus

In order to apply to speech the language processing techniques that have traditionally been applied to text, it is important to address the inherent differences between these two types of input. Textual input typically consists of words that are divided into sentences and clauses by punctuation. These are further organized into larger units such as paragraphs, sections, chapters, articles, books, and so on.
Although speech is similar to text in many ways (e.g., it is composed of words that have the same meanings as in text), it also differs in many respects. Some of these differences stem from the fact that people use different modalities and cognitive processes when processing and producing these inputs and outputs, and others

Year	Venue	Keywords
2006	LREC	english language,natural language,metadata,computational linguistics,parsers,speech
Field	DocType	Citations
Speech corpus,Computer science,Artificial intelligence,Natural language processing,Syntax,Text processing,Metadata,Annotation,Computational linguistics,Speech recognition,Natural language,Parsing,Linguistics	Conference	3
PageRank	References	Authors
0.45	10	8

Authors

Name	Order	Citations	PageRank
Ann Bies	1	136	20.02
Stephanie Strassel	2	512	58.41
Haejoong Lee	3	105	23.68
Kazuaki Maeda	4	138	34.69
Seth Kulick	5	221	29.66
Yang Liu	6	945	70.67
Mary Harper	7	258	20.54
Matthew Lease	8	1326	84.06
