Title | ||
---|---|---|
Parsing Speech: a Neural Approach to Integrating Lexical and Acoustic-Prosodic Information |
Abstract | ||
---|---|---|
In conversational speech, the acoustic signal provides cues that help listeners disambiguate difficult parses. For automatically parsing spoken utterances, we introduce a model that integrates transcribed text and acoustic-prosodic features using a convolutional neural network over energy and pitch trajectories coupled with an attention-based recurrent neural network that accepts text and prosodic features. We find that different types of acoustic-prosodic features are individually helpful, and together give statistically significant improvements in parse and disfluency detection F1 scores over a strong text-only baseline. For this study with known sentence boundaries, error analyses show that the main benefit of acoustic-prosodic features is in sentences with disfluencies, attachment decisions are most improved, and transcription errors obscure gains from prosody. |
Year | Venue | Field |
---|---|---|
2018 | NAACL-HLT | Computer science, Artificial intelligence, Natural language processing, Parsing |
DocType | Citations | PageRank |
---|---|---|
Conference | 2 | 0.37 |
References | Authors | |
---|---|---|
0 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Trang Tran | 1 | 8 | 2.50 |
Shubham Toshniwal | 2 | 19 | 4.12 |
Mohit Bansal | 3 | 871 | 63.19 |
Kevin Gimpel | 4 | 1545 | 79.71 |
Karen Livescu | 5 | 1254 | 71.43 |
Mari Ostendorf | 6 | 2462 | 348.75 |