Abstract | ||
---|---|---|
Both human and automatic processing of speech require recognizing more than just the words. We describe a state-of-the-art system for automatic detection of "metadata" (information beyond the words) in both broadcast news and spontaneous telephone conversations, developed as part of the DARPA EARS Rich Transcription program. System tasks include sentence boundary detection, filler word detection, and detection/correction of disfluencies. To achieve best performance, we combine information from different types of language models (based on words, part-of-speech classes, and automatically induced classes) with information from a prosodic classifier. The prosodic classifier employs bagging and ensemble approaches to better estimate posterior probabilities. We use confusion networks to improve robustness to speech recognition errors. Most recently, we have investigated a maximum entropy approach for the sentence boundary detection task, yielding a gain over our standard HMM approach. We report results for these techniques on the official NIST Rich Transcription metadata tasks. |
Year | Venue | Field |
---|---|---|
2004 | INTERSPEECH | Metadata, Pattern recognition, Computer science, Robustness (computer science), Speech recognition, NIST, Artificial intelligence, Principle of maximum entropy, Classifier (linguistics), Hidden Markov model, Sentence, Language model |
DocType | Citations | PageRank |
Conference | 6 | 0.84 |
References | Authors | |
12 | 7 |
Name | Order | Citations | PageRank |
---|---|---|---|
Elizabeth Shriberg | 1 | 3057 | 325.64 |
Andreas Stolcke | 2 | 6690 | 712.46 |
Dustin Hillard | 3 | 410 | 26.56 |
Mari Ostendorf | 4 | 2462 | 348.75 |
Barbara Peskin | 5 | 176 | 18.45 |
Mary P. Harper | 6 | 609 | 66.92 |
Yang Liu | 7 | 491 | 116.11 |