Abstract |
---|
Named Entity Recognition (NER), an information extraction task, is typically applied to spoken documents by cascading a large vocabulary continuous speech recognizer (LVCSR) and a named entity tagger. Recognizing named entities in automatically decoded speech is difficult since LVCSR errors can confuse the tagger. This is especially true of out-of-vocabulary (OOV) words, which are often named entities and always produce transcription errors. In this work, we improve speech NER by including features indicative of OOVs based on an OOV detector, allowing for the identification of regions of speech containing named entities, even if they are incorrectly transcribed. We construct a new speech NER data set and demonstrate significant improvements for this task. |
Year | Venue | Keywords |
---|---|---|
2011 | 12th Annual Conference of the International Speech Communication Association 2011 (INTERSPEECH 2011), Vols 1-5 | Named Entity Recognition, OOV Detection |
Field | DocType | Citations |
---|---|---|
Computer science, Named entity, Speech recognition, Information extraction, Named-entity recognition, Vocabulary | Conference | 9 |
PageRank | References | Authors |
---|---|---|
0.59 | 16 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Carolina Parada | 1 | 242 | 13.11 |
Mark Dredze | 2 | 3092 | 176.22 |
Frederick Jelinek | 3 | 139 | 23.22 |