Title
On not remembering disfluencies
Abstract
Disfluencies (repetitions and reformulations in mid-sentence in normal spontaneous speech) are problematic for both psychological and computational models of speech understanding. Much effort is being applied to finding ways of adapting computational systems to detect and delete disfluencies. The input to such systems is usually an accurate transcription. We present results of an experiment in which human listeners are asked to give verbatim transcriptions of disfluent and fluent utterances. These suggest that listeners are seldom able to identify all the words "deleted" in disfluencies. While all types suffer, identification rates for repetitions are even worse than for other types. We attribute the results to difficulties in recalling, or coding for recall, items which cannot be identified with certainty. This inability seems to make human speech recognition more robust than current computational models.

1. BACKGROUND

Human listeners are reasonably accurate in transcribing fluent speech but find it difficult to transcribe disfluencies (10). In contrast, automatic speech recognition systems have considerable difficulty spotting and excising disfluencies despite the distinctive acoustic or structural features (3, 6, 11) of these very common phenomena. We report on the human abilities which make disfluencies evanescent.

The portion of a disfluent utterance which must be expunged to make fair copy is called the reparandum. Though words in reparanda are processed (4), they may not be correctly identified by normal listeners (8). Part of the problem appears to be due to the disfluent interruption itself. People may depend on subsequent as well as prior context when they recognize words in running spontaneous speech (5, 2). For words in the reparandum, the disfluent interruption truncates the subsequent context before the arrival of information which would normally allow the words to be identified.
In a word-level gating experiment, in which an utterance is presented starting with the first word and including an additional word on each trial, words in reparanda were so rarely recognized late, on the basis of subsequent context, that they proved exceptionally unintelligible (8).

The current work examines the evidence that failures of memory as well as failures of perception are involved in the human ability to miss disfluencies. A large-scale verbatim transcription task was designed with two purposes. First, it checks for recognition failures in a more natural task than gating. Second, it tests the hypothesis that REPETITION DEAFNESS will make recall even worse for disfluencies which contain repeated words than for those which do not. Repetition Deafness (9) and Repetition Blindness (7) are inabilities to distinguish in recall two very similar stimuli witnessed close together in time, particularly in presentations (e.g. rapid list intonation, time-compressed speech) which make perception and encoding difficult. We test two parts of this prediction: first, that the repetition itself creates the deficit; second, that coding and perceptual pressures conspire with repetition to suppress accurate reporting.

2. METHOD
Year
1997
Venue
EUROSPEECH
DocType
Conference
Citations
1
PageRank
0.45
References
3
Authors
2
Name               Order  Citations  PageRank
Robin J. Lickley   1      45         14.33
Ellen G Bard       2      97         16.22