Abstract | ||
---|---|---|
Recordings of children reading stories aloud in a school setting can be used to assess reading skills via automatic speech recognition (ASR). ASR, however, is known to be highly susceptible to background noise. The unusual variety of foreground (breath release, mic pops, etc.) and background (children playing, a distinct background talker, wind, etc.) non-speech sounds makes this application particularly challenging. Motivated by the observation that close to 50% of the recorded audio in real-world data is purely non-speech activity, we investigate robust approaches to voice activity detection that remove as many non-speech segments as possible before ASR. We use energy-based and harmonicity-based features, coupled with suitable temporal smoothing constraints, in a two-pass noise preprocessing system. We discuss the voice activity detection performance of the system with reference to the characteristics of the noise types. |
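
To make the combination of energy, harmonicity, and temporal smoothing concrete, here is a minimal single-pass sketch in Python. It is not the authors' two-pass system: the function names, frame sizes, thresholds, the percentile-based noise floor, and the median-filter smoothing are all illustrative assumptions, chosen only to show how such features might be combined.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def log_energy(frames, eps=1e-10):
    """Short-time energy per frame, in dB."""
    return 10.0 * np.log10(np.mean(frames ** 2, axis=1) + eps)

def harmonicity(frames, sr, fmin=100.0, fmax=500.0):
    """Peak of the normalized autocorrelation over plausible pitch lags.

    The pitch range (fmin, fmax) is an assumed value, not from the paper.
    """
    lo, hi = int(sr / fmax), int(sr / fmin)
    out = np.zeros(len(frames))
    for i, f in enumerate(frames):
        f = f - f.mean()
        ac = np.correlate(f, f, mode="full")[len(f) - 1:]  # lags 0..N-1
        if ac[0] > 0:
            out[i] = np.max(ac[lo:hi + 1]) / ac[0]
    return out

def median_smooth(flags, width):
    """Temporal smoothing: odd-width median filter over frame decisions."""
    pad = width // 2
    padded = np.pad(flags.astype(int), pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, width)
    return np.median(windows, axis=1) > 0.5

def simple_vad(x, sr, frame_ms=25, hop_ms=10,
               energy_margin_db=6.0, harm_thresh=0.5, smooth_frames=11):
    """Return one boolean speech/non-speech decision per frame.

    All thresholds are hypothetical defaults for illustration.
    """
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    frames = frame_signal(x, frame_len, hop)
    energy = log_energy(frames)
    harm = harmonicity(frames, sr)
    noise_floor = np.percentile(energy, 10)  # crude noise-floor estimate
    raw = (energy > noise_floor + energy_margin_db) & (harm > harm_thresh)
    return median_smooth(raw, smooth_frames)
```

Calling `simple_vad(x, sr)` on a mono floating-point signal returns a boolean array with one entry per 10 ms hop; frames marked `False` would be dropped before ASR. Requiring both cues means that loud but aperiodic sounds (for example wind or mic pops) tend to fail the harmonicity test, while the median filter suppresses isolated single-frame decisions.
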
Year | Venue | Field |
---|---|---|
2017 | National Conference on Communications (NCC) | Background noise, Noise measurement, Voice activity detection, Computer science, Speech recognition, Robustness (computer science), Feature extraction, Smoothing, Preprocessor |
DocType | Citations | PageRank |
---|---|---|
Conference | 2 | 0.38 |
References | Authors | |
---|---|---|
4 | 3 | |
Name | Order | Citations | PageRank |
---|---|---|---|
Ankita Pasad | 1 | 2 | 0.38 |
Kamini Sabu | 2 | 3 | 1.79 |
Preeti Rao | 3 | 17 | 8.62 |