Probabilistic Inference of Speech Signals from Phaseless Spectrograms - Citegraph

Paper Info

Title
Probabilistic Inference of Speech Signals from Phaseless Spectrograms

Abstract
Many techniques for complex speech processing such as denoising and deconvolution, time/frequency warping, multiple speaker separation, and multiple microphone analysis operate on sequences of short-time power spectra (spectrograms), a representation which is often well-suited to these tasks. However, a significant problem with algorithms that manipulate spectrograms is that the output spectrogram does not include a phase component, which is needed to create a time-domain signal that has good perceptual quality. Here we describe a generative model of time-domain speech signals and their spectrograms, and show how an efficient optimizer can be used to find the maximum a posteriori speech signal, given the spectrogram. In contrast to techniques that alternate between estimating the phase and a spectrally-consistent signal, our technique directly infers the speech signal, thus jointly optimizing the phase and a spectrally-consistent signal. We compare our technique with a standard method using signal-to-noise ratios, but we also provide audio files on the web for the purpose of demonstrating the improvement in perceptual quality that our technique offers.

Year	Venue	Keywords
2003	ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 16	speech processing, time domain, time frequency, signal to noise ratio
Field	DocType	Volume
Noise reduction, Speech processing, Computer science, Deconvolution, Artificial intelligence, Image warping, Pattern recognition, Spectrogram, Speech recognition, Maximum a posteriori estimation, Machine learning, Microphone, Generative model	Conference	16
ISSN	Citations	PageRank
1049-5258	9	1.00
References	Authors
5	3

Authors (3 rows)

Name	Order	Citations	PageRank
Kannan Achan	1	425	35.52
Sam T. Roweis	2	4556	497.42
Brendan J. Frey	3	3637	404.51
