Title
SPEECH CODING WITH AN ANALYSIS-BY-SYNTHESIS SINUSOIDAL MODEL
Abstract
We introduce a general and powerful approach to sinusoidal modeling of speech wherein a closed-loop Analysis-by-Synthesis (AbS) technique sequentially extracts the parameters for each sinusoidal component. Low bit-rate speech coding is achieved by efficiently constraining the allowed frequencies of sinusoidal components into sets of frequency intervals or bins. In conjunction with the closed-loop analysis, the constrained frequency regions allow us to efficiently vector quantize the frequency information in each frame. In voiced frames, two sets of frequency vectors are generated: one for harmonically related components and the other for non-harmonically related components of the voiced segment. In transition frames, a vector of nonuniformly spaced frequencies is selected from a frequency codebook using frequency bin vector quantization (FBVQ) to represent the frequency domain information. The effectiveness of the coding scheme is enhanced by exploiting the critical band concept of auditory perception in defining the frequency bins. In transition segments, the sinusoidal phases are modeled and coded. Subjective tests with a partially quantized model indicate that, for a target rate of 4 kbps, the coder quality exceeds that of the G.729 standard at 8 kbps.
In this paper, we introduce an Analysis by Synthesis (AbS) sinusoidal modeling technique for low bit-rate speech coding wherein the parameters for each sinusoidal component are sequentially extracted by a closed-loop analysis. The sinusoidal modeling of the speech residual is performed within the general framework of matching pursuits (4, 5) with a dictionary of sinusoids. The frequency range is restricted to sets of frequency intervals or bins, which in conjunction with the closed-loop analysis allow us to map the frequencies of the sinusoids into a frequency vector that is efficiently quantized. In voiced frames, two sets of frequency vectors are generated: one of them represents harmonically related and the other one non-harmonically related components of the voiced segment. This approach eliminates the need for voicing information that is difficult to estimate correctly and to quantize at low bit rates. In transition frames, a vector of nonuniformly spaced frequencies is selected from a frequency codebook using frequency bin vector quantization (FBVQ) to represent the frequency domain information. Our use of FBVQ with closed-loop searching combined with modeling and coding of the perceptually important phase information together contribute to a significant improvement of speech quality in transition frames. Subjective tests indicate that a partially quantized model with a target rate of 4 kbps has quality exceeding the G.729 standard at 8 kbps.
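The sequential closed-loop extraction described above can be illustrated with a minimal sketch. It is not the authors' coder: it assumes one sinusoid per frequency bin, a uniform search grid inside each bin, and a plain least-squares error with no LPC residual, perceptual weighting, or vector quantization. The names `fit_sinusoid`, `abs_sinusoidal_analysis`, and `steps_per_bin`, and the bin edges used in the usage lines, are hypothetical and chosen only for illustration.

```python
import math

def fit_sinusoid(residual, freq_hz, fs):
    """Least-squares fit of a*cos(w*n) + b*sin(w*n) to the residual.

    Returns (energy_removed, a, b, component), or None if the two basis
    vectors are numerically dependent (e.g. freq_hz == 0).
    """
    w = 2.0 * math.pi * freq_hz / fs
    c = [math.cos(w * n) for n in range(len(residual))]
    s = [math.sin(w * n) for n in range(len(residual))]
    cc = sum(v * v for v in c)
    ss = sum(v * v for v in s)
    cs = sum(u * v for u, v in zip(c, s))
    rc = sum(r * v for r, v in zip(residual, c))
    rs = sum(r * v for r, v in zip(residual, s))
    det = cc * ss - cs * cs
    if abs(det) < 1e-12:
        return None
    # Solve the 2x2 normal equations for the cosine and sine weights.
    a = (rc * ss - rs * cs) / det
    b = (rs * cc - rc * cs) / det
    component = [a * u + b * v for u, v in zip(c, s)]
    # Energy of the projection, i.e. the drop in squared error.
    return a * rc + b * rs, a, b, component

def abs_sinusoidal_analysis(frame, fs, bins, steps_per_bin=8):
    """Greedy analysis-by-synthesis: one sinusoid per bin (an assumption).

    For each (lo, hi) bin, every grid frequency is tried against the
    current residual, the one that removes the most energy is kept, and
    its synthesized component is subtracted before the next bin.
    """
    residual = list(frame)
    components = []
    for lo, hi in bins:
        best = None
        for k in range(steps_per_bin):
            f = lo + (hi - lo) * (k + 0.5) / steps_per_bin
            fit = fit_sinusoid(residual, f, fs)
            if fit is not None and (best is None or fit[0] > best[1][0]):
                best = (k, fit, f)
        if best is None:
            continue
        k, (_, a, b, component), f = best
        residual = [r - p for r, p in zip(residual, component)]
        components.append({
            "bin_index": k,                 # grid position inside the bin
            "freq_hz": f,
            "amplitude": math.hypot(a, b),
            "phase": math.atan2(-b, a),     # a*cos + b*sin = A*cos(w*n + phase)
        })
    return components, residual

# Usage with illustrative values: a 20 ms frame at 8 kHz and three bins.
fs = 8000
frame = [0.8 * math.cos(2 * math.pi * 425.0 * n / fs + 0.3)
         + 0.4 * math.cos(2 * math.pi * 1312.5 * n / fs) for n in range(160)]
bins = [(100.0, 500.0), (500.0, 1000.0), (1000.0, 2000.0)]
components, residual = abs_sinusoidal_analysis(frame, fs, bins)
```

Because each bin contributes a single grid index, the per-frame list of `bin_index` values forms a fixed-length vector. This is the kind of frequency vector that a codebook search could quantize, although the sketch stops short of any quantization step.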
Year: 2000
Venue: ICASSP
DocType: Conference
Citations: 2
PageRank: 0.47
References: 4
Authors: 2
Name
Order
Citations
PageRank
Vladimir Cuperman15611.32
Allen Gersho22031624.48