The Internet Encyclopedia (Volume 3)


Kroon WL040/Bidgolio-Vol I WL040-Sample.cls June 20, 2003 13:9 Char Count= 0

SPEECH CODING TECHNIQUES

[Figure 9 diagram. Encoder: input, A(z), P(z), quantizer, bits. Decoder: bits, inverse quantizer, 1/P(z), 1/A(z), output.]

Figure 9: Block diagram of a linear predictive coder with short-term and long-term prediction.

This is achieved by having a number of possible representative sample sets, or codebooks. Quantization is achieved by selecting the codebook entry (each entry containing multiple samples) that best represents the signal to be quantized. The index of this entry is transmitted, and the decoder, which holds a copy of the same codebook, uses the index to look up the corresponding values. Whereas the lowest rate for scalar quantization is 1 bit per sample, VQ allows fractional bit rates. For example, quantizing 2 samples simultaneously using a 1-bit codebook (two entries) results in 0.5 bits per sample. More typical values are a 10-bit codebook with codebook vectors of dimension 40, resulting in 0.25 bits per sample.

Both scalar and vector quantization attempt to find the quantized value that is closest to the original unquantized input. In the block diagram of Figure 9, one can argue that the overall goal is not to find the best quantized residual values but the best quantized speech signal. This becomes an issue especially for coarse quantization (very few bits per sample). A powerful technique used in speech coding is analysis-by-synthesis, in which the effect of quantization is determined by examining its effect on the decoded output. This is accomplished by operating the diagram of Figure 9 in the configuration shown in Figure 10, in which the encoder of Figure 9 is enhanced with a local decoder. In most practical coders the predictors are still computed as usual, and only the quantization of the residual signal is done in an analysis-by-synthesis fashion. If a codebook is used, the essence of this approach is that each codebook vector is locally decoded and the resulting prototype output is compared with the original input signal. The codebook vector that gives the best approximation is selected.
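The codebook search described above can be illustrated with a minimal Python sketch. It is not taken from any particular coder: the function names, the 4-entry codebook, and the single predictor coefficient are assumptions chosen for illustration, the synthesis filter contains only a short-term all-pole part 1/A(z) with zero initial memory, and plain squared error is used as the distance measure.

```python
import math

def bits_per_sample(codebook_size, dimension):
    """Rate of a VQ codebook: index bits divided by samples per vector."""
    return math.log2(codebook_size) / dimension

def synthesize(excitation, lpc):
    """All-pole synthesis filter 1/A(z), A(z) = 1 - sum_k lpc[k] * z^-(k+1).
    Assumption: the filter memory is zero at the start of the vector."""
    out = []
    for n, e in enumerate(excitation):
        s = e
        for k, a in enumerate(lpc):
            if n - k - 1 >= 0:
                s += a * out[n - k - 1]
        out.append(s)
    return out

def abs_search(target, codebook, lpc):
    """Locally decode every codebook vector and return (index, error)
    of the one whose decoded output is closest to the target."""
    best_index, best_error = -1, float("inf")
    for index, vector in enumerate(codebook):
        decoded = synthesize(vector, lpc)
        error = sum((t - d) ** 2 for t, d in zip(target, decoded))
        if error < best_error:
            best_index, best_error = index, error
    return best_index, best_error

# Hypothetical 2-bit codebook of dimension 4 (illustrative values only).
codebook = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, -1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
lpc = [0.9]  # assumed first-order short-term predictor

# Build a target that entry 2 reproduces exactly, then search for it.
target = synthesize(codebook[2], lpc)
index, error = abs_search(target, codebook, lpc)
print(index, error)
print(bits_per_sample(1024, 40))  # 10-bit codebook, dimension 40
```

The key point of the sketch is that the error is measured on the decoded output rather than on the excitation vector itself, so only the selected index needs to be transmitted. Practical coders avoid this exhaustive search by using structured codebooks.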
Note that with this paradigm we are indirectly creating a quantized residual signal, and this signal is often referred to as an excitation signal.

[Figure 10 diagram; blocks: coder, decoder, summation (+), error weighting, error minimization.]

Figure 10: Analysis-by-synthesis encoder with error weighting.

Instead of directly comparing the original input signal with its quantized and decoded rendering, an error-weighting filter is introduced, which better reflects the way our auditory system perceives distortions. It should be noted that this weighting is much simpler than the complex auditory models used in audio coding. The block diagram of Figure 10 forms the basis for a family of coders generically referred to as code-excited linear predictive (CELP) coders. Most modern speech-coding standards are based on this principle.

A large amount of research has been done on efficient codebook structures, which must not only give the best performance but also be manageable in terms of search. A commonly used structure is the so-called algebraic codebook, which consists of a few nonzero pulses with deterministic positions.

For all coders several techniques can be used to further improve performance. Some of these techniques can be applied independently of the coder, although in practice it makes sense to take advantage of the parameters already computed by the coder. Useful preprocessing techniques are gain control and noise suppression. The latter can be quite sophisticated but very effective, especially for the lower rate speech coders, which typically do not handle background noise well. A widely used form of postprocessing is postfiltering. In this process the decoded speech signal is slightly distorted in such a way that the coding noise is suppressed and the signal is enhanced. If done with care, it can clean up a signal, resulting in a perceived quality improvement.

Another technique that has found some popularity takes advantage of the fact that conversational speech comes in bursts, because the speakers talk at alternate times. There can also be long pauses between words. This can be exploited by transmitting only when active speech is present.
When speech is not active, no signal is transmitted. Because on average each person speaks about half of the time, this technique has the potential to reduce the bit rate by half. To make this discontinuous transmission approach work, a voice activity detector (VAD) is needed. For speech without background noise, this approach can work quite well. When background noise is present (e.g., in a car), it is more difficult to get reliable decisions from the VAD. Moreover, when no talker is active and no signal is transmitted, the receiver side needs to substitute a replacement signal. This is referred to as comfort noise. For high levels of background noise it is difficult to make this comfort noise match the characteristics of the actual background noise. Hence more sophisticated systems transmit low-rate information about the background noise, such as energy and spectral characteristics, at average rates of about 1 to 2 kb/s.
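As a rough illustration of discontinuous transmission, the following Python sketch uses a simple frame-energy threshold as the voice activity detector. This is an assumption made for brevity: practical VADs use more robust features, and the names `frame_energy` and `dtx_encode`, the packet labels, the threshold, and the sample values are all hypothetical. Inactive frames are replaced by a small noise descriptor (here only the energy), from which a receiver could generate comfort noise.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def dtx_encode(frames, threshold):
    """Energy-threshold VAD (a deliberately simple stand-in).
    Active frames are passed on as ("SPEECH", frame); inactive frames
    are reduced to ("NOISE", energy) for comfort noise generation."""
    packets = []
    for frame in frames:
        energy = frame_energy(frame)
        if energy > threshold:
            packets.append(("SPEECH", frame))
        else:
            packets.append(("NOISE", energy))
    return packets

# Illustrative 4-sample frames: speech, two pauses, speech.
frames = [
    [0.5, -0.4, 0.6, -0.5],
    [0.01, -0.02, 0.01, 0.0],
    [0.0, 0.01, -0.01, 0.0],
    [0.3, -0.6, 0.4, -0.2],
]
packets = dtx_encode(frames, threshold=0.01)
kinds = [kind for kind, _ in packets]
active_fraction = kinds.count("SPEECH") / len(kinds)
print(kinds, active_fraction)
```

With background noise, the energy of the inactive frames moves closer to the threshold, which is one way to see why VAD decisions become less reliable in noisy conditions.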

Speech Coding Standards

For communication purposes it is important to establish standards to guarantee interoperability between equipment from different vendors, or between telecommunication services in different geographic areas. Telecommunication standards are set by different standards bodies, which typically govern different fields of use.
