The Internet Encyclopedia (Volume 3)



Kroon WL040/Bidgolio-Vol I WL040-Sample.cls June 20, 2003 13:9 Char Count= 0


SPEECH CODING TECHNIQUES

Figure 9: Block diagram of a linear predictive coder
with short-term and long-term prediction.
  ENCODER: input -> A(z) -> P(z) -> quantizer -> bits
  DECODER: bits -> inverse quantizer -> 1/P(z) -> 1/A(z) -> output

This is achieved by having a number of possible representative
sample sets, or codebooks. Quantization is achieved by selecting the
codebook entry (each entry containing multiple samples) that best
represents the signal to be quantized. The codebook index
corresponding to this entry is transmitted, and the decoder, which
has a copy of the same codebook, uses this index to look up the
corresponding values. Whereas the lowest quantization rate for
scalar quantization is 1 bit per sample, VQ allows fractional bit
rates. For example, quantizing 2 samples simultaneously using a
1-bit codebook results in 0.5 bits per sample. More typical values
are a 10-bit codebook with codebook vectors of dimension 40,
resulting in 0.25 bits per sample. Both scalar and vector
quantization attempt to find the quantized value that is closest to
the original unquantized input. In the block diagram of Figure 9,
however, one can argue that the overall goal is to find not the best
quantized residual values but the best quantized speech signal. This
becomes an issue especially for coarse quantization (very few bits
per sample). A powerful technique used in speech coding is
analysis-by-synthesis, in which the effect of quantization is
determined by examining its effect on the decoded output. This is
accomplished by operating the diagram of Figure 9 in the
configuration shown in Figure 10.
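
The codebook lookup and the fractional bit rates described above can
be illustrated with a minimal Python sketch. The function names, the
two-entry codebook, and the sample values are hypothetical and chosen
only for illustration. The squared-error distance is an assumption,
since the text says only that the entry should be the "best
representation".

```python
import math

def nearest_codebook_index(vector, codebook):
    """Return the index of the entry with the smallest squared error
    (squared error is an assumed distance measure)."""
    best_index, best_error = 0, float("inf")
    for index, entry in enumerate(codebook):
        error = sum((x - c) ** 2 for x, c in zip(vector, entry))
        if error < best_error:
            best_index, best_error = index, error
    return best_index

def bits_per_sample(codebook_size, dimension):
    """Bits needed for one index, divided by the samples it covers."""
    return math.log2(codebook_size) / dimension

# Hypothetical 1-bit codebook (2 entries), each entry holding 2 samples.
codebook = [(-1.0, -1.0), (1.0, 1.0)]
samples = [0.8, 1.1, -0.9, -1.2]   # four input samples = two vectors

# Encoder: one index is transmitted per pair of samples.
indices = [nearest_codebook_index(samples[i:i + 2], codebook)
           for i in range(0, len(samples), 2)]

# Decoder: look up the received indices in its copy of the codebook.
decoded = [value for index in indices for value in codebook[index]]

print(indices)                     # [1, 0]
print(decoded)                     # [1.0, 1.0, -1.0, -1.0]
print(bits_per_sample(2, 2))       # 0.5  (1-bit codebook, 2 samples)
print(bits_per_sample(1024, 40))   # 0.25 (10-bit codebook, 40 samples)
```

The exhaustive search shown here grows with the codebook size, which
is one reason structured codebooks are of practical interest.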
Figure 10: Analysis-by-synthesis encoder with error weighting.
  (Blocks: coder, decoder, summation, error weighting,
  error minimization.)

In this figure the encoder of Figure 9 is enhanced with a local
decoder. In most practical coders the predictors are still computed
as usual, but the quantization of the residual signal is done in an
analysis-by-synthesis fashion. If we assume that a codebook is used,
then for each codebook vector we perform local decoding and compare
the resulting prototype output with the original input signal. The
codebook vector that gives the best approximation is selected. Note
that with this paradigm we are indirectly creating a quantized
residual signal, and this signal is often referred to as an
excitation signal. Instead of directly comparing the original input
signal with its quantized and decoded rendering, an error-weighting
filter is introduced, which better reflects the way our auditory
system perceives distortions. It should be noted that this weighting
is much simpler than the complex auditory models used in audio
coding. The block diagram of Figure 10 forms the basis for a family
of coders generically referred to as code-excited linear predictive
(CELP) coders. Most modern speech-coding standards are based on this
principle. A large amount of research has been done on efficient
codebook structures, which not only give the best performance but
are also manageable in terms of search. A commonly used structure is
the so-called algebraic codebook, which consists of a few nonzero
pulses with deterministic positions.
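
The analysis-by-synthesis search can be sketched in Python as follows.
This is a simplified illustration under stated assumptions, not the
procedure of any particular standard. It assumes a first-order
short-term predictor A(z) = 1 - a z^-1, no long-term predictor, no
gain term, and the commonly used weighting form
W(z) = A(z) / A(z/gamma). The function names, the value of gamma, and
the small pulse codebook are hypothetical.

```python
def synthesize(excitation, a):
    """Local decoder: all-pole synthesis 1/A(z), assuming
    A(z) = 1 - a*z^-1 (first-order, for illustration only)."""
    output, previous = [], 0.0
    for sample in excitation:
        previous = sample + a * previous
        output.append(previous)
    return output

def weight(error, a, gamma):
    """Error weighting W(z) = A(z) / A(z/gamma) for the same
    first-order A(z); this filter form is an assumption."""
    output, prev_in, prev_out = [], 0.0, 0.0
    for sample in error:
        prev_out = sample - a * prev_in + a * gamma * prev_out
        prev_in = sample
        output.append(prev_out)
    return output

def analysis_by_synthesis_search(target, codebook, a, gamma=0.8):
    """Decode every codebook vector locally and keep the one whose
    output has the smallest weighted squared error to the target."""
    best_index, best_cost = 0, float("inf")
    for index, excitation in enumerate(codebook):
        decoded = synthesize(excitation, a)
        error = [t - d for t, d in zip(target, decoded)]
        cost = sum(e * e for e in weight(error, a, gamma))
        if cost < best_cost:
            best_index, best_cost = index, cost
    return best_index, best_cost

# Hypothetical sparse-pulse codebook (a few nonzero pulses per vector).
codebook = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, -1.0, 0.0],
    [0.0, 1.0, 0.0, -1.0],
]
a = 0.9

# Build a target that entry 2 reproduces exactly, then search for it.
target = synthesize(codebook[2], a)
index, cost = analysis_by_synthesis_search(target, codebook, a)
print(index, cost)   # 2 0.0
```

The error is measured on the decoded signal rather than on the
excitation itself, so the selected vector is the one that gives the
best weighted match to the input, which is the point of the
analysis-by-synthesis configuration.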
For all coders, several techniques can be used to further improve
performance. Some of these techniques can be applied independently
of the coder, although in practice it makes sense to take advantage
of the parameters already computed by the coder. Useful
preprocessing techniques are gain control and noise suppression. The
latter can be quite sophisticated but is very effective, especially
for the lower rate speech coders, which typically do not handle
background noise well. A widely used form of postprocessing is
postfiltering. In this process the decoded speech signal is slightly
distorted in such a way that the coding noise is suppressed and the
signal is enhanced. If done with care, postfiltering can clean up a
signal, resulting in a perceived quality improvement.
Another technique that has found some popularity takes advantage of
the fact that conversational speech comes in bursts, because the
speakers talk at alternate times. Sometimes there are also long
pauses between words. This can be exploited by transmitting only
when active speech is present; when speech is not active, no signal
is transmitted. Because on average people speak half of the time,
this technique has the potential to reduce the bit rate by half. To
make this discontinuous transmission approach work, a voice activity
detector (VAD) is needed. For speech without background noise, this
approach can work quite well. When background noise is present
(e.g., in a car), it is more difficult to get reliable decisions
from the VAD. Moreover, when no talker is active and no signal is
transmitted, the receiver side needs to substitute a replacement
signal. This is referred to as comfort noise. For high levels of
background noise it is difficult to make this comfort noise match
the characteristics of the actual background noise. Hence more
sophisticated systems transmit low-rate information about the
background noise, such as energy and spectral characteristics, at
average rates of about 1 to 2 kb/s.
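
The bit-rate saving from discontinuous transmission can be
illustrated with a small Python sketch. The energy-threshold detector
below is a deliberately naive stand-in for a real VAD. The function
names, the threshold, the frame values, and the 8 kb/s active rate are
hypothetical. The 1 kb/s background-noise rate is taken from the lower
end of the range quoted above.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)

def voice_activity(frames, threshold):
    """Naive energy-threshold VAD (an assumed, simplistic detector):
    a frame is 'active' if its mean energy exceeds the threshold."""
    return [frame_energy(f) > threshold for f in frames]

def average_bit_rate(decisions, active_rate, inactive_rate=0.0):
    """Average rate in kb/s when active frames use active_rate and
    inactive frames use inactive_rate (0 = nothing transmitted)."""
    active = sum(decisions)
    inactive = len(decisions) - active
    total = active * active_rate + inactive * inactive_rate
    return total / len(decisions)

# Hypothetical frames: the talker is active half of the time.
frames = [
    [0.5, -0.6, 0.4, -0.5],     # active speech
    [0.01, -0.02, 0.01, 0.0],   # pause
    [0.7, -0.3, 0.6, -0.8],     # active speech
    [0.0, 0.01, -0.01, 0.02],   # pause
]
decisions = voice_activity(frames, threshold=0.01)

print(decisions)                              # [True, False, True, False]
print(average_bit_rate(decisions, 8.0))       # 4.0 (silence not sent)
print(average_bit_rate(decisions, 8.0, 1.0))  # 4.5 (noise info at 1 kb/s)
```

With 50% activity the average rate is halved when nothing is sent
during pauses, and it rises only slightly when low-rate background
noise information is transmitted instead. A fixed energy threshold
like this one would misfire in loud background noise, which is the
difficulty the text describes.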

Speech Coding Standards
For communication purposes it is important to establish standards to
guarantee interoperability between equipment from different vendors,
or between telecommunication services in different geographic areas.
Telecommunication standards are set by different standards bodies,
which typically govern different fields of use.